Before I start, hi there Stan community! I’m both new to Bayesian statistics and to Stan, so feel free to comment on anything you think is worth commenting on :)
The goal
I like to model sea surface temperatures (SST) in a region where the seasonal cycle is cutoff during winter months (change of physical processes during sea-ice cover and subsequently change of SST distribution).
I set up the data y^{D\times N} to have two dimensions: D = the number of days that I have per year; and N = the number of years.
For the model, I like the mean and trend to be very smooth along D. So I employ GPs (kernels K_1 and K_2). And I also like to have the noise to be AR1.
- length of
dayofyear
dimensionD
approx. 290 - length of
year
dimensionN
approx. 20
So I think the model I want should look (something) like this:
I have a prior \mu_0 for each dayofyear at t_0 (and technically also a variance, but I decided to ignore that until it works at least a little).
The (wrong) model
The code below is a stage of my code which I was able to run (although it took forever and terminated before it finished with a file-not-found error related to some (maybe) temporary file). Anyway, it is not really what I aim for (see block thereafter, for what I actually try to do).
So here the code that I could write and run:
data {
int<lower=1> D;
int<lower=1> N;
array[D] int<lower=1> doy;
array[D] real<lower=0> sig_0;
array[N] int year;
array[D, N] real y;
vector[D] mu_0;
}
transformed data {
matrix[D, D] K1 = gp_exp_quad_cov(doy, 0.2, 1.0 * D);
matrix[D, D] K2 = gp_exp_quad_cov(doy, 0.05, 3.0 * D);
for (d in 1:D) {
K1[d, d] = K1[d, d] + 0.01;
K2[d, d] = K2[d, d] + 0.01;
}
}
parameters {
real<lower=0> rho;
real<lower=0> sigma_annual;
vector[D] a;
vector[D] b;
vector[D] epsilon;
}
transformed parameters {
matrix[D, D] AR1;
for (i in 1:D) {
for (j in 1:D) {
AR1[i, j] = rho^(abs(i-j));
}
}
matrix[D, D] sigma_AR1;
sigma_AR1 = diag_matrix(10.0 * to_vector(sig_0)) * AR1 * diag_matrix(10.0 * to_vector(sig_0));
}
model {
rho ~ beta(1, 1);
a ~ multi_normal(mu_0, K1);
b ~ multi_normal(rep_vector(0, D), K2);
sigma_annual ~ scaled_inv_chi_square(1, 1);
for (n in 1:N) {
epsilon ~ multi_normal(rep_vector(0, D), sigma_AR1);
for (d in 1:D) {
y[d, n] ~ normal(a[d] + b[d] * year[n] + epsilon[d], sigma_annual);
}
}
}
The (invalid) model
Now the above doesn’t have actual slice-wise autocorrelation (for each year=N), so what I think what I actually want is more like this:
...
model {
rho ~ beta(1, 1);
a ~ multi_normal(mu_0, K1);
b ~ multi_normal(rep_vector(0, D), K2);
for (n in 1:N) {
epsilon[n] ~ multi_normal(rep_vector(0, D), sigma_AR1)
y[:, n] = a + b * year[n] + epsilon[n];
}
}
but that is not Stan syntax. I’ve seen how this could work if I hadn’t this extra dimension (or at least I am personally unable to translate the example to my problem). Anyway, I have trouble finding answers (or the right search prompt) to solve the problems in the code.
The (vague) question
I know this is quite vague, but anyway: What can I do to get this running?
Beyond the wrong model, there are probably many problems in my code, feel free to comment on them too :)
Thanks!
Ezra