Modelling a timeseries as a 2D field including AR1 noise and smooth seasonal cycle and trend (new to bayes and stan)

ezra_eisbrenner · August 1, 2023, 11:09am

Before I start, hi there Stan community! I’m both new to Bayesian statistics and to Stan, so feel free to comment on anything you think is worth commenting on :)

The goal

I like to model sea surface temperatures (SST) in a region where the seasonal cycle is cutoff during winter months (change of physical processes during sea-ice cover and subsequently change of SST distribution).

I set up the data y^{D\times N} to have two dimensions: D = the number of days that I have per year; and N = the number of years.

For the model, I like the mean and trend to be very smooth along D. So I employ GPs (kernels K_1 and K_2). And I also like to have the noise to be AR1.

length of dayofyear dimension D approx. 290
length of year dimension N approx. 20

So I think the model I want should look (something) like this:

a(d) \sim \mathcal{N}(\mu_0(d), \vec{K}_1)

b(d) \sim \mathcal{N}(0, \vec{K}_2)

\varepsilon(d,n) \sim \mathcal{N}(0, \vec{AR1}_\text{cov})

y(d,n) \sim a(d) + b(d) \times \text{year}(n) + \varepsilon(d,n)

I have a prior \mu_0 for each dayofyear at t_0 (and technically also a variance, but I decided to ignore that until it works at least a little).

The (wrong) model

The code below is a stage of my code which I was able to run (although it took forever and terminated before it finished with a file-not-found error related to some (maybe) temporary file). Anyway, it is not really what I aim for (see block thereafter, for what I actually try to do).

So here the code that I could write and run:

data {
  int<lower=1> D;
  int<lower=1> N;
  array[D] int<lower=1> doy;
  array[D] real<lower=0> sig_0;
  array[N] int year;
  array[D, N] real y;
  vector[D] mu_0;
}
transformed data {
  matrix[D, D] K1 = gp_exp_quad_cov(doy, 0.2, 1.0 * D);
  matrix[D, D] K2 = gp_exp_quad_cov(doy, 0.05, 3.0 * D);
  for (d in 1:D) {
    K1[d, d] = K1[d, d] + 0.01;
    K2[d, d] = K2[d, d] + 0.01;
  }
}
parameters {
  real<lower=0> rho;
  real<lower=0> sigma_annual;
  vector[D] a;
  vector[D] b;
  vector[D] epsilon;
}
transformed parameters {
  matrix[D, D] AR1;
  for (i in 1:D) {
    for (j in 1:D) {
      AR1[i, j] = rho^(abs(i-j));
    }
  }
  matrix[D, D] sigma_AR1;
  sigma_AR1 = diag_matrix(10.0 * to_vector(sig_0)) * AR1 * diag_matrix(10.0 * to_vector(sig_0));
}
model {
  rho ~ beta(1, 1);
  a ~ multi_normal(mu_0, K1);
  b ~ multi_normal(rep_vector(0, D), K2);
  sigma_annual ~ scaled_inv_chi_square(1, 1);

  for (n in 1:N) {
    epsilon ~ multi_normal(rep_vector(0, D), sigma_AR1);
    for (d in 1:D) {
      y[d, n] ~ normal(a[d] + b[d] * year[n] + epsilon[d], sigma_annual);
    }
  }
}

The (invalid) model

Now the above doesn’t have actual slice-wise autocorrelation (for each year=N), so what I think what I actually want is more like this:

...
model {
  rho ~ beta(1, 1);
  a ~ multi_normal(mu_0, K1);
  b ~ multi_normal(rep_vector(0, D), K2);

  for (n in 1:N) {
      epsilon[n] ~ multi_normal(rep_vector(0, D), sigma_AR1)
      y[:, n] = a + b * year[n] + epsilon[n];
  }
}

but that is not Stan syntax. I’ve seen how this could work if I hadn’t this extra dimension (or at least I am personally unable to translate the example to my problem). Anyway, I have trouble finding answers (or the right search prompt) to solve the problems in the code.

The (vague) question

I know this is quite vague, but anyway: What can I do to get this running?

Beyond the wrong model, there are probably many problems in my code, feel free to comment on them too :)

Thanks!
Ezra

Bob_Carpenter · August 1, 2023, 9:03pm

Hi, @ezra_eisbrenner and welcome to the Stan forums.

Let me start by saying this is a very challenging place to start with Bayes and Stan. Even for seasoned Bayesian Stan devs, we recommend starting with simpler models that work then add complexity one component at a time. It’s usually much easier to improve a working model than to debug a model that doesn’t work.

What’s with all the superscript D everywhere? From the Stan code, it looks like a^D is just a D-dimensional vector, so I’m guessing the superscript is part of the name of variable a. Personally, I find superscripts as part of the name of a variable very confusing, but @andrewgelman does it everywhere in his books, so there’s no way I can stem the tide of people copying his coding practices. And why is \epsilon subscripted with n when it looks like it’s a D-vector?

I’d urge you not to name your variables things like AR1. Also, it’s much and much more efficient to code an AR(1) process directly in Stan rather than working through a covariance matrix. Every time you have a multivariate normal, Stan needs to solve that matrix (residual inverse products plus determinants) and calculate derivatives. For AR(1) variables you can just do this:

functions {
  real ar1_lpdf(vector y, real beta1, real beta2, real sigma) {
    int N = rows(y);
    return normal_lpdf(y[2:N] | beta1 + beta2 * y[1:N-1], sigma);
  }

and then you can just use

epsilon ~ ar1(beta1, beta2, sigma);

where beta1, beta2 are the AR coefficients and sigma is the scale.

If you have rho ~ beta(1, 1), that implies rho is being given a uniform distribution over (0, 1). That means you need to declare rho with constraint <lower=0, upper=1> or you’ll get all kinds of errors and warnings for values out of bounds.

The for loop over N is wrong in that you’re applying the prior for epsilon a total of N times (equivalent to raising it to the Nth power). Instead, you want to pull that epsilon ~ ar1(…)` out of the loop as it should only get the prior once.

The final regression loop can be made much more efficient with vectorization, e.g.,

for (n in 1:N) {
  y[d] ~ normal(a[d] + b[d] * year + epsilon[d], sigma_annual);

But this looks rather non-identified with a[d] and epsilon[d] as additive terms (you can add a constant to a and subtract from epsilon without changing the likelihood.

What’s the motivation for a scaled inverse chi-square prior on the scale sigma_annual?

It’s faster to define rep_vector(0, D) as transformed data and re-use rather than reallocating and copying in for every use.

I’m not sure why you’re saying the line isn’t Stan syntax. I think the problem is more declaration than types. For example, you are declaring epsilon as a D-vector, but seem to be using it as if it were a matrix whose rows were D-vectors.

ezra_eisbrenner · August 2, 2023, 9:22am

Hi @Bob_Carpenter, and first, thank you!

Well, I tried to make it better readable, and apparently I didn’t :) (1) you’re right, with the D’s I just wanted to show the dimension; (2) the subscript n then was just fully confusing of me to use, since that changed the notation halfway down. I updated the formulas, I hope this is better! I am new to the use of the \sim sign and thus never quite sure if I use it correctly when there is more than one dimension.

I also had a typo in the equation for y(d,n), I wrote b(d) \times y_n instead of b(d) \times \text{year}(n) (it’s corrected now).

Noted, and going to update that while rewriting the code!

Below, I go a bit into the reasoning for what I did. For now, I need a bit more time to understand the function and how I’d go about implementing it. I will do that today and come back later to ask more questions :)

Yes, I got all those warnings, I didn’t knew that I could have an upper bound!

I agree, and a simpler example would have been better for me to handle (I tried first without an ar1, see code all the way at the bottom). But at the time, when I started, this example came up in my data (partially ice covered ocean regions), so I ended up trying it on the data I knew. And I did indeed got it running up to some point (see image below). Although it never did what I thought it should (I tried to heavily smooth a (mean) and b (trend) along D since there is no physical reason for those to be jagged).

The last version where I got results had epsilon outside the loop:

model {
  rho ~ beta(1, 1);
  a ~ multi_normal(mu_0, K1);
  b ~ multi_normal(rep_vector(0, D), K2);
  epsilon ~ multi_normal(rep_vector(0, D), sigma_AR1);
  sigma_annual ~ scaled_inv_chi_square(10, 1);

  for (d in 1:D) {
    for (n in 1:N) {
      y[d, n] ~ normal(a[d] + b[d] * year[n] + epsilon[d], sigma_annual);
    }
  }
}

which yielded this

(the red colors are simply 2D histograms of the posterior samples of a, b, and \varepsilon, and a simple histogram of sigma_annual, the black lines are the prior means of each of the variables)

and

(the empty stripes are years where there are nan values in the observations, which threw an error, so I excluded them)

The results, however, are like a static SST field, epsilon does not represent the random variability from year to year as intended (and sigma_annual cannot do that reasonably either). So that is why I tried to have \varepsilon to be calculated for each n separately.

Regarding sigma_annual, that variable I only implemented because I couldn’t figure out how to sample from epsilon directly for each y(d,n) in the nested loop. I first thought of a truncated normal, and after googling that, I didn’t think I could implement it, so I went with something else that is positive and decays for larger values (though it looks like the hyperparameters I chose didn’t do any good).

I couldn’t figure out how to vectorize in my case. I don’t yet know what is possible, what is not. For example, in your example you sample for y[d] but d seems to me like its not defined, so I wouldn’t have thought of this as possible to do. I’ll try this too later today!

Its probably me just confusing what the real problem is. Anyway, I tried a few things there, and I always get errors back. In the version here, I think Stan would complain that y is not from this block and cannot be written directly into. But y is my data, so I don’t want to re-allocate y in the model block, do I?

I also tried to sample directly a multi_normal, I don’t have the exact version that I tried anymore, but something like this

vector[D] a;
a ~ multi_normal(mu_0, K1);
for (n in 1:N) {
    y[n] ~ multi_normal(a, sigma_AR1); // for testing just a and not a + b * year[n]
}

but even with vector[D] a; declared, it would complain that a is not a vector but an array (because of the sampling a ~ multi_normal(mu_0, K1); transforming it into an array with length of the draws, I assume?)

Well, all of this is down to me not knowing my ways around Stan just yet.

Thanks, going to do this!

early stage code to get a feel for stan

Now, here just my starting point before I tried to implement the ar1:

data {
  int<lower=1> D;
  int<lower=1> Y;
  array[Y, D] real y;
  array[D] int<lower=1> doy;
  array[Y] int year;
  vector[D] mu_0;
  array[D] real<lower=0> sig_0;
}
transformed data {
  matrix[D, D] K1 = gp_exp_quad_cov(doy, 1000.0, 10000.0);
  matrix[D, D] K2 = gp_exp_quad_cov(doy, 1000.0, 10000.0);
  for (d in 1:D) {
    K1[d, d] = K1[d, d] + 1000.0 * sig_0[d];
    K2[d, d] = K2[d, d] + 10000.0 * 0.1;
  }
}
parameters {
  vector[D] a;
  vector[D] b;
  real<lower=0> sigma;
}
model {
  a ~ multi_normal(mu_0, K1);
  b ~ multi_normal(rep_vector(0, D), K2);
  sigma ~ lognormal(0, 100);
  for (dd in 1:D) {
    for (yy in 1:Y)
    y[yy, dd] ~ normal(a[dd] + b[dd] * year[yy], sigma);
  }
}

Topic		Replies	Views
Model specification for hierarchical AR(1) and Ornstein-Uhlenbeck process Modeling hierarchical-model	14	4825	December 25, 2020
Structural time series with seasonality, sampling very slowly Modeling fitting-issues , performance	16	2949	July 1, 2021
ARMA(1,1) as part of hierarchical model term Modeling rstan	16	1113	March 19, 2021
Spatio-Temporal Model Modeling	11	4006	February 8, 2018
Simple model that converges in JAGS but not in Stan Modeling	18	1629	June 9, 2019