How to add autocorrelation of observation errors?

Hi all,

I have a two observed time series (Y1 and Y2) that I am trying to fit an ODE model to. The time series are at an hourly time step over multiple days. I use the Euler method to get the estimated values (Y1_hat and Y2_hat). Each time series has observation errors. I can get that to work fine. However, I want to also add autocorrelation of the errors to the model, as well. I am having a lot of trouble figuring this out. A bare bones version of the model is shown below. (FYI: I also use hierarchical hyperpriors for one of the parameters, but that’s not relevant to the question).

functions {
    vector Y1Y2_ODE(vector y, real A, real B, real C) {
      // Unpack states
      real Y1  = y[1];
      real Y2  = y[2];

      // Difference equations
      vector[2] dydt;
      dydt[1] = (A - B*C + C^2); // Derivative of Y1
      dydt[2] = (B*A + C - A^2); // Derivative of Y2
      return dydt;

data {
  int<lower=1> n_days;               // Number of days
  int<lower=0> n_obs;                // Number of total observations
  int<lower=0, upper=1> Y1_lik;      // =1 if Y1 is part of the log l-hood; else 0
  int<lower=0, upper=1> Y2_lik;     // =1 if Y2 is part of the log l-hood; else 0
  vector[2] y0_init;  // Initial states
  vector[n_obs] Y1_obs;              // Observed Y1 data
  vector[n_obs] Y2_obs;             // Observed Y2 data

parameters {
  // observation errors, unscaled
  array[Y1_lik ? 1 : 0] real<lower=0> sigma_Y1;  // Y1 obs. error sd
  array[Y2_lik ? 1 : 0] real<lower=0> sigma_Y2;   // Y2 obs. error sd

  // Parameters to estimate
  vector<lower=0>[n_days] A; 
  vector<lower=0>[n_days] B; 
  vector<lower=0>[n_days] C;

transformed parameters {
vector[2] y_hat = y0_init; // init. Y1 and Y2

  for (d in 1:n_days){
    for (i in 1:24){
      int t = i + 24 * (d - 1); // time counter
      if (t > 1) { // for each time greater than initial conditions, do euler
      y_hat += Y1Y2_ODE(y_hat, A[d], B[d], C[d]) * dt;
      Y1_hat[t]  = y_hat[1];
      Y2_hat[t] = y_hat[2];

model {
  // Prior distributions of parameters across days
  A ~ normal(0, 10); 
  B ~ normal(0, 10); 
  C ~ lognormal(log(10), 1);
  // Likelihood for observed data
  if (Y1_lik) {
    sigma_Y1[1] ~ normal(1, 0.3);
    Y1_obs   ~ normal(Y1_hat, sigma_Y1);
  if (Y2_lik) {
    sigma_Y2[1] ~ normal(1, 0.3);
    Y2_obs   ~ normal(Y2_hat, sigma_Y2);

generated quantities {
  vector[n_obs] F2;

  for (d in 1:n_days){
    for (i in 1:n_perday){
      int t = i + 24 *  (d - 1); // time counter
      // calculate F2 
      F2[t] = Y2_hat[t] * A + B;

To do this in MATLAB, say, I could define the following errors (just for Y1 for example), where \rho is the autocorrelation coefficient:

Err_{AR1,Y1}=Err_{Y1}(2:N)-\rho_{Y1} *Err_{Y1}(1:N-1)

And then add the autocorrelation directly in the loglikelihood function:

LogY1 = -\frac{N}{2} \log(2\pi) - \frac{N}{2} \log\left(\frac{\sigma_{Y1}^2}{1-\rho_{Y1}^2}\right) - \frac{1}{2}(1-\rho_{Y1}^2)\left(\frac{Err_{Y1}(1)}{\sigma_{Y1}}\right)^2 - \frac{1}{2}\sum\left(\frac{Err_{AR1,Y1}}{\sigma_{Y1}}\right)^2

I do not know how to do the same thing in Stan despite searching all over the web and reading the autoregressive sections of the manual. Can anyone please show me the best way to incorporate this into the model? I asked ChatGPT how to do it and it provided me with the following option, which I do not think is correct:

target += normal_lpdf(Y1_obs[t] | Y1_hat[t], sigma_Y1 * sqrt(1 - rho_Y1^2)) 
          + normal_lpdf(Y1_obs[t] - Y1_hat[t] | 0, sigma_Y1 * rho_Y1);

How do I do this? I would also appreciate any thoughts on how to improve code efficiency.

Thank you in advance for your help!

The short answer is that you can write the likelihood in MATLAB, you can just code it the same way in Stan. Specifically, write the code to calculate LogY1 and just do target += LogY1; to increment the log target density.

I’m a bit confused about your Stan code as you have variables Y1_hat and Y2_hat that are never declared.

In general, you can pull out the vector of residuals from a prediction and then give those a time-series model. For example, a simple case is

matrix[N, K] x;
vector[K] beta;
vector[N] E_y = x * beta;
vector[N] err = y - E_y;
target += time_series(err, ...);

For example, if I want the time series to be a simple random walk with scale sigma.

functions {
  function time_series_lpdf(vector err, sigma) {
    int N = rows(err);
    if (N == 0) return 0;
    real lp = normal_lpdf(err | 0, 2 * sigma);  // need something for first error
    for (n in 2:N) {
      lp += normal_lpdf(err[n] - err[n - 1] | 0,  sigma);
    return lp;

Any kind of time series can be included here.

ChatGPT was on the right track. It’s just trying to translate all your algebra into normal distributions, where log normal(y | 0, sigma) = -log(sigma) - 1/2 * y*2 / sigma^2 + const. and it’s trying to aggregate all N iterations into one more efficient -N * log(sigma) call. But you can’t trust it in these situations.

@Bob_Carpenter thank you for your reply, especially with regard to its generality. Yes, I forgot to declare Y1_hat and Y2_hat as this was just for demonstration. They should be vectors of length “n_obs”.

Despite your clarity, I am still having trouble coding this. Could you please be more explicit in how to code this in the Stan language? For example, you say that ChatGPT was on the right track, but can’t be trusted in these situations (which I agree with). However, you didn’t specify what was incorrect about the approach and how it could/should be written. I would greatly appreciate some additional clarity on the matter. Essentially, I’m just adding one more parameter to model: the autocorrelation (AR1) coefficient. This assumes the errors of the model are represented by an autoregressive model (i.e., not a random-walk). Apologies in advance if I’m being dense.

Thank you again for your time and help,

I think I’ve got it sussed out. Here is what I have:

  real time_series_ar1(vector err, real rho, real sigma) {
    int N = rows(err);
    if (N == 0) return 0;
    real lp = normal_lpdf(err[1] | 0, sqrt(sigma^2 / (1 - rho^2))); // First error
    for (n in 2:N) {
      lp += normal_lpdf(err[n] - rho * err[n - 1] | 0, sigma); // AR(1) error
    return lp;

transformed parameters{
vector[n_obs - 1] errY1 = Y1_obs[2:n_obs] - Y1_hat[1:n_obs-1];

target += time_series_ar1(errY1, rho_Y1, sigma_Y1);

That looks reasonable. I think it’s more natural to write the normal model this way:

lp += normal_lpdf(err[n] | rho * err[n - 1], sigma);

It’s the same thing added to lp, but it makes it clearer that the error for item n is a function of the error for item n - 1.

If you don’t want to save errY1 it can be defined as a local variable in the model block.

Hi again,

Thank you for your responses. The AR1 model converges, but the parameter estimates are considerably different than before and the model fit is much worse. I’d expected them to be about the same, but with more uncertainty. Moreover, the \rho that I fit in Stan is much higher (~0.9+) than what I fit in MATLAB (~0.5). Do you have any insight into why this may be or how best to diagnose the issue?

I appreciate your time!