Ordinary differential equation with autoregressive errors

I am using an ODE to model a toy logistic growth system. I know, however, that the model is persistently wrong, because it fails to account for certain aspects of the system.

One way to capture this model discrepancy is to use an autoregressive error. For an AR(1) error, the system has the following governing equations,

y(t) ~ normal(y*(t),sigma)
residual(t) = y(t) - y*(t)
residual(t) ~ normal(rho*residual(t-1),sigma1)

where rho measures how persistent the errors are from one period to the next.
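To make the generative story concrete, here is a quick Python sketch (all parameter values are made up for illustration) that simulates AR(1) residuals on top of the analytic logistic solution:

```python
import numpy as np

# Illustrative sketch with invented parameter values: simulate an AR(1)
# discrepancy e(t) on top of a logistic-growth mean curve y*(t).
rng = np.random.default_rng(0)
T, rho, sigma1 = 50, 0.8, 0.1
t = np.linspace(0.0, 10.0, T)

# Analytic logistic solution: y* = K / (1 + (K/y0 - 1) * exp(-r t))
r, K, y0 = 0.8, 10.0, 0.5
y_star = K / (1.0 + (K / y0 - 1.0) * np.exp(-r * t))

# AR(1) residuals: e(t) = rho * e(t-1) + normal(0, sigma1) innovation
e = np.zeros(T)
for i in range(1, T):
    e[i] = rho * e[i - 1] + rng.normal(0.0, sigma1)

y = y_star + e  # observed series with persistent model discrepancy
```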

I have coded this up in Stan as shown below. My issue is the following: since the residuals are a non-linear function of the parameters, I need to do some sort of Jacobian adjustment. It is not clear to me, however, how to do this with an ODE model (perhaps it is possible here because the ODE has an analytic solution, but I want to make this work for more general systems).

Does anyone know how to do this here?

Best,

Ben

functions {
  // Logistic growth: dy/dt = theta[1] * y * (1 - theta[2] * y)
  real[] bacteria_deriv(real t, real[] y, real[] theta, real[] x_r, int[] x_i) {
    real dydt[1];
    dydt[1] = theta[1] * y[1] * (1 - theta[2] * y[1]);
    return dydt;
  }
}
data {
  int<lower=1> T;
  real t0;
  real ts[T];
  vector[T] y;
}
transformed data {
  real x_r[0];
  int x_i[0];
}
parameters {
  real<lower=0,upper=2> theta[2];
  real<lower=0> sigma;
  real<lower=0> sigma1;
  real<lower=0,upper=10> y0[1];
  real<lower=0> rho;
}

transformed parameters {
  vector[T] vResiduals;
  real y_hat[T,1];
  y_hat = integrate_ode_rk45(bacteria_deriv, y0, t0, ts, theta, x_r, x_i);
  for (t in 1:T)
    vResiduals[t] = y[t] - y_hat[t,1];
}

model {
  sigma ~ cauchy(0,1);
  y0 ~ normal(5,2);
  for (t in 2:T) {
    y[t] ~ normal(y_hat[t,1],sigma);
    vResiduals[t] ~ normal(rho*vResiduals[t-1],sigma1);
  }
  theta[1] ~ normal(0.5,1);
  theta[2] ~ normal(50,10);
  rho ~ normal(0.8,0.5);
}

Can’t you just apply the chain rule and let the ODE system sort out its part of the bargain?

So I could solve the two sensitivity equation ODEs. In other words, if

dydt = f(theta,y),

and I take the partial derivative wrt the parameters, then reverse the order of differentiation, I obtain the sensitivity ODE,

d/dt (dy/dtheta_j) = (df/dy) (dy/dtheta_j) + df/dtheta_j

so we obtain a system of the form,

dS_j/dt = J S_j + F_j,

where S_j = dy/dtheta_j is the sensitivity of the solution with respect to parameter j. Since the residual is simply the difference between the observed value and the solution, the Jacobian adjustment is just the inverse of the determinant of the matrix of sensitivities, evaluated at each time point.

So, in other words, I solve for S_j for all j parameters then, for each point where I have an observation, I increment the log probability by the log of the Jacobian.
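As a sanity check on this idea, here is a sketch (in Python with SciPy rather than Stan, and with made-up parameter values) that integrates the logistic ODE together with its two sensitivity equations:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Forward sensitivity system for the logistic ODE
#   dy/dt = f(y, theta) = theta1 * y * (1 - theta2 * y),
# augmented with S_j = dy/dtheta_j, which obeys
#   dS_j/dt = (df/dy) * S_j + df/dtheta_j.
# Parameter values below are invented for illustration.

def augmented_rhs(t, state, theta1, theta2):
    y, s1, s2 = state
    dfdy = theta1 * (1.0 - 2.0 * theta2 * y)
    dydt = theta1 * y * (1.0 - theta2 * y)
    ds1 = dfdy * s1 + y * (1.0 - theta2 * y)   # df/dtheta1
    ds2 = dfdy * s2 - theta1 * y ** 2          # df/dtheta2
    return [dydt, ds1, ds2]

theta1, theta2 = 0.5, 0.02
ts = np.linspace(0.0, 10.0, 20)
sol = solve_ivp(augmented_rhs, (0.0, 10.0), [5.0, 0.0, 0.0],
                t_eval=ts, args=(theta1, theta2), rtol=1e-8, atol=1e-10)

# One row per observation time, one column per parameter
S = sol.y[1:].T
```

Note that the resulting sensitivity matrix has one row per observation time and one column per parameter, so it is 20 x 2 here rather than square.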

Does the above sound reasonable? Is this what you were thinking Bob?

Best,

Ben

just inverse of the determinant of the matrix of sensitivities

I think there’s an issue here of invertibility. The ode solver does give you y = f(x), but you don’t get x = f^-1(y) necessarily (and I think you need this). And the sensitivity matrix here is the wrong shape for the determinant Jacobian thing to work (it’s M outputs * N parameters where N doesn’t need to equal M).

y(t) ~ normal(y*(t),sigma)
residual(t) = y(t) - y*(t)
residual(t) ~ normal(rho*residual(t-1),sigma1)

I’m not sure what this means. Does this mean the system runs and then an autoregressive process comes back through and adds noise on top after the ODE? That doesn’t sound right. It seems like somehow it should all happen at the same time?

There’s the bit of the dynamics that the ODE explains, and then there’s the bit that the autoregressive equation explains. I see the reasoning behind wanting to do that, but wouldn’t it look more like:

y(n + 1) = y(n) + integral(ode(t, y) from t(n) to t(n + 1)) + random_piece_to_infer

So that for each measurement, the state changed as a function of the ODE + some other process (however you decide to model that)?
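A simulation sketch of that update in Python (the logistic parameters and step-noise scale are invented for illustration):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch of the "integrate between observations, then add a random
# increment" idea:
#   y(n + 1) = y(n) + integral(ode from t(n) to t(n + 1)) + noise

def logistic(t, y, r=0.5, K=50.0):
    return [r * y[0] * (1.0 - y[0] / K)]

rng = np.random.default_rng(1)
ts = np.linspace(0.0, 10.0, 21)
sigma_step = 0.2  # invented sd of the per-interval random piece
y = np.empty(len(ts))
y[0] = 5.0
for n in range(len(ts) - 1):
    # Restart the ODE from the current (noisy) state and integrate
    # over one inter-observation interval.
    seg = solve_ivp(logistic, (ts[n], ts[n + 1]), [y[n]])
    y[n + 1] = seg.y[0, -1] + rng.normal(0.0, sigma_step)
```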

Not that I really know what I’m doing here, haha, but hope it helps anyway!

Ben, you’re right – my approach doesn’t work because of the non-invertibility of the matrix of sensitivities!

I know what you mean by the AR1 process happening all at the same time (doing it separately is how this is handled in MLE). I think you’re probably right, I need to integrate between consecutive points. From the Stan manual section on ARMA models, I think I can code up this type of model.

However, this model will nonetheless require setting priors on a non-linear transformation of the parameters (as before), because, in contrast to the ARMA model in the manual, the integral part is in general a non-linear function of the parameters. Also, as before, since the transformation is many-to-one, the Jacobian cannot be calculated as the determinant of the matrix I discussed above.

Anyone have any ideas how I can calculate this sort of Jacobian?

I may be misreading this, but from what I understand this kind of model should be straightforward if you think generatively. Solve the ODE for y_1(t), define an autoregressive time series for y_2(t), and then define your model as y ~ N(y_1(t) + y_2(t), sigma). Or use splines or a GP for y_2(t).

No?
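For instance, a toy generative sketch in Python (invented parameter values, just to illustrate the decomposition):

```python
import numpy as np

# Generative view: a deterministic ODE part y1(t), a latent AR(1)
# part y2(t), and observation noise sigma. All values are made up.
rng = np.random.default_rng(2)
T, rho, sigma1, sigma = 40, 0.7, 0.2, 0.1
t = np.linspace(0.0, 8.0, T)

# Analytic logistic solution stands in for the ODE solve
r, K, y0 = 0.6, 20.0, 2.0
y1 = K / (1.0 + (K / y0 - 1.0) * np.exp(-r * t))

# Latent AR(1) component
y2 = np.zeros(T)
for i in range(1, T):
    y2[i] = rho * y2[i - 1] + rng.normal(0.0, sigma1)

# Observation model: y ~ N(y1(t) + y2(t), sigma)
y = rng.normal(y1 + y2, sigma)
```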

So I’ve tried Michael’s suggestion, although I’m not sure whether I am understanding it correctly. This probably stems from the way I initially wrote down my model. For clarity, my model is of the form,

y(t) = y_hat(t) + e(t),
e(t) ~ normal(rho e(t-1),sigma).

Following Michael’s suggestion, I have tried,

for (t in 2:T) {
  y[t] ~ normal(y_hat[t,1] + epsilon[t], sigma);
  if (t == 2)
    epsilon[t] ~ normal(0, sigma1);
  else
    epsilon[t] ~ normal(rho * epsilon[t-1], sigma1);
}

But this isn’t the same as my model, at least I don’t think it is. In particular, the model directly above assumes that there is further variability in y(t) that is not explained by y_hat(t) and epsilon(t), whereas in my model, there isn’t.

Am I missing something here?

Following the MLE approach, to estimate my model we would do something like,

for (t in 2:T) {
  (y[t] - y_hat[t,1]) - rho * (y[t-1] - y_hat[t-1,1]) ~ normal(0, sigma1);
}

But, from my discussion above, this is not easy to do because of the awkward Jacobian…

You don’t want that extra level. The generative model for t > 1 is going to look like:

for (t in 2:T) {
  real err_t_minus_1 = y[t-1] - y_hat[t-1,1];
  y[t] ~ normal(y_hat[t,1] + rho * err_t_minus_1, sigma);
}

I’ve just pulled the mean out of your e[t] sampling statement and rolled it into the sampling statement for y[t].
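A quick numerical check (in Python, with made-up numbers standing in for the ODE solution and the data) that the residual form and the rolled-in form give identical log likelihoods:

```python
import numpy as np
from scipy.stats import norm

# Check that rolling the AR(1) mean into the sampling statement for
# y[t] matches the residual form e(t) ~ normal(rho * e(t-1), sigma).
# All numbers below are invented for illustration.
rng = np.random.default_rng(3)
T, rho, sigma = 30, 0.8, 0.3
y_hat = np.linspace(1.0, 10.0, T)        # stand-in for the ODE solution
y = y_hat + rng.normal(0.0, 0.5, T)      # fake observations

e = y - y_hat
# Residual form: e[t] ~ normal(rho * e[t-1], sigma)
ll_residual = norm.logpdf(e[1:], rho * e[:-1], sigma).sum()
# Rolled-in form: y[t] ~ normal(y_hat[t] + rho * e[t-1], sigma)
ll_direct = norm.logpdf(y[1:], y_hat[1:] + rho * e[:-1], sigma).sum()
```

The two sums agree exactly, since subtracting y_hat[t] from y[t] and adding it to the mean are the same operation inside the normal density, so no Jacobian adjustment is needed in this parameterisation.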