Slow sampling in simple ode

Hi all!

I am trying to fit what I believe is a simple ODE model:

functions {
  vector diffeq(real t,
             vector y,
             row_vector[] theta) {
    
    vector[14] dydt;

    for (i in 1:14) {
      dydt[i] = theta[i] * y;
    }
    
    return dydt;
  }
}
data {
  int<lower=1> T;
  vector[14] y[T];
  vector[14] y0;
  real t0;
  real ts[T];
}
parameters {
  row_vector[14] theta[14];     // Individual parameters
}
model {
  vector[14] mu[T] = ode_rk45(diffeq, y0, t0, ts, theta);
  
  for (i in 1:14){
    theta[i] ~ std_normal();
  }
  
  for (t in 1:T) {
    y[t] ~ normal(mu[t], 1);
  }
}

With data and sampling commands:

data = {"T": 19,
              "y0": df[col_names].values[0,:],
              "y": df[col_names].values[1:,:],
              "t0": df["t"].values[0],
              "ts": df["t"].values[1:],}

posterior = stan.build(model, data=data)

fit = posterior.sample(num_chains=1, num_samples=20)

The model I am trying to fit is simply \frac{dy_{i}}{dt}=\theta y for all i.

However, for a single chain and 20 samples, Stan is taking more than 15mins to sample, time after I stop it because I have never had such slow sampling. I am almost sure that the problem comes from the coding. Can anyone see any problem in my model?

I am fitting in in pystan version 3.2.0 on a MacOS.

Thank you very much!

I’m confused how this can compile. Is this supposed to be a scalar product?

Oh, sorry, my bad, theta is an array of vectors.

Can you not use a matrix exponential?

@Funko_Unko thank you for your answer. I not aware of a matrix of exponentials, can you elaborate on that, please? If you are thinking of a matrix data structure, according to the manual it is not an efficient data structure.

Sure, I think this should give an introduction with an example: Stan User’s Guide

I wouldn’t worry about this, especially not if your array of vectors behaves exactly as a matrix should.

(Also, I don’t think a matrix should be inefficient? Would you happen to have a link?)

I saw it here: 16.1 Basic motivation | Stan User’s Guide

1 Like

About the matrix exponential: The problem is that I am currently trying the most basic instance of my real model. At some point I will add terms that will make the ODE non separable. But I don’t want to move on until the most basic form works. But thank you for the tip, I will definitely keep in mind.

@Funko_Unko is there any chance that the speed of the sampling is due to the spacing in the time steps? I have something like:
[0.0, 0.2, 0.5, 1.0, 1.5, 2.1, 3.0, 4.0, 5.3, 7.0, 9.1, 11.7, 15.0, 19.2, 24.4, 31.1, 39.4, 50.0, 63.2, 80.0]

No, I wouldn’t think this to be an issue. If I were you I’d try the matrix exponential for now for this problem, to troubleshoot whether it’s a problem with the ODE solution procedure or the posterior geometry.

If it works with the matrix exponential, I’d guess you might have to change to ODE solver. With 14x14 parameters, you might need to use the new adjoint ode solver. Then you might also have to use a solver for stiff problems, but you will have to experiment there.

1 Like

Thank you for your time, I will try different things then.

@Funko_Unko as a quick follow-up, I tried what you suggested (using matrix exponentials) and the sampling goes way faster. However, the posteriors look a bit ugly:

1 Like

Hm, I wouldn’t have expected it to look that weird, but I would also not have thought it to look nice. You have a lot of parameters, and I’d assume many combinations of them yield similar results.

Maybe you need tighter priors, or more and better data. I’d try how things look and evolve starting at a lower number of parameters.

@Funko_Unko . Yes, the goal in the end is to do a hierarchical model. Since I have M measurements of the same model under slightly different conditions. I will try different priors (still with the exponential matrix) and maybe then move to a different ODE solver as you suggested before. I will keep posting here some partial results, in case you are interested in the evolution. Thank you so much for your help!

1 Like

That would be great! Good luck!

1 Like

I just tried the basic model again (the one without exponential matrices) and the bdf solver, but unfortunately is taking very long again.

Yes, the “regular” ode solvers should scale cubically in the dimension of your ODE state (because you have N states and N^2 parameters).

The new adjoint solver should scale just quadratically (as N+N^2). Might be worth to give this a shot.

1 Like

I recall the adjoint only scaling linear in states and parameters…but just try it, indeed.

Yes, but for this application, the number of parameters is the number of states squared.

Oh, I see….so yeah with n being something bigger than 2 or 3 …then adjoint should be your friend.

Cool that this stuff is picked up, I wasn’t sure if it’s too domain specific.

Thanks both for your advice! I was able to code the adjoint method using the signature in here: 13.7 Adjoint ODE solver | Stan User’s Guide , and the default 10^{-6} for both absolute and relative tolerance, and 300 max_num_steps (I don’t know whether I am in the correct ballpark here). And now I am getting an initalization error. I will play around with the priors a bit to see whether I can fix it.

I am trying to do a prior predictive check now and realized that 300 maximum steps might be too little… With 1800! seems to be working better. And I definitely need to adjust the priors on the thetas.

1 Like