ODE fit not working/sampling

I am planning on implementing a Kalman filter for a dynamic model, and I’m working my way up from an ODE to make sure the basic results can be validated with simpler models and simulations. I went through the simple harmonic oscillator example, and tried to implement an almost as simple SIR disease transmission model. The differential equations are given by

\displaystyle \frac{dS}{dt} = -\frac{\beta}{N} I S
\displaystyle \frac{dI}{dt} = \frac{\beta}{N} I S - \gamma I
\displaystyle \frac{dR}{dt} = \gamma I
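
For reference, here is a minimal Python/SciPy sketch (with illustrative parameter values, not the ones in my data) that integrates these equations and checks that S + I + R stays equal to N at all times:

```python
import numpy as np
from scipy.integrate import solve_ivp

def sir(t, v, beta, gamma, N):
    """SIR right-hand side: v = [S, I, R]."""
    S, I, R = v
    dS = -(beta / N) * I * S
    dI = (beta / N) * I * S - gamma * I
    dR = gamma * I
    return [dS, dI, dR]

# Illustrative values, not from the real data
N, beta, gamma = 1000.0, 0.5, 0.1
v0 = [N - 1.0, 1.0, 0.0]  # S0, I0, R0

sol = solve_ivp(sir, (0.0, 100.0), v0, args=(beta, gamma, N))

# The derivatives sum to zero, so the total population is conserved
totals = sol.y.sum(axis=0)
print(np.allclose(totals, N))  # True
```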

Stan code is at the end of the post.

With fixed parameters I get the expected output:

[Figure: SIR incidence, predicted]

However, when trying to fit the incidence term (\beta / N) I S to simulated data, Stan either fails with several warnings like Exception: integrate_ode_rk45: initial state[1] is inf, but must be finite!, or takes a long time and produces completely wrong estimates. In some cases it reports that the Poisson rate is negative or zero, but even with quick fixes like iterating over that array and replacing those values with a small positive number, the other problems persist.

Oddly, when running several chains, one or two may actually sample properly. Also, fitting I directly or using a normal likelihood sometimes gives better results, but I don’t see why it should matter. I also tried changing the ODE integrator, but that seemed to make no difference.

I left the priors flat on purpose for now; this shouldn’t be an issue if everything else is working. Is there something else I may be missing or should check?
I’m using CmdStanPy, and ran a few CmdStan chains as well with similar outputs.

Thank you.

functions {

  real[] SIR(real t,        // time
             real[] v,      // state
             real[] theta,  // parameters
             real[] x_r,    // data (real)
             int[] x_i) {   // data (integer)
    real dvdt[3];
    dvdt[1] = -(theta[1]/theta[3])*v[2]*v[1];                 // dS/dt = -(beta/N)*I*S
    dvdt[2] = (theta[1]/theta[3])*v[2]*v[1] - theta[2]*v[2];  // dI/dt = (beta/N)*I*S - gama*I
    dvdt[3] = theta[2]*v[2];                                  // dR/dt = gama*I
    return dvdt;
  }
}

data {
  int<lower=1> T;
  int<lower=0> incidence[T];
  real t0;
  real ts[T];
}

transformed data {
  real x_r[0];
  int x_i[0];
}

parameters {
  real<lower=0> S0;
  real<lower=0> I0;
  real<lower=0> Ro;
  real<lower=0> gama;
  real<lower=0> N;
}

transformed parameters {
  real beta = Ro*gama;

  real theta[3];
  theta[1] = beta;
  theta[2] = gama;
  theta[3] = N;

  real v0[3];
  v0[1] = S0;
  v0[2] = I0;
  v0[3] = N - I0 - S0;  // recovered_0

  real v_pred[T,3] = integrate_ode_rk45(SIR, v0, t0, ts, theta, x_r, x_i);

  vector[T] susc_pred = col(to_matrix(v_pred), 1);
  vector[T] inf_pred = col(to_matrix(v_pred), 2);

  vector[T] inc_pred = (beta/N)*inf_pred .* susc_pred;
}

model {
  incidence ~ poisson(inc_pred);
}

It does not look like you are respecting the constraints. Are the recovered ones really always positive in this formulation?

What do you mean by constraints? The output of the ODEs is supposed to be always positive, except for integration errors when near zero. Like I said, when that happened I included a loop to eliminate those values to see if that was the problem, but still got some version of the problem.
The JSON data is attached (as a txt file).

sir_poisson.txt (4.1 KB)

v0[3] = N - I0 - S0; // recovered_0

is v0[3] always positive during sampling?

I see. I started with valid conditions, and assumed the sampler would “just not go” to regions where that constraint was violated, S + I + R = N at all times and there’s a strong prior for N, which is why it’s formulated like that.

If that is the cause of the problem, is there a way of ensuring the recovered equals N - I_0 - S_0 other than something like max(0, N - I0 - S0), and would the truncation affect the sampler, or is it more efficient to just set that to a value that would be rejected directly?
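
To illustrate what I mean about the max(0, …) option: a tiny (non-Stan) numerical check shows that a hard max makes the function flat below zero, so a gradient-based sampler would get no signal pointing back toward the feasible region (finite_diff is just a helper I made up for this check):

```python
def truncated(x):
    # Hard truncation: identically zero for x < 0
    return max(0.0, x)

def finite_diff(f, x, h=1e-6):
    """Central finite-difference derivative."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Below zero the derivative is exactly 0: no gradient information
print(finite_diff(truncated, -0.5))  # 0.0
# Above zero the derivative is ~1, as expected
print(finite_diff(truncated, 0.5))
```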


Getting the constraints right is very important for efficient sampling… and don’t enforce them with truncation either, e.g. by using max-type functions.

I would look into using simplexes. There is a pre-defined Stan type for this. This is what I would try.

What you can also try (though I doubt it will be efficient) is declaring N as

real<lower=S0+I0> N;

Stan accepts these dynamic constraints… but that won’t be efficient, I am afraid. Still, it’s worth a try given that you already have the code written like this.
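
To make the simplex idea concrete, here is a rough Python sketch (softmax is just one illustrative map to the simplex; Stan’s built-in simplex type actually uses a stick-breaking transform). Three unconstrained numbers become initial compartments that are positive and sum to N by construction:

```python
import numpy as np

def softmax(z):
    """Map unconstrained reals to a simplex (all positive, sums to 1)."""
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

N = 1000.0                      # illustrative population size
z = np.array([3.0, -2.0, 0.5])  # any unconstrained values the sampler proposes
fractions = softmax(z)          # simplex[3]: fractions in S, I, R
S0, I0, R0 = N * fractions

# S0 + I0 + R0 = N holds for every proposal, so the recovered
# compartment can never go negative.
print(S0 + I0 + R0)  # sums to N up to floating point
print((fractions > 0).all())  # True
```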

To judge whether the parametrisation is a good one, you should do pairs plots of the posterior.

Thanks. I know truncating is a bad idea, but I assumed that would only happen in the (statistically) unlikely case that S_0 + I_0 > N. The parameterization is pretty standard because of the prior information on N, beta and gama, and in general this can be estimated with Metropolis-Hastings or MLE without much difficulty.

If for some reason this is happening more often than acceptable maybe the dynamic constraints are the easiest solution. Simplex types will require further transformation, if I understand correctly.

I’ll give those a try, thanks.

Stan is a sampler and tries out lots of values which are totally off from where you expect them.

The key to efficient Stan programs is good parametrizations which work on an unconstrained space. The special types provided by Stan all do the transformation to an unconstrained space automatically for you (lower=0 is mapped via a log transform, etc.).
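
For example, here is a toy Python version of the lower=0 case (function names are mine, not Stan’s): the sampler works with an unconstrained y, the constrained value exp(y) is positive no matter what is proposed, and the log-Jacobian of the transform (which is just y) gets added to the log density:

```python
import math

def constrain_positive(y):
    """Map unconstrained y to theta > 0, as for real<lower=0>."""
    return math.exp(y)

def log_jacobian(y):
    """log |d theta / d y| = log(exp(y)) = y for the exp transform."""
    return y

# Whatever y the sampler proposes, the constrained value is positive.
for y in (-10.0, 0.0, 5.0):
    assert constrain_positive(y) > 0.0
print("all positive")
```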

I understand. Unfortunately the model parameters are constrained by physical reality, so the sampler needs to be able to adapt itself to that.
I tried the dynamic constraints and there’s little improvement, I’m getting errors like Exception: integrate_ode_rk45: initial state[2] is inf, but must be finite!. Not sure where that’s coming from.
I don’t see the documentation on the simplex type, but I’m also not sure how I can put good priors on the parameters beyond the constraint of summing to unity.

Is ODE fitting supposed to be production ready in Stan, or is it still some kind of beta? Thanks.

ODE fitting is production ready… but slow…

Well, you really have to come up with a good parametrisation which automagically respects the physical constraints. That’s really an art.

See also the sum-to-zero constraint:

Ok. I’d seen the first page, will check out both more carefully.

Maybe that’s worth the effort for larger models; for the simpler stuff it may be easier to revert to Metropolis-Hastings, which will just reject bad samples outright instead of leapfrogging through long trajectories that will get rejected anyway. Thanks.
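
A toy Metropolis-Hastings loop (illustrative, with a made-up Exp(1) target) shows what I mean: proposals outside the support get log density -inf and are simply rejected, so the chain never leaves the physical region:

```python
import math
import random

def log_post(theta):
    """Toy log-density: -inf outside the physical region theta > 0."""
    if theta <= 0:
        return float("-inf")  # MH simply rejects such proposals
    return -theta             # Exp(1) up to a constant

random.seed(1)
theta, accepted = 1.0, 0
for _ in range(1000):
    prop = theta + random.gauss(0.0, 1.0)       # random-walk proposal
    if math.log(random.random()) < log_post(prop) - log_post(theta):
        theta, accepted = prop, accepted + 1    # accept; else keep theta

print(theta > 0)  # True: the chain stays in the support
```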

Often you learn a lot about your model if you are able to come up with a parametrization which is

  • computationally efficient
  • interpretable
  • respects your problem constraints

Only then can you set good priors and get efficient inference. To me that’s part of the modelling “process”, if you will.

I agree for the most part. The issue here is that the inference is all of the above when using Metropolis-Hastings, for instance (or even MLE), but maybe moving to HMC requires a few other considerations, which I assumed would at least be straightforward to get around (e.g. by outright rejecting samples we know will not work from the parameter values alone).

So while I agree with you, and will probably invest some time in the future to find an HMC implementation that works at least as well as MH, I will do so because inference with more complicated models may benefit from that. I think it’s unlikely that in general the additional work is justified otherwise.

That’s why Stan has so many very clever defaults - but you have to know about them and use them.

(our manual has a search function for good reason, as it is admittedly huge).

But yeah, Stan is probably never a quick and dirty solution if your problem is not off the shelf.

Not exactly what I mean, but I guess we exhausted the discussion on which options are available or not. Thanks for the help.