Fitting ODE models: best/efficient practices?

Inspired by @charlesm93 's recent talk on fitting ODE models in Stan (see e.g. Bayesian Model of Planetary Motion: exploring ideas for a modeling workflow when dealing with ordinary differential equations and multimodality | planetary_motion.utf8), I was wondering whether there exists a collation of best practices/tricks to more efficiently fit ODE models in Stan?

Just for the above example (planetary motion of a single body with unknown mass and initial position around some object), a few things come to mind.

For example, one could limit the model to the first few time steps at first (during warmup?), so as not to waste so much time on regions of the parameter space that are hopeless.

Likewise, I assume we might sacrifice accuracy of the numerical solver to speed up the computations. On the other hand, this will surely impact the performance of the HMC method. However, even at moderate accuracies, the flow of the system should not actually change too much. So there might be some potential there, at least for the warmup phase?
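To make the accuracy/cost trade-off concrete, here is a small illustrative sketch in Python with SciPy (not Stan's solver; the toy oscillator just stands in for a more expensive right-hand side):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy damped oscillator standing in for a more expensive ODE right-hand side.
def rhs(t, y):
    return [y[1], -y[0] - 0.1 * y[1]]

y0 = [1.0, 0.0]
span = (0.0, 50.0)

tight = solve_ivp(rhs, span, y0, rtol=1e-10, atol=1e-12)  # accurate, costly
loose = solve_ivp(rhs, span, y0, rtol=1e-4, atol=1e-6)    # cheap, coarser

# Cost is roughly the number of right-hand-side evaluations.
print(tight.nfev, loose.nfev)
err = abs(tight.y[0, -1] - loose.y[0, -1])
```

For a well-behaved system like this one, loosening the tolerances cuts the number of right-hand-side evaluations by an order of magnitude while the end state barely moves; whether that carries over to your model is exactly the open question.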

I’m sure other people have thought about this longer and with more background knowledge. Is there maybe a collection of resources or papers that I should know about / look at?

I hope this is the appropriate place for this question!



@wds15 @yizhang @bbbales2 have fun ;-)


We had several rounds of discussion on this, and I think the only actionable recommendation for general users is to try and profile.

one could limit the model to the first few time steps at first (during warmup?), so as not to waste so much time on regions of the parameter space that are hopeless.

I’ve tried this, and the peril is that the shortened model may not provide sufficient likelihood information to influence the posterior, leading to inaccurate warmup, or even making warmup take longer.

However, even for moderate accuracies, the flow of the system should not actually change too much.

There is no guarantee of that. There are many ODE systems that evolve dramatically differently depending on how accurately we solve them.
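A classic toy illustration of this (nothing Stan-specific; SciPy's default RK45 on the Lorenz system, which is chaotic):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Lorenz system: chaotic, so small numerical errors are amplified
# exponentially and the solver tolerance changes the computed trajectory.
def lorenz(t, u):
    x, y, z = u
    return [10.0 * (y - x), x * (28.0 - z) - y, x * y - (8.0 / 3.0) * z]

u0 = [1.0, 1.0, 1.0]
tight = solve_ivp(lorenz, (0.0, 30.0), u0, rtol=1e-10, atol=1e-10)
loose = solve_ivp(lorenz, (0.0, 30.0), u0, rtol=1e-4, atol=1e-6)

# Both trajectories stay on the attractor, but end up in different places.
gap = float(np.linalg.norm(tight.y[:, -1] - loose.y[:, -1]))
print(gap)
```

Over a long enough horizon the two "solutions" are essentially unrelated point-wise, even though both look like plausible Lorenz trajectories.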

In general, controlling the ODE solver and the sampler performance in tandem is not fully studied. The latest manifestation of these uncharted waters is @wds15 's work on the adjoint ODE solver, where we simply don’t know how the controls will affect performance, and the next best solution is to release an experimental version so people can try it.


There is no guarantee of that. There are many ODE systems that evolve dramatically differently depending on how accurately we solve them.

We have a method with @bbbales2 and @avehtari where we can use a not-so-accurate but fast solver during HMC and then use importance sampling to get draws from the “true” posterior (the one obtained when using the accurate but costly solver). Here is a draft version of the case study:
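A minimal, self-contained sketch of the importance-sampling correction: the two toy normal densities below stand in for the log posterior evaluated with the cheap (loose-tolerance) and accurate (tight-tolerance) solvers; all names are made up for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Stand-ins: in the real workflow these would be the log posterior evaluated
# with the cheap and the accurate ODE solver, respectively.
def logp_cheap(theta):
    return norm.logpdf(theta, loc=0.0, scale=1.5)

def logp_accurate(theta):
    return norm.logpdf(theta, loc=0.3, scale=1.0)

# Pretend these draws came from MCMC run against the cheap posterior.
draws = rng.normal(0.0, 1.5, size=20000)

# Self-normalized importance weights correct toward the accurate posterior.
log_w = logp_accurate(draws) - logp_cheap(draws)
w = np.exp(log_w - log_w.max())
w /= w.sum()

post_mean = float(np.sum(w * draws))  # should be close to 0.3
```

In practice one would also want to stabilize and diagnose the weights (e.g. with Pareto smoothing) rather than use raw ratios as above.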


I’m working on updating this and doing more experiments, also with runtime information. But I have also thought a lot about how we could avoid the bad parameter regions. ODE models (with adaptive solvers) are computationally really different from, for example, regression, because the computational cost of one iteration depends heavily on the parameter values. So you really would want to avoid the bad regions (good initializations?) or get out of there quickly. One thing that is not discussed as much as tolerances is the max_num_steps parameter. I have observed that setting it to a very low value like 1000 can often help, because bad proposals can be rejected quickly.
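The parameter-dependent cost is easy to see on a toy example (again SciPy, purely illustrative): the Van der Pol oscillator is cheap for small mu but stiff for large mu, so the same non-stiff solver does vastly more work at the "bad" parameter value.

```python
from scipy.integrate import solve_ivp

# Van der Pol oscillator: cost of one solve depends strongly on mu.
def vdp(t, y, mu):
    return [y[1], mu * (1.0 - y[0] ** 2) * y[1] - y[0]]

y0 = [2.0, 0.0]
easy = solve_ivp(vdp, (0.0, 20.0), y0, args=(1.0,), rtol=1e-6, atol=1e-8)
hard = solve_ivp(vdp, (0.0, 20.0), y0, args=(200.0,), rtol=1e-6, atol=1e-8)

# Same solver, same tolerances: the stiff parameter value needs far more
# right-hand-side evaluations, i.e. a far more expensive leapfrog step.
print(easy.nfev, hard.nfev)
```

This is the situation a low max_num_steps (or, in this analogy, a step cap) short-circuits: instead of grinding through the stiff region, the proposal fails fast.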


There are two types of bad regions: the kind that we don’t need to explore, so we want to get out of it asap, and the kind that we do want to explore but where our integrator is stressed.


@jtimonen Thanks, this looks great! Had a quick glance and will most certainly look further into it.

@yizhang clearly you cannot have guarantees of qualitatively similar behavior for all ODEs, but if the ODE is very badly conditioned, it is also clear that a certain degree of accuracy is necessary. However, this is most certainly not the case for all/most ODEs…

Would you happen to have examples of these badly behaving ODEs?

How to diagnose whether your max_num_steps is too low is a more difficult question. I guess that for some models it could bias the inference too, if the high-probability parameter region is where the ODE is difficult to solve.

Edit: I see this was already noted

Yeah, just because the solver takes a lot of time, this doesn’t mean that the region is unimportant. Quite the contrary may be the case!

A simple idea to get rid of bad neighborhoods would be to start a lot of chains initially, and then prune the ones that have low posterior densities and low volume.
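A sketch of what that pruning could look like (toy density and all names hypothetical; the real cost would be dominated by the ODE solves inside the log posterior):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for an expensive ODE-based log posterior.
def log_post(theta):
    return -0.5 * np.sum(theta ** 2, axis=-1)

# Start many candidate chains/inits, evaluate once, keep the best fraction.
n_chains, keep = 64, 8
inits = rng.uniform(-10.0, 10.0, size=(n_chains, 2))
lp = log_post(inits)
best = inits[np.argsort(lp)[-keep:]]  # use these as initial values
```

This only prunes on posterior density; as noted below, pruning on how hard the solver has to work is a much riskier criterion.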

Relying on the amount of work the solver needs sounds like a strategy that is bound to blow up in your face.

A paper that is conveniently named for this discussion.


I think that the classical ODE solvers were not designed with MCMC in mind, but rather for simulating the dynamics with fixed parameters. On the other hand, Stan was probably not designed with ODE models as the primary application. Therefore there could be room for improvement if one tried to create either a solver that is designed to be used with HMC, or a sampling method that is designed for ODE models.


Yes, that is my impression exactly.

@yizhang Thanks, I’ll have a look!

In problems like these, I consider it impossible to fit the parameters in any reasonable time if we don’t have very strong prior knowledge of them, even if there were a good amount of data with little noise. Some of them just change their dynamics very much even with the tiniest change in parameter values, so the posterior is crazy.


Glad you mentioned this paper. It has brought many classical problems to a modern test setting, and it highlights some of the hardships we are faced with: there is numerical instability when we solve ODEs, but more importantly, there are ODEs that are intrinsically unstable. For the latter, we cannot hope that a perturbation will keep the future solution close. Moreover, such systems are not easily detectable through the usual numerical measures: examples like the Vinograd system can have constant eigenvalues with negative real parts but still blow up, and the Vinograd system is not even nonlinear.
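This phenomenon is easy to check numerically. The matrix below is the classic Markus–Yamabe-style example of this kind (used here for illustration; Vinograd's system is in the same spirit): the eigenvalues of A(t) are constant with negative real parts, yet x(t) = e^{t/2}(cos t, -sin t) is a solution and grows without bound.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Linear nonautonomous system x' = A(t) x. Eigenvalues of A(t) are constant
# (-1/4 +- i*sqrt(7)/4, negative real parts), yet solutions blow up.
def A(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[-1.0 + 1.5 * c * c, 1.0 - 1.5 * s * c],
                     [-1.0 - 1.5 * s * c, -1.0 + 1.5 * s * s]])

def rhs(t, x):
    return A(t) @ x

sol = solve_ivp(rhs, (0.0, 10.0), [1.0, 0.0], rtol=1e-10, atol=1e-12)
final_norm = float(np.linalg.norm(sol.y[:, -1]))
print(final_norm)  # grows like exp(t/2), despite the "stable" eigenvalues
```

So "check the eigenvalues" is simply not a valid stability diagnostic for nonautonomous systems.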

So maybe if we want to take the path of perturbation, there are going to be some constraints on the applicable problems. Some unstable ODEs can still be checked numerically, but they are the exception rather than the norm (an example that comes to mind is a nonautonomous linear ODE with periodic coefficients, where one can apply Floquet theory, should one be able to find the Floquet exponents).

Sorry, what is the Vinograd system?

Edit: Ah, sorry, I was confused. Found it (p. 41,

Eq. 1.5 in

I found that one at first, but that system is nonlinear.

oops, posted a wrong link. You can find Eqn 10 here.

I don’t think that the state of affairs is so bad, but here is what I have learned so far:

  • Avoid ODEs if you can! For example, in PK/PD systems we input the drug using functions which are themselves analytically solvable, or you can, for convenience, represent the input function as an ODE. Don’t enlarge the ODE system unless you have to!
  • Avoid having more parameters than necessary. Sounds simple, but it is important to keep the number of ODEs being solved down (for each parameter you end up solving N more ODEs, where N is the number of states).
  • Should your solution approach 0, then definitely tune the absolute tolerance to values large enough that the ODE solver does not derail, while still making sense for your problem setting.
  • The relative tolerance should be increased as well, in a similar way.
  • Don’t make the tolerances larger than 10^-4 is what I found in my problems.
  • A good resource for tips on tuning the tolerances is the CVODES user manual.
  • Scale the states such that unity is a sensible number (step-size tuning basically starts off with this number as a reference scale).
  • I found in PK/PD problems that using an adapt_delta of 0.7 (so rather low) gave me improved performance. The net effect of that is larger step sizes, which did help move the sampler around more quickly.
  • Know your problem well, in the sense that non-linear systems can easily be quite nasty and cause bi-modality. You need to avoid that by using clever parametrisations and good priors. This is the usual advice to have a favorable “geometry” of the problem, but it is even more important in ODE problems.
  • I did myself in the past consider a low max_num_steps a good thing to stop the sampler from going into crazy parameter regimes, but I have changed my mind by now. It’s far better to let the sampler get the value of the log-lik at these extreme values, which lets the sampler “know” that this is no-man’s land.
  • Using a stiff solver (bdf) for warmup and then switching to a non-stiff solver (RK45) for the sampling phase has been shown to be good for some problems. In fact, you can already switch to the non-stiff solver starting with Phase III of warmup.
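To see why each extra parameter costs N more ODEs (the second bullet above): the forward sensitivity s = dy/dtheta satisfies its own ODE coupled to the state, so the augmented system has N·(1+P) states for N states and P parameters. A toy check against the analytic answer, sketched with SciPy rather than Stan's internals:

```python
import numpy as np
from scipy.integrate import solve_ivp

theta = 0.7

# Augmented system for y' = -theta*y: the sensitivity s = dy/dtheta obeys
# s' = d/dtheta(-theta*y) = -y - theta*s, with s(0) = 0.
def aug(t, u):
    y, s = u
    return [-theta * y, -y - theta * s]

sol = solve_ivp(aug, (0.0, 3.0), [1.0, 0.0], rtol=1e-10, atol=1e-12)
y_T, s_T = sol.y[:, -1]
# Analytic solution: y = exp(-theta*t), dy/dtheta = -t * exp(-theta*t)
```

One state and one parameter already double the system size; with N states and P parameters the factor is (1+P), which is why trimming parameters pays off so directly.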

That’s just a collection of my personally collected experience with these problems.

If you are aiming to solve large problems with many states and/or parameters, then you should be looking at the adjoint ODE solving method, for which I am about to share an experimental version, but it will still take some time.

I hope that helps.