My colleagues and I have been having some trouble with a model that uses the recently released adjoint ODE solver and would really appreciate some help.
The background is that we are making an application for fitting Bayesian statistical models of measurements of metabolic networks using cmdstan via cmdstanpy. The source code lives here and has documentation here. In order to connect our model’s parameters with its measurements we need to solve a steady state problem: given a set of kinetic parameters and boundary conditions, what concentration profile of internal metabolites will be stable? We solve this problem by representing the network’s dynamics as a system of ODEs and simulating for a reasonably long time from a biologically plausible starting state.
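To make the steady-state step concrete, here is a minimal Python sketch of the idea. It uses a toy two-metabolite pathway, not our actual model: the rate constants, the function names and the fixed-step RK4 integrator are all illustrative assumptions (the real model does this inside Stan).

```python
def toy_rhs(conc, k):
    """Toy dynamics: source -> A -> B -> sink with mass-action kinetics.

    conc = [A, B]; k = (influx, k_ab, k_out). All names are hypothetical.
    """
    a, b = conc
    influx, k_ab, k_out = k
    return [influx - k_ab * a, k_ab * a - k_out * b]


def rk4_step(rhs, conc, k, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(conc, k)
    k2 = rhs([c + 0.5 * dt * d for c, d in zip(conc, k1)], k)
    k3 = rhs([c + 0.5 * dt * d for c, d in zip(conc, k2)], k)
    k4 = rhs([c + dt * d for c, d in zip(conc, k3)], k)
    return [
        c + dt / 6.0 * (d1 + 2 * d2 + 2 * d3 + d4)
        for c, d1, d2, d3, d4 in zip(conc, k1, k2, k3, k4)
    ]


def simulate_to_steady_state(rhs, conc0, k, dt=0.01, t_max=200.0, tol=1e-9):
    """Integrate until every derivative is below tol or t_max is reached."""
    conc = list(conc0)
    t = 0.0
    while t < t_max:
        if max(abs(d) for d in rhs(conc, k)) < tol:
            break
        conc = rk4_step(rhs, conc, k, dt)
        t += dt
    return conc, t


# Start from a plausible state and simulate "for a reasonably long time".
steady, t_end = simulate_to_steady_state(toy_rhs, [0.1, 0.1], (1.0, 2.0, 0.5))
```

For this toy system the steady state can be checked by hand (A = influx / k_ab, B = influx / k_out), which is not the case for our real networks; there we rely on the simulation alone.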
The ODE-solving step is by far the slowest part of our model, and involves more parameters than state variables, so we were hopeful that the adjoint solver would speed up our models. Indeed this is what we observed when we tested the new solver under cmdstan release cmdstan-ode-adjoint-v2. However, since the official release we have encountered a new type of error and bad sampler behaviour, which we did not see before.
Specifically, whereas under version cmdstan-ode-adjoint-v2 we are able to achieve more or less stable sampling, under versions cmdstan-2.27.0-rc1 and cmdstan-2.27.0 something seems to go wrong with the adaptation during the initial phase, with the result that the step size rapidly gets very small and almost all iterations have divergences. In addition, we see the following error under versions cmdstan-2.27.0-rc1 and cmdstan-2.27.0 but not under version cmdstan-ode-adjoint-v2:
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: ode_adjoint_tol_ctl: ode parameters and data[3] is inf, but must be finite! (in '/Users/tedgro/dtu/projects/minimal_maudfail/model.stan', line 313, column 4 to line 358, column 40)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
I think this means that, somewhere, an ODE state variable (i.e. the first non-function argument to ode_adjoint_tol_ctl) is infinity, which seems like the kind of thing that would cause a problem! My first thought was that I needed to tighten some tolerances, but I didn’t manage to find a configuration that avoids the error, and this doesn’t explain the differences we see between cmdstan versions.
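One thing that can be ruled out quickly is a non-finite value in the data passed to the solver. Below is a minimal sketch of such a check, assuming the Stan input has been loaded into a nested Python dict; the function name and the example keys are hypothetical. It only covers the data, not parameters that become infinite during sampling.

```python
import math


def find_non_finite(value, path="data"):
    """Return paths of all entries in a nested dict/list that are inf or nan."""
    if isinstance(value, dict):
        hits = []
        for key, item in value.items():
            hits.extend(find_non_finite(item, f"{path}[{key!r}]"))
        return hits
    if isinstance(value, (list, tuple)):
        hits = []
        for i, item in enumerate(value):
            hits.extend(find_non_finite(item, f"{path}[{i}]"))
        return hits
    if isinstance(value, float) and not math.isfinite(value):
        return [path]
    return []


# Made-up input for illustration only; these are not our real variable names.
example_input = {
    "rel_tol": 1e-9,
    "conc_init": [0.1, float("inf"), 0.3],
    "priors": {"km": [1.0, float("nan")]},
}
bad_entries = find_non_finite(example_input)
```

An empty list from this check would suggest that the infinity arises from the parameters during sampling rather than from the input.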
We have prepared an example setup here that reproduces this behaviour. Running make stan-environment should download and build all the relevant cmdstan versions (cmdstan-ode-adjoint-v2, cmdstan-2.27.0-rc1 and cmdstan-2.27.0) in the cmdstan folder, then python script.py will run one of our models under each version, with the results saved in the output folder. The Stan input lives in the data folder; for example, you can see the ODE tolerances that we used here. If you don’t want to adjust anything, some results from my laptop (Intel MacBook Pro running macOS Mojave, cmdstanpy 0.9.76, Apple LLVM version 10.0.1 (clang-1001.0.46.4)) are already uploaded in the output folder.
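For comparing runs across versions, something like the following sketch may help. It computes the fraction of divergent draws and the mean step size from the text of one Stan CSV file, using the standard sampler columns divergent__ and stepsize__. The function name and the toy CSV are made up for illustration, and how the files are laid out in the output folder is not assumed here.

```python
import csv
import io


def summarise_stan_csv(text):
    """Return (fraction of divergent draws, mean step size) for one Stan CSV."""
    # Stan CSV files mix '#' comment lines with the header and the draws.
    rows = [line for line in text.splitlines() if line and not line.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(rows)))
    divergent, stepsizes = [], []
    for row in reader:
        divergent.append(float(row["divergent__"]))
        stepsizes.append(float(row["stepsize__"]))
    if not divergent:
        raise ValueError("no draws found in CSV text")
    n = len(divergent)
    return sum(divergent) / n, sum(stepsizes) / n


# Toy stand-in for a real output file; the numbers are invented.
toy_csv = """# model = model
lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta
-10.1,0.9,0.001,3,7,1,11.0,0.5
# Adaptation terminated
-10.3,0.8,0.001,3,7,1,11.2,0.6
-10.2,0.7,0.003,2,3,0,10.9,0.4
-10.0,0.95,0.003,2,3,1,10.8,0.45
"""
frac_divergent, mean_stepsize = summarise_stan_csv(toy_csv)
```

Applying this to one CSV per cmdstan version gives a quick side-by-side view of the behaviour described above.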
I’m posting mainly to ask the people who worked on this feature, and anyone else who knows about the adjoint method, whether you have seen this kind of behaviour or know why it might be happening. Did anything significant change between cmdstan-ode-adjoint-v2 and cmdstan-2.27.0-rc1?
Any comments, thoughts or questions would be very gratefully received!