Hmm, I only ever tried adjoint-Adams at the very beginning, and I think it was roughly equivalent to bdf. But these were very non-exhaustive tests, as my computational resources are limited.
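To be explicit about what "adjoint configuration" means here, this is the kind of call I'm comparing. A sketch only, assuming the experimental `ode_adjoint_tol_ctl` interface from the Stan docs; the tolerances and step counts are placeholders rather than my actual settings, and `sirp_rhs` is sketched at the end of this post:

```stan
// Lives in transformed parameters / model; y0, t0, ts and the
// parameters come from the usual data/parameters blocks.
array[N_t] vector[4] y
  = ode_adjoint_tol_ctl(sirp_rhs, y0, t0, ts,
                        1e-6, rep_vector(1e-6, 4),  // forward rel/abs tol
                        1e-6, rep_vector(1e-6, 4),  // backward rel/abs tol
                        1e-6, 1e-6,                 // quadrature rel/abs tol
                        10000,  // max_num_steps
                        150,    // num_steps_between_checkpoints
                        2,      // interpolation_polynomial (2 = polynomial)
                        1,      // solver_forward: 1 = Adams, 2 = BDF
                        1,      // solver_backward: 1 = Adams, 2 = BDF
                        beta, gamma, eta, delta);
```

Switching `solver_forward`/`solver_backward` between 1 and 2 is what I mean by adjoint-Adams vs. adjoint-BDF, and coarsening the tolerances is what gives the "coarsest" configuration mentioned below.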
I’m still not sure what the appropriate performance metric should be.
Sure, ideally we want the highest Neff/s and convergence, but for which problems? For simple problems (tight priors) any method/configuration appears to work reasonably well. In that case we could look only at the scaling with the number of states/parameters.
But problems with tight priors are kind of uninteresting, aren't they? If we already know the true value up to 0.1/1/10%, then what's the point? However, if the priors are not tight, then we may get large runtime variations and stuck chains, and then how do we compare performance?
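To make that concrete, the metric I have in mind (just my convention, nothing standard) is worst-parameter efficiency,

$$\text{Neff/s} = \frac{\min_\theta \operatorname{ESS}_\text{bulk}(\theta)}{t_\text{warmup} + t_\text{sampling}},$$

with warmup time included in the denominator, since slow or stuck warmup phases are exactly what should count against a configuration.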
As an example, for this (slightly modified) SIRP model (P = pathogen) with very tight priors (0.1%), rk45 still performs best, though bdf and the best (coarsest) adjoint configuration are almost on par.
This is with just one compartment. Black is the regular warmup (what interests you), green is the incremental warmup (better, but you may ignore it). As we widen the priors (not run yet), we would expect larger and larger variation in the runtimes for the regular warmup. Is that still informative?
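For reference, here is a generic sketch of what the SIRP right-hand side looks like. The exact modification I made isn't shown, and `beta`, `gamma`, `eta`, `delta` are placeholder names:

```stan
functions {
  // Generic SIRP dynamics: infection is driven by the free pathogen
  // compartment P, which is shed by infecteds and decays over time.
  vector sirp_rhs(real t, vector y,
                  real beta, real gamma, real eta, real delta) {
    real S = y[1];
    real I = y[2];
    real R = y[3];
    real P = y[4];
    vector[4] dydt;
    dydt[1] = -beta * S * P;              // infection via pathogen contact
    dydt[2] = beta * S * P - gamma * I;   // infections minus recoveries
    dydt[3] = gamma * I;                  // recoveries
    dydt[4] = eta * I - delta * P;        // shedding minus pathogen decay
    return dydt;
  }
}
```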