Fitting ODE models: best/efficient practices?

wds15 · March 2, 2021, 5:23pm

Yup, this is what I figured in that STALD is designed for very specific applications which we do not really target.

Instead I have built in a sparse matrix thing to try out for now. Not sure if it will make it to the exposed thing. Sparse matrix in the sense that the Jacobian is thresholded for very small entries leading to (hopefully) faster LU solves. Right now I can’t find a problem size large enough yet where I see a gain, but I could have messed up coding it… not sure ATM.

yizhang · March 2, 2021, 5:48pm

Yeah I’ve considered the option for Torsten before quickly (maybe too quick?) turning away. Several factors play here:

The sparsity of the matrix. In general matrices with <50% sparsity will have hard time to gain advantage, though it really depends to the sparsity pattern and
The implementation that matches the pattern. There needs to be a sparse solver and matching storage set up to take advantage of the sparsity, and
The hardware. If the memory bandwidth is sufficient enough for a dense matrix turnout, such as when the matrix is small(<100 x 100 as in most Stan’s ODE problems), dense format will almost always more efficient.

Funko_Unko · March 3, 2021, 11:03am

I also want to weigh in on this with my (frankly unqualified) opinion.
Non-toy PDE problems (>1 spatial dimension, >1 field variables, nonlinear problems) would generally need much more infrastructure than Stan currently provides,
such as sparse matrix algebra, iterative solvers with preconditioners or unstructured mesh handling.
Interfacing with external solvers appears much more productive.

Funko_Unko · March 3, 2021, 11:19am

On a related note, is there some kind of library of ODE test cases?

yizhang · March 3, 2021, 4:07pm

Very good question. Such a thing would be of great value as IMO our current test cases are far from sufficient. This post shows the ODEs that are routinely used in development and you can see it’s a short list. There’s a plan to expand it systematically, but first we’ll need to tie some loose ends, such as ODE testing cleanup · Issue #2395 · stan-dev/math · GitHub. Help wanted!

Funko_Unko · March 3, 2021, 8:37pm

How can I help? As in what is the best way to start?

Don’t know R, know Python, used to know C++11, AD and HPC, know slightly more about ODEs and integrators than about HMC, about which I know very little.

Read of this campfire thing you guys were working on? That and ODEs would interest me.

yizhang · March 3, 2021, 9:43pm

That’s fantastic, thank you! There’s actually an ongoing discussion within @SGB on how we can regularly update our roadmap like what Rust has been doing, to help planning and bringing in new contributors. While this effort is underway, things that come to my mind that might interest you include

A friendly user guidance on how to select ODE solvers and control parameters, e.g. here and here, @bbbales2 will love this.
Auto-selecting ODE integrators and stiffness/non-stiffness detection.
delay-differential equations.
As to cross-chain warmup aka campfire, currently there’s an experimental MPI implementation in Torsten, a Stan fork for pharmacometrics applications. To bring it into Stan, there needs to be
- more benchmark,
- refactoring Stan’s MPI infrastructure (or a multithreading implementation based on @wds15 's work on TBB)
- refactoring the HMC sampler
or generally anything mentioned in this thread

I myself don’t know much of R either, and only slightly better in that regard on c++. Fortunately those things are frequently consulted on this forum and our community has been very helpful.

EDIT: forgot to mention that this issue could be a good entry to Stan Math underworld.

bbbales2 · March 3, 2021, 10:09pm

Another thing, helping out with the adjoint ode solvers stuff would be good: https://github.com/stan-dev/math/pull/1905

The place to start is reading and reviewing @wds15 's design doc: https://github.com/stan-dev/design-docs/pull/37

The advantage to that is it’s something we want to get done for the next release and @wds15 will do most of the legwork.

It is not necessary you catch every problem or anything or spend a million hours reviewing it – just get feedback to @wds15 if something doesn’t make sense or seems weird or doesn’t work (it’s very easy to miss things, and catching things late leads to panic’d releases). Also it’ll all get double checked before it gets merged.

Edit: Well, we want to get it done for next release if it turns out to be a good idea which it seems like at this point it is, but I’ve really not been paying attention which is why people paying attention is useful*

Funko_Unko · March 3, 2021, 10:52pm

Thanks, I’ll have a look!

aakhmetz · March 4, 2021, 3:05pm

I have seen some PDE topics in pymc3 community, but they also look quite pre-mature (maybe I am mistaken)

rok_cesnovar · March 4, 2021, 5:11pm

Yeah, @IvanYashchuk has also posted about PETSc-based partial differential equations solving in Stan on Discourse as well: Using PETSc library together with Stan

wds15 · March 4, 2021, 8:58pm

Thanks for the advert!

Here are instructions on how to get the shiny new ode_adjoint_tol and ode_adjoint_tol_ctl up and running:

Things should be settling down, but this is definitely stuff which is going to change and we need feedback here.

Funko_Unko · March 4, 2021, 9:29pm

I’ll try this out for now. May look at the other stuff later.

wds15 · March 5, 2021, 2:12pm

Thanks @rok_cesnovar !!! It’s now as easy as this to try out the new adjoint solver:

cmdstanr::install_cmdstan(release_url = "https://github.com/rok-cesnovar/cmdstan/releases/download/adjoint_ode/cmdstan-adjoint-ode.tar.gz", cores = 4)

library(cmdstanr)

example_model <- file.path(cmdstan_path(), "examples", "linked-mass-flow", "linked-mass-flow.stan")
base_data <- list(rel_tol=1e-6,
                  abs_tol=1e-6,
                  max_num_steps=10000,
                  adjoint_integrator=0,
                  system_size=2,
                  num_obs=8,
                  sigma_sim=2.0,
                  sigma_y=0.2,
                  # dont trust the args below
                  num_checkpoints = 10,
                  solver_f = 1,
                  solver_b = 1)

mod <- cmdstan_model(example_model)
fit <- mod$sample(base_data)

Any discussion can happen on the design doc pr, I think. There is a list of things what comes to my mind of what is needed:

charlesm93 · March 5, 2021, 9:52pm

I’m late to the party but I’ve enjoyed reading the discussion so far. Thanks for igniting the conversation @Funko_Unko .

A few weeks ago, @bbbales2 told me about adding recommendations for tuning ODE integrators in the Stan manual. It feels awkward to write recommendations when there are still many unanswered questions – and in certain cases, we still need to work out what the right questions are. Still, there are some useful heuristics and proven strategies, as this thread demonstrates.

Understanding Stan’s underlying mechanics can help avoid pitfalls – does that count as best practices? --, one example can be found in the recent case study on disease transmission. Section 5 resonates with some of the discussion happening here.

There’s also more work on switching ODEs during HMC phases, as teased during the presentation I gave last week and motivated by @wds15 's idea. And other things happening on other fronts.

charlesm93 · March 5, 2021, 11:00pm

Might be worth starting a new thread about the prototype adjoint solver. The linked-mass-flow example in the adjoint branch returns divergent transitions, unless I crank adapt_delta to 0.99 (which takes about twice as long to run). Also, it’s worth running things in parallel, so a command like

fit <- mod$sample(base_data, chains = 4, parallel_chains = 4,
                  adapt_delta = 0.99)

But then the chains are not mixing, \hat R \gg 1…

Maybe this conversation is already taking place somewhere else?

Funko_Unko · March 6, 2021, 9:34am

That’s integrators, isn’t it?

I hadn’t actually looked at the (most important) question whether “the model” converges. If our first scalable example doesn’t, then that’s another reason for some principled test suite.

What’s the (arising) “consensus” here? I recall reading (on here) something to the extent that data driven rankings are difficult, but I can’t recall where exactly or what the argument was.

wds15 · March 6, 2021, 12:46pm

You should start the example with init=0 then it did work for me at least.

I have already run SBC over this thing and it did come out just fine.

I plan to announce the prototype in a separate post which will hopefully catch even more users attention. We should aim to close the testing phase this month. Then this leaves enough time before mid May we release the next Stan version. This sounds like plenty time to get this thing straight.

rok_cesnovar · March 6, 2021, 12:50pm

Agree on a seprate post.

If any of you have more examples, feel free to push the model & data to the ode_adjoint cmdstan branch.

Funko_Unko · March 13, 2021, 9:20am

I have found the following things reasonable:

Using a dense metric appears to significantly improve every metric of interest. I guess that is because there may frequently arise correlations between variables in the posterior?
Playing with the tolerances may lead to catastrophe or to salvation.
The Adjoint ODE Prototype - RFC - Please Test appears to become competitive very quickly.

Furthermore, for the way I set up my toy models, the following worked out exceedingly well, significantly reducing walltime and runtime differences between chains and increasing ESS:

Instead of a single run using all of the data, warm the model up in chunks. I.e. take 1,2,4,8,16,… timesteps, and for each run perform a short warmup phase with adapt_init_phase=25, adapt_metric_window=100, adapt_step_size=0, iter_warmup=125 and initialized with a metric computed from the (relevant) warmup samples from all chains from the previous run. For the final run first do the above, compute the metric, and then do the sampling preceeded by the standard window to adapt the step size.

Is there anything subtly or not so subtly wrong with the above?

Topic		Replies	Views
Case study: ODE-based models and multimodality Publicity ode , case-study	12	1150	February 25, 2021
Advice to improve fit using ODE solver Modeling	1	718	August 1, 2017
Need advice about debugging "stuck in warmup" behavior Modeling rstan , fitting-issues	3	515	May 17, 2021
Model Calibration Modeling fitting-issues	4	904	May 15, 2017
Adjoint ODE Prototype - RFC - Please Test General ode	41	2756	April 7, 2021

Fitting ODE models: best/efficient practices?

Related topics