Adjoint ODE behaviour

Hi @stevebronder !

We have two models which are problematic reported from users. The one I posted in the PR is the one from @Teddy_Groves1 . I updated the comment where I post stan file, data and initial to also now include the missing functions.stan. Better take that one as this is the only example which I ran myself and confirmed the issues. The one from @bernadette-eu was not run by myself yet as problems surfaced with the other model already and I was able to fix them with the changes in the code which are in the PR.

I wanted to get back to the model from @bernadette-eu at some point.

Sebastian

1 Like

Hi @stevebronder ,

Good catch, itā€™s weird that this was not picked up in my machine while compiling the .stan file. I list below the steps followed, for reproducibility on the testing:

  1. Copied the files cvodes_integrator_adjoint.hpp and ode_adjoint.hpp to my cmdrstan 2.27.0 installation.
  2. Addressed the issue with the missing vector size for the data objects abs_tol_forward and abs_tol_backward. It should now read:
vector[n_difeq] abs_tol_forward;
vector[n_difeq] abs_tol_backward; 

Hopefully there is now an agreement with the guidelines presented in the Stan Functions reference doc and the Stan Users Guide doc.

  1. This updated .stan file is available in my repo. The init and data .json files lie there as well.

  2. Compiled the file Test_Adjoint_v2.stan on my machine. I get the following messages:



  3. Test run: receiving the error messages " Exception: ode_adjoint_tol_ctl: ode parameters and data[1] is nan".

Thanks for the great work.

EDIT: (Steve) Fixed link to stan file

Alrighty I think I got this working!

Can you try the below and let me know if it works? I made some slight changes to your stan program so please copy and paste this stan program into Test_Adjoint_v2.stan. That file has a couple different things

  1. Instead of parameter packing it uses the new ODE interface (which I think is a bit nicer to read and is also a bit more performant)
  2. There was an indexing error that is now removed ([n_obs+1:2*n_obs] should be written have parenthesis like [(n_obs+1):(2*n_obs)] (@rybern I wonder if we could make a warning about this in pedantic mode, it seems kind of common?)

3ā€¦ I think time < right_t[i] should be time <= right_t[i] in SIR? Else on the last iteration idt beta is set.

Speaking of (3), thank you twice for reporting this because I found a little bug in the stan compiler when users donā€™t set initial values for an object in functions! Because of (3) sometimes beta would not be set. I thought I was catching that in the compiler but itā€™s an easy fix.

Anyway, copy pasting the above and running the below should work!

# make a fresh cmdstan somewhere
# Note here I'm just doing it in the same folder as where I put Bernadette 
#  so they are next to each other
git clone --recursive https://github.com/stan-dev/cmdstan
# go down to stan math and checkout the fix branch
cd cmdstan/stan/lib/stan_math/
git checkout fix/adjoint-mem-ownership
# back to cmdstan folder
cd ../../
# This should be okay with gcc > 8 or clang
echo "CXXFLAGS+=-march=native" > make/local
make ../Bernadette/Test_Adjoint/Test_Adjoint_v2
../Bernadette/Test_Adjoint/Test_Adjoint_v2 data file="../Bernadette/Test_Adjoint/adjoint_data.json" init="../Bernadette/Test_Adjoint/init.json" sample output refresh=2 random seed=123

You could also just run upto the point of building cmdstan and then switch over to the cmdstanr pointing to the new cmdstan.

On my computer Iā€™m seeing

Gradient evaluation took 0.159679 seconds
1000 transitions using 10 leapfrog steps per transition would take 1596.79 seconds.
Adjust your expectations accordingly!


Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: ode_adjoint_tol_ctl: ode parameters and data argument number 3 at index  is inf, but must be finite! (in '../Bernadette/Test_Adjoint/Test_Adjoint_v2.stan', line 148, column 2 to line 163, column 101)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: ode_adjoint_tol_ctl: ode parameters and data argument number 3 at index  is inf, but must be finite! (in '../Bernadette/Test_Adjoint/Test_Adjoint_v2.stan', line 148, column 2 to line 163, column 101)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: CVODES: CVode At t = 0 and h = 3.7207e-102, the corrector convergence test failed repeatedly or with |h| = hmin. Error code: -4 (in '../Bernadette/Test_Adjoint/Test_Adjoint_v2.stan', line 148, column 2 to line 163, column 101)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
CVODES: CVode At t = 394 and h = -2.85893e-26, the error test failed repeatedly or with |h| = hmin. Error code: -3
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Iteration:    1 / 2000 [  0%]  (Warmup)
Iteration:    2 / 2000 [  0%]  (Warmup)
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in '../Bernadette/Test_Adjoint/Test_Adjoint_v2.stan', line 195, column 23 to column 59)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Iteration:    4 / 2000 [  0%]  (Warmup)
Iteration:    6 / 2000 [  0%]  (Warmup)

So after the first few iterations the warnings go away which at least seems good. Though note I did not run this for the full number of iterations so you may still have divergent transitions.

1 Like

I thought the nonparenthesis version was standard? Iā€™ve never had an issue with indexing without parentheses and only notice the difference when I port a function into R where I do have to add them

Oh really? Huh oh maybe Iā€™m wrong on that one. tbh I just assumed it was like R and : had precedence over the other operators

Thatā€™s awesome @stevebronder !

Point 3 above on time <= right_t[i] is correct.

I have hopefully replicated these results in my machine. Hereā€™s what I have so far:

  1. Successfully recovered the branch:

  2. Checkout the fix branch:

  3. Compiled the stan model (Should I be getting these messages?) :

  4. Behaviour during the test run - similar to what you report:

  5. Currently executing with the full number of iterations (2chains, iter_warmup = 2000, iter_sampling = 5000). Based on the CPU time of my test run, I would expect it to run in about 1.5 hrs, but letā€™s see.

Btw, are there recommendations about the iter_warmup and iter_sampling so as to achieve reasonable mixing of the Markov chain? The desktop Iā€™m using contains 8 cores.

Thanks.

1 Like

Just an fyi, itā€™s a bit easier to read large text outputs if you put them into a https://gist.github.com/ and then send the link instead of a screenshot. Reading a screenshot in (3) that does seem a bit weird though I canā€™t see what is going on after the In file included. I donā€™t use windows so tbh Iā€™m not sure what mingw is expected to warn about.

The amount of warmup and sampling is probably a bigger q / off topic for this discourse post. But offhand you want to run as few warmup iterations as you need to get a nice rhat. I usually try to run for less than the default (like 500) and if my rhat looks bad Iā€™ll go to 800 or keep moving it up etc. Then a number of samples such that you get an effective sample size that allows you to have an estimate with precision within whatever monte carlo standard error you need (the ballpark is usually 1/sqrt(N)). Usually Iā€™ll try to do 500/500 when Iā€™m still prototyping a model. But if you look around the forums Iā€™m sure thereā€™s better advice than what Iā€™m giving here.

(your model seems pretty complicated so my answer above about warmup/samples may not be accurate / relevant!)

1 Like

@bernadette-eu So, what is it that you actually want to achieve in the end (with that model)?

@stevebronder Hi! I have performed the MCMC run using the fix branch. Data, init, stan model, csv files, outputs of the diagnose method and a graphical assessment of the fitted deaths are available in appropriately named folders here.

Setting: Chains = 3; Warmup iters = 500; Sampling iters = 500, max_tree_depth = 19, adapt_delta = 0.99.
CPU time: 8 hours for chain 1, in total 2.17 days. The parallel implementation did not seem to work here. Most of the transitions are divergent.

@Funko_Unko Hi! The objectives are:

  • Infer the true number of infections accounting for their age distribution, using mortality data.
  • Incorporate contact patterns in the transmission process; include contact uncertainty into the inference process.
  • Offer an estimate of the time-varying reproduction number.
  • Assess the effect of control measures and changes in behaviour.

@bernadette-eu we just released Stan 2.28.1 with the patch for fixing this bug!

EDIT: If you have more modeling questions related to the adjoint ODE solver itā€™s cool and good to have them in this thread. If itā€™s other modeling things or more general ODE discussion may be helpful for people searching in the future to make a new topic

5 Likes

Thanks, I was just reading the announcement. I will re-run with 2.28.1 under the patch and see how it goes.

1 Like

@stevebronder

Hi,

The results of the MCMC run under release v2.28.1 are available here. Also included are the outcomes of the ā€œdiagnoseā€ method.

Happy to discuss further on these, thanks!

Are you trying to infer the contact pattern from the infections? Or are you hoping to prescribe the contact pattern (eg via strong priors) and let this somehow ā€œstabilizeā€ the infection inference?

I think all of the other points can be handled ā€œas isā€ by the model proposed here: CoDatMo Liverpool & UNINOVE models (slow ODE implementation and Trapezoidal Solver) - #45 by Funko_Unko

Though I donā€™t know what other developments have happened since then, but I guess maybe @s.maskell knows?

Are you trying to infer the contact pattern from the infections? Or are you hoping to prescribe the contact pattern (eg via strong priors) and let this somehow ā€œstabilizeā€ the infection inference?

The second.

I think all of the other points can be handled ā€œas isā€ by the model proposed hereā€¦

I have revisited the ā€œsimplexā€ model you have offered in that post, as it has potential. Iā€™d like to get a better understanding of it but, after an online search, I cannot find a reference to such an approach.
I fail to understand how the transition rate between S and E_1, \beta \frac{I_1 + I_2}{N} S, is handled. See the Liverpool model description.

Letā€™s continue this discussion in the CoDatMo thread.

If someone has used and documented this approach before, then I donā€™t know about it.

@stevebronder @wds15 Hi!
Given all the nice work done for the patch now available in v2.28.1, I would be happy to hear other memberā€™s thoughs on where to focus the investigation on the use of the adjoint ODE solver.

First, I see that the model fit to the daily deaths is quite bad at some part of the time series, and I would like to find out what tools should be at my disposal to interpret and try to tackle this. For the same time series and model, solving the system of ODEs with the trapezoidal rule and a slightly longer MCMC run did not lead to divergent sampler iterations. Additionally, the model fit was good, confirmed both visually and with the Mean Absolute Error.

Given this, I was wondering whether attention should now be focused on any of the below:

(i) experimenting with different values of the adjoint solver control parameters;
(ii) decreasing adapt_delta, which currently is set to 0.99 for this output, while using the adjoint method;
(iii) try another solver like the Cash-Karp algorithm or the Adams-Moulton algorithm for non-stiff systems. At this stage, though, I do not know if these are expected to scale with the number of parameters, data, ODE states. Previous forum threads do not seem to discuss this aspect.

Second, how can I interpret the outcomes of the ā€œdiagnoseā€ method in the files of that repo? Cmdstan documentation is available.

Thanks.