I think there are (at least) two bugs in your code, but I might have misunderstood something:
daily_deaths underestimates new_deaths by a factor of 7 because the R script takes a mean where the variable names suggest a sum would be correct.
odeint_rk4 returns a vector of size num_steps+1, i.e. the initial conditions followed by the states at the time points. state_estimate should thus be of length T+1, but it is of length T. The latter is appropriate for the Stan ODE interface, but not for odeint_rk4.
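To make the off-by-one concrete, here is a minimal fixed-step RK4 sketch in Python (the actual solver in the thread is written in Stan; the name and signature here are just illustrative): the returned series has num_steps + 1 entries because the initial state is prepended.

```python
def odeint_rk4(f, y0, t0, dt, num_steps):
    # Classic fixed-step RK4. Returns the initial state followed by
    # the state at each of the num_steps time points, i.e. a list of
    # length num_steps + 1.
    ys = [y0]
    t, y = t0, y0
    for _ in range(num_steps):
        k1 = f(t, y)
        k2 = f(t + dt / 2, y + dt / 2 * k1)
        k3 = f(t + dt / 2, y + dt / 2 * k2)
        k4 = f(t + dt, y + dt * k3)
        y = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t = t + dt
        ys.append(y)
    return ys

# Exponential decay y' = -y over 10 steps of size 0.1:
states = odeint_rk4(lambda t, y: -y, 1.0, 0.0, 0.1, 10)
```

Note `len(states)` is 11, not 10: if the downstream code expects one state per time point, the initial condition must be dropped (or, conversely, appended, as discussed below).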
You are absolutely right about the mean vs. sum of weekly deaths. Because of the horrific nature of Brazil's weekly death counts, I thought I was getting cumulative sums, but that is just how bad the weekly deaths in Brazil are. Thank you for pointing it out.
Regarding odeint_rk4 you are right, but take a look at the transformed parameters block:
int num_steps = size(times) - 1;
I am already doing a -1 there, so the array of vectors is of length T - 1 + 1, which is length T. Does this clear things up?
Yes, I have seen this. This still leaves state_estimate[1] == initial_state, which is not what you want, even though this probably has very little effect. You can concatenate, e.g.
S = append_row(initial_state[1], to_vector(state_estimate[, 1]));
Chain 4 Unrecoverable error evaluating the log probability at the initial value.
Chain 4 Exception: subtract: m1 (435, 1) must match in size of m2 (436, 1) (in '/var/folders/5_/9mb9ggq50bl0cypgy7sd4d8r0000gp/T/RtmpK3rJQA/model-9ade69dfaaee.stan', line 211, column 2 to column 57)
Yes, everything that depends on S etc. should work as before, including the original computation of effective_reproduction_number. Edit: So you don't have to add a +1.
So, I’ve also got some questions concerning the model, @storopoli and @s.maskell:
Can you tell me more about this? I guess if you don’t provide initializations close to the “true” value you get stuck chains? The priors derive from previous work?
Also:
Why do you allow discontinuities in beta? Why not a piecewise-linear, globally continuous ansatz, or just a piecewise-constant one?
Wouldn’t it make sense to include some term that “penalizes” differences between adjacent betas?
We started off with custom_inits but are now revising the priors to get round this. @remoore is also working to have a clearer rationale for our choice of priors.
We started off with piecewise-constant betas (building on work by PHE/Cambridge) and found that it was better to use piecewise-linear. We do currently allow for discontinuities (because they happen, e.g., when interventions occur), but plan to assess approaches with a prior on the continuity that allows for most adjacent betas to be continuous.
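A piecewise-linear rate with per-piece endpoints can be sketched as follows (pure Python; the names beta_left/beta_right are taken from later in the thread, but the exact parameterization of the model is an assumption here). Continuity across a knot holds exactly when one piece's right endpoint equals the next piece's left endpoint; otherwise the rate jumps there.

```python
def beta_at(t, knots, beta_left, beta_right):
    # Piece i runs from knots[i] to knots[i+1] and interpolates
    # linearly from beta_left[i] to beta_right[i]. The rate is
    # globally continuous only if beta_right[i] == beta_left[i+1];
    # otherwise it jumps at the knot.
    for i in range(len(knots) - 1):
        if knots[i] <= t <= knots[i + 1]:
            w = (t - knots[i]) / (knots[i + 1] - knots[i])
            return (1 - w) * beta_left[i] + w * beta_right[i]
    raise ValueError("t outside the knot range")

# Two weekly pieces with a jump at day 7
# (left limit 0.4 vs. right limit 0.5):
knots = [0.0, 7.0, 14.0]
beta_left, beta_right = [0.3, 0.5], [0.4, 0.2]
```

A piecewise-constant ansatz is the special case beta_right[i] == beta_left[i] for every piece.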
I’m asking because to me it looks like it’s not primarily the solution of the ODE that frustrates the sampler, but the geometry induced by (the combination of?) choosing piecewise linear (but not globally continuous) betas and not penalizing discontinuities/differences.
More than 99% of the iterations in the provided rk4 CSV files hit the maximum treedepth, resulting in 1023 "iterations per iteration".
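For reference, a NUTS tree of depth d contains at most 2^d − 1 leapfrog steps, so a saturated default treedepth of 10 means 1023 gradient evaluations per iteration; a quick sketch:

```python
def max_leapfrog_steps(max_treedepth):
    # NUTS doubles the trajectory length at every treedepth, so a
    # saturated tree of depth d contains 2**d - 1 leapfrog steps.
    return 2 ** max_treedepth - 1
```

The 32/64 leapfrog steps reported further down correspond to trees of depth roughly 5 to 6, i.e. 16 to 32 times cheaper per iteration.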
After

- refactoring some inefficient computation,
- (using a different warmup procedure),
- switching to a piecewise-linear, globally continuous beta, and
- penalizing differences between adjacent betas via beta_left[2:] ~ normal(beta_left[:n_beta_pieces-1], beta_regularization); with beta_regularization = 0.1,

the number of leapfrog steps per iteration (edit: during sampling) remains around 32/64, letting me warm up and sample using the custom rk4 solver in roughly 8 minutes.
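The penalization above is a Gaussian random-walk prior on the beta sequence. Its effect on the log density can be sketched in plain Python (names and numbers chosen to mirror the Stan statement; up to an additive constant):

```python
def random_walk_log_prior(beta, beta_regularization=0.1):
    # Log-density, up to an additive constant, of
    #   beta[i] ~ normal(beta[i-1], beta_regularization),
    # i.e. the penalty on differences between adjacent betas.
    lp = 0.0
    for prev, curr in zip(beta, beta[1:]):
        lp += -0.5 * ((curr - prev) / beta_regularization) ** 2
    return lp

smooth = [0.30, 0.31, 0.32, 0.33]
jagged = [0.30, 0.60, 0.20, 0.55]
```

A smooth sequence pays almost no penalty, while large jumps between adjacent pieces are strongly disfavored, which is exactly what removes the spurious anticorrelation discussed below.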
Apart from the spurious underestimation of deaths, there are four main differences (UNINOVE model vs. my model):

- n_leapfrog__: almost always at 1023 vs. 32/64
- stepsize__: between 1.6e-3 and 2e-3 vs. 1e-1
- much larger uncertainties for daily_infections than for daily_deaths vs. slightly larger uncertainties for daily_infections than for daily_deaths
- no penalization of differences between adjacent betas, which allows spurious anticorrelation between adjacent betas

The reason for the third difference appears to be the fourth one; edit: the covariance matrix for the UNINOVE model has a thick band in the lower-left block, indicating a strong negative correlation between adjacent beta_lefts and beta_rights.
I think that, with the only measurement being the recorded deaths, and due to the smoothing effect of the time delay, the following scenarios are roughly equivalent for the unpenalized model:

- a period of moderate beta in two adjacent weeks,
- a period of low beta in the first week followed by a period of high beta in the second week,
- a period of high beta in the first week followed by a period of low beta in the second week,

all followed by a moderate number of deaths a few weeks later.
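This near-equivalence can be illustrated with a toy delay convolution (pure-Python sketch; all numbers invented): three infection-rate scenarios with the same two-week total produce identical total deaths once pushed through a wide infection-to-death delay kernel, so the death data alone cannot distinguish them.

```python
def delayed_deaths(infections, kernel):
    # Expected daily deaths as the convolution of daily infections
    # with an infection-to-death delay kernel.
    deaths = [0.0] * (len(infections) + len(kernel) - 1)
    for i, inf in enumerate(infections):
        for j, k in enumerate(kernel):
            deaths[i + j] += inf * k
    return deaths

# Three two-week scenarios with the same total infections:
moderate = [100.0] * 14
low_high = [50.0] * 7 + [150.0] * 7
high_low = [150.0] * 7 + [50.0] * 7
# A flat three-week delay kernel with a 1% infection fatality ratio:
kernel = [0.01 / 21] * 21
```

All three scenarios yield the same 14 expected deaths in total, and the wide kernel also smears out most of the difference in timing.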
Generally, I tend to think a bigger issue with ODEs is stochasticity propagating over time in the process generating the data (and diverging from the deterministic prediction). It is well known that many dynamic systems are very sensitive to process noise.
Of course discrete-time approximations don't solve the issue, but they are amenable to filtering methods, which are arguably a better form of error correction than the internal error correction of ODE solvers; plus (and this is speculation) the cost may trade off favorably when used within an HMC framework.
In terms of performance (for estimates, not necessarily time) it would be a big deal if that was the standard, since often the actual estimates are artifacts from the deterministic lack of flexibility.
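To make the distinction concrete, here is a minimal discrete-time SIR step with process noise (pure-Python sketch; parameter values invented, not taken from the model in this thread). The noise enters the transition itself, so the trajectory diverges from the deterministic prediction over time, which is what filtering methods are built to correct for:

```python
import random

def sir_step(S, I, R, beta, gamma, N, rng, noise_sd=0.05):
    # One day of a discrete-time SIR model with multiplicative
    # log-normal process noise on the transmission rate.
    noisy_beta = beta * rng.lognormvariate(0.0, noise_sd)
    new_infections = noisy_beta * S * I / N
    new_recoveries = gamma * I
    return (S - new_infections,
            I + new_infections - new_recoveries,
            R + new_recoveries)

rng = random.Random(0)
S, I, R = 990.0, 10.0, 0.0
for _ in range(50):
    S, I, R = sir_step(S, I, R, beta=0.3, gamma=0.1, N=1000.0, rng=rng)
```

Note that the population is conserved exactly at every step; only the split between compartments is stochastic.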
I think that's a great point and something that inadvertently fits well with the UNINOVE model, although there it looks more like an unintentional side effect than a feature. Observe the two smoothest (top) and two most jagged (bottom) trajectories of beta (proportional to the current infection rate) for some regularized model (left) and the original UNINOVE model (right). (Note the quite different y-scales.)
@Funko_Unko this is great. I think that a penalized beta would be a nice fit and could speed up the model as we drop the piecewise-linear beta pieces. (8 minutes is awesome; RK4 would run for 3 h on my high-end computer.)
Does the beta ~ normal(overall trend, stochastic noise) also underestimate the deaths?
Hmm, so I just checked, and the speedup is partially due to the different warmup procedure. For data consisting of only the first 224 days, "my" warmup takes 2m23s + 27s = 3m, while the regular warmup takes 15m14s + 6m11s = 21m. On top of that, the regular warmup scales much worse; or rather, "my" warmup retains its performance for longer.
Edit: Further tests reveal that a lot can be gained for the regular warmup by (also) using a dense metric, yielding times of 5m56s+42s=6m38s for the above data.
No, this is/was wholly an artifact of the bug in the R script. For the figure above I pulled the fit from GitHub.
@wds15 the Adjoint ODE solver finished sampling in ~43h:
All 4 chains finished successfully.
Mean chain execution time: 142145.5 seconds.
Total execution time: 153755.4 seconds.
Warning: 1 of 4000 (0.0%) transitions ended with a divergence.
This may indicate insufficient exploration of the posterior distribution.
Possible remedies include:
* Increasing adapt_delta closer to 1 (default is 0.8)
* Reparameterizing the model (e.g. using a non-centered parameterization)
* Using informative or weakly informative prior distributions
3999 of 4000 (100.0%) transitions hit the maximum treedepth limit of 10 or 2^10-1 leapfrog steps.
Trajectories that are prematurely terminated due to this limit will result in slow exploration.
Increasing the max_treedepth limit can avoid this at the expense of more computation.
If increasing max_treedepth does not remove warnings, try to reparameterize the model.