I think there are (at least) two bugs in your code, but I might have misunderstood something:
daily_deaths underestimates new_deaths by a factor of 7 because the R script takes a mean where the variable names suggest a sum would be correct.
odeint_rk4 returns a vector of size num_steps+1, i.e. the initial conditions followed by the states at the time points. state_estimate should thus be of length T+1, but it is of length T. The latter is appropriate for the Stan ODE interface, but not for odeint_rk4.
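To make the off-by-one concrete, here is a minimal fixed-step RK4 sketch in Python (the actual solver in the thread is written in Stan; the name and signature here are just illustrative): the returned series has num_steps + 1 entries because the initial state is prepended.

```python
def odeint_rk4(f, y0, t0, dt, num_steps):
    # Classic fixed-step RK4. Returns the initial state followed by
    # the state at each of the num_steps time points, i.e. a list of
    # length num_steps + 1.
    ys = [y0]
    t, y = t0, y0
    for _ in range(num_steps):
        k1 = f(t, y)
        k2 = f(t + dt / 2, y + dt / 2 * k1)
        k3 = f(t + dt / 2, y + dt / 2 * k2)
        k4 = f(t + dt, y + dt * k3)
        y = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t = t + dt
        ys.append(y)
    return ys

# Exponential decay y' = -y over 10 steps of size 0.1:
states = odeint_rk4(lambda t, y: -y, 1.0, 0.0, 0.1, 10)
```

Note `len(states)` is 11, not 10: if the downstream code expects one state per time point, the initial condition must be dropped (or, conversely, appended, as discussed below).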
You are absolutely right about the mean vs. sum of weekly deaths. Because of the horrific nature of Brazil's weekly death counts, I thought I was getting cumulative sums, but that is just how bad the weekly deaths in Brazil are. Thank you for pointing it out.
Regarding odeint_rk4 you are right, but take a look at the transformed parameters block:
int num_steps = size(times) - 1;
I am already doing a -1 there, so the array of vectors is of length T - 1 + 1, which is length T. Does this clear things up?
Yes, I have seen this. This still leaves state_estimate[1] == initial_state, which is not what you want, even though this probably has very little effect. You can concatenate, e.g.
S = append_row(initial_state[1], to_vector(state_estimate[, 1]));
Chain 4 Unrecoverable error evaluating the log probability at the initial value.
Chain 4 Exception: subtract: m1 (435, 1) must match in size of m2 (436, 1) (in '/var/folders/5_/9mb9ggq50bl0cypgy7sd4d8r0000gp/T/RtmpK3rJQA/model-9ade69dfaaee.stan', line 211, column 2 to column 57)
Yes, everything that depends on S etc. should work as before, including the original computation of effective_reproduction_number. Edit: So you don't have to add a +1.
So, I’ve also got some questions concerning the model, @storopoli and @s.maskell:
Can you tell me more about this? I guess if you don’t provide initializations close to the “true” value you get stuck chains? The priors derive from previous work?
Also:
Why do you allow discontinuities in beta? Why not a piecewise-linear, globally continuous ansatz, or just a piecewise-constant one?
Wouldn’t it make sense to include some term that “penalizes” differences between adjacent betas?
We started off with custom_inits but are now revising the priors to get round this. @remoore is also working to have a clearer rationale for our choice of priors.
We started off with piecewise-constant betas (building on work by PHE/Cambridge) and found that it was better to use piecewise-linear. We do currently allow for discontinuities (because they happen, e.g., when interventions occur), but plan to assess approaches with a prior on the continuity that allows for most adjacent betas to be continuous.
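A piecewise-linear rate with per-piece endpoints can be sketched as follows (pure Python; the names beta_left/beta_right are taken from later in the thread, but the exact parameterization of the model is an assumption here). Continuity across a knot holds exactly when one piece's right endpoint equals the next piece's left endpoint; otherwise the rate jumps there.

```python
def beta_at(t, knots, beta_left, beta_right):
    # Piece i runs from knots[i] to knots[i+1] and interpolates
    # linearly from beta_left[i] to beta_right[i]. The rate is
    # globally continuous only if beta_right[i] == beta_left[i+1];
    # otherwise it jumps at the knot.
    for i in range(len(knots) - 1):
        if knots[i] <= t <= knots[i + 1]:
            w = (t - knots[i]) / (knots[i + 1] - knots[i])
            return (1 - w) * beta_left[i] + w * beta_right[i]
    raise ValueError("t outside the knot range")

# Two weekly pieces with a jump at day 7
# (left limit 0.4 vs. right limit 0.5):
knots = [0.0, 7.0, 14.0]
beta_left, beta_right = [0.3, 0.5], [0.4, 0.2]
```

A piecewise-constant ansatz is the special case beta_right[i] == beta_left[i] for every piece.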
I’m asking because to me it looks like it’s not primarily the solution of the ODE that frustrates the sampler, but the geometry induced by (the combination of?) choosing piecewise linear (but not globally continuous) betas and not penalizing discontinuities/differences.
More than 99% of the iterations in the provided rk4 CSV files hit the maximum treedepth, resulting in 1023 "iterations per iteration".
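For reference, a NUTS tree of depth d contains at most 2^d − 1 leapfrog steps, so a saturated default treedepth of 10 means 1023 gradient evaluations per iteration; a quick sketch:

```python
def max_leapfrog_steps(max_treedepth):
    # NUTS doubles the trajectory length at every treedepth, so a
    # saturated tree of depth d contains 2**d - 1 leapfrog steps.
    return 2 ** max_treedepth - 1
```

The 32/64 leapfrog steps reported further down correspond to trees of depth roughly 5 to 6, i.e. 16 to 32 times cheaper per iteration.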
After

- refactoring some inefficient computation,
- (using a different warmup procedure),
- switching to a piecewise-linear, globally continuous beta, and
- penalizing differences between adjacent betas via beta_left[2:] ~ normal(beta_left[:n_beta_pieces-1], beta_regularization); with beta_regularization = 0.1,

the number of leapfrog steps per iteration (edit: during sampling) remains around 32/64, letting me warm up and sample using the custom rk4 solver in roughly 8 minutes.
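The penalization above is a Gaussian random-walk prior on the beta sequence. Its effect on the log density can be sketched in plain Python (names and numbers chosen to mirror the Stan statement; up to an additive constant):

```python
def random_walk_log_prior(beta, beta_regularization=0.1):
    # Log-density, up to an additive constant, of
    #   beta[i] ~ normal(beta[i-1], beta_regularization),
    # i.e. the penalty on differences between adjacent betas.
    lp = 0.0
    for prev, curr in zip(beta, beta[1:]):
        lp += -0.5 * ((curr - prev) / beta_regularization) ** 2
    return lp

smooth = [0.30, 0.31, 0.32, 0.33]
jagged = [0.30, 0.60, 0.20, 0.55]
```

A smooth sequence pays almost no penalty, while large jumps between adjacent pieces are strongly disfavored, which is exactly what removes the spurious anticorrelation discussed below.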
Apart from the spurious underestimation of deaths, there are four main differences (UNINOVE model vs. my model):

- n_leapfrog__: almost always at 1023 vs. 32/64
- stepsize__: between 1.6e-3 and 2e-3 vs. 1e-1
- much larger uncertainties for daily_infections than for daily_deaths vs. slightly larger uncertainties for daily_infections than for daily_deaths
- no penalization of differences between adjacent betas, which allows spurious anticorrelation between adjacent betas

The reason for the third difference appears to be the fourth one; edit: the covariance matrix for the UNINOVE model has a thick band in the lower-left block, indicating a strong negative correlation between adjacent beta_lefts and beta_rights.
I think that, with the only measurement being the recorded deaths, and due to the smoothing effect of the time delay, the following scenarios are roughly equivalent for the unpenalized model:

- a period of moderate beta in two adjacent weeks,
- a period of low beta in the first week followed by a period of high beta in the second week,
- a period of high beta in the first week followed by a period of low beta in the second week,

all followed by a moderate number of deaths a few weeks later.
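This near-equivalence can be illustrated with a toy delay convolution (pure-Python sketch; all numbers invented): three infection-rate scenarios with the same two-week total produce identical total deaths once pushed through a wide infection-to-death delay kernel, so the death data alone cannot distinguish them.

```python
def delayed_deaths(infections, kernel):
    # Expected daily deaths as the convolution of daily infections
    # with an infection-to-death delay kernel.
    deaths = [0.0] * (len(infections) + len(kernel) - 1)
    for i, inf in enumerate(infections):
        for j, k in enumerate(kernel):
            deaths[i + j] += inf * k
    return deaths

# Three two-week scenarios with the same total infections:
moderate = [100.0] * 14
low_high = [50.0] * 7 + [150.0] * 7
high_low = [150.0] * 7 + [50.0] * 7
# A flat three-week delay kernel with a 1% infection fatality ratio:
kernel = [0.01 / 21] * 21
```

All three scenarios yield the same 14 expected deaths in total, and the wide kernel also smears out most of the difference in timing.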
Generally, I tend to think a bigger issue with ODEs is stochasticity propagating over time in the process generating the data (and diverging from the deterministic prediction). It is well known that many dynamic systems are very sensitive to process noise.
Of course discrete-time approximations don't solve the issue, but they are amenable to filtering methods, which are arguably a better form of error correction than the internal error correction of ODE solvers; plus (and this is speculation) the cost may trade off favorably when used within an HMC framework.
In terms of performance (for estimates, not necessarily time) it would be a big deal if that was the standard, since often the actual estimates are artifacts from the deterministic lack of flexibility.
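To make the distinction concrete, here is a minimal discrete-time SIR step with process noise (pure-Python sketch; parameter values invented, not taken from the model in this thread). The noise enters the transition itself, so the trajectory diverges from the deterministic prediction over time, which is what filtering methods are built to correct for:

```python
import random

def sir_step(S, I, R, beta, gamma, N, rng, noise_sd=0.05):
    # One day of a discrete-time SIR model with multiplicative
    # log-normal process noise on the transmission rate.
    noisy_beta = beta * rng.lognormvariate(0.0, noise_sd)
    new_infections = noisy_beta * S * I / N
    new_recoveries = gamma * I
    return (S - new_infections,
            I + new_infections - new_recoveries,
            R + new_recoveries)

rng = random.Random(0)
S, I, R = 990.0, 10.0, 0.0
for _ in range(50):
    S, I, R = sir_step(S, I, R, beta=0.3, gamma=0.1, N=1000.0, rng=rng)
```

Note that the population is conserved exactly at every step; only the split between compartments is stochastic.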
I think that's a great point and something that inadvertently fits well with the UNINOVE model, although there it looks more like an unintentional side effect than a feature. Observe the two smoothest (top) and two most jagged (bottom) trajectories of beta (proportional to the current infection rate) for some regularized model (left) and the original UNINOVE model (right). (Note the quite different y-scales.)
@Funko_Unko this is great. I think that a penalized beta would be a nice fit and could speed up the model as we drop the piecewise-linear beta pieces. (8 minutes is awesome; RK4 would run for 3 h on my high-end computer.)
Does the beta ~ normal(overall trend, stochastic noise) also underestimate the deaths?
Hmm, so I just checked, and the speedup is partially due to the different warmup procedure. For data consisting of only the first 224 days, "my" warmup takes 2m23s + 27s = 3m, while the regular warmup takes 15m14s + 6m11s = 21m. On top of that, the regular warmup scales much worse; or rather, "my" warmup retains its performance for longer.
Edit: Further tests reveal that a lot can be gained for the regular warmup by (also) using a dense metric, yielding times of 5m56s+42s=6m38s for the above data.
No, this is/was wholly an artifact of the bug in the R script. For the figure above I pulled the fit from GitHub.
@wds15 the Adjoint ODE solver finished sampling in ~43h:
All 4 chains finished successfully.
Mean chain execution time: 142145.5 seconds.
Total execution time: 153755.4 seconds.
Warning: 1 of 4000 (0.0%) transitions ended with a divergence.
This may indicate insufficient exploration of the posterior distribution.
Possible remedies include:
* Increasing adapt_delta closer to 1 (default is 0.8)
* Reparameterizing the model (e.g. using a non-centered parameterization)
* Using informative or weakly informative prior distributions
3999 of 4000 (100.0%) transitions hit the maximum treedepth limit of 10 or 2^10-1 leapfrog steps.
Trajectories that are prematurely terminated due to this limit will result in slow exploration.
Increasing the max_treedepth limit can avoid this at the expense of more computation.
If increasing max_treedepth does not remove warnings, try to reparameterize the model.