@storopoli Hm, this is what a very unpolished fan plot that I have lying around looks like for varying `beta_regularization`:
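In case it helps anyone build a similar plot, here is a minimal NumPy/matplotlib sketch of how such a fan chart can be drawn from posterior draws. The array and its values are made up; in reality one would use the model's generated quantities, one fit per `beta_regularization` value.

```python
import numpy as np
import matplotlib.pyplot as plt

# Fake stand-in for posterior draws of a predicted quantity, e.g. daily
# deaths: shape (n_draws, n_days). In practice these would come from the
# generated quantities of the fitted model.
rng = np.random.default_rng(0)
pred = rng.gamma(shape=5.0, scale=10.0, size=(4000, 120))

days = np.arange(pred.shape[1])
# Nested central intervals give the "fan"; wider intervals are drawn fainter.
for lo_q, hi_q, alpha in [(0.05, 0.95, 0.15), (0.25, 0.75, 0.35)]:
    lo = np.quantile(pred, lo_q, axis=0)
    hi = np.quantile(pred, hi_q, axis=0)
    plt.fill_between(days, lo, hi, color="C0", alpha=alpha, linewidth=0)
plt.plot(days, np.median(pred, axis=0), color="C0")
plt.xlabel("day")
plt.ylabel("predicted daily deaths")
plt.show()
```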
Let me rename a few things in the above model and introduce some comments to clarify things. I’ll edit the above post.
- Re 1: Looks good to me. `daily_infections` and `daily_deaths` should exactly mirror the quantities from the previous models.
- Re 2: This also looks good to me, but you/the data will have to be the ultimate judge of this.
- Re 3: I don't think this makes a difference, but I do not know the Stan internals.
- Re 4: The model expects the daily new deaths. I'll rename the variable to bring it in line with the provided data, which has a column `new_deaths`.
@s.maskell Thanks, I was/am pretty excited about this myself. BTW, from what I have seen of the Liverpool model, this parametrization should also be applicable there.
Some further thoughts for anyone that’s listening:
There are two main differences from the previous models:
- the change in how the ODE is (approximately) solved and
- the changed parametrization,
and I'm not sure which has the greater impact.
An efficient (as yet nonexistent) implementation of the custom rk4 solver (with one step per day) boils down to the addition of 4 size-8 vectors, which are quickly computed. Contrast this with one dense 7x7 matrix-vector multiplication for the above model. Speed-wise, I would not be surprised if the rk4 solver could actually be quicker than, or at least not much slower than, the `matrix_exp` method for a single iteration. I did very shallowly investigate the impact that the accuracy of the solver has on the inference (hence the `sub_num_steps` variable) and found it to be essentially nonexistent, probably due to the sparse and delayed data.
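To make the comparison a bit more concrete, here is a rough Python/NumPy sketch of what one rk4 step versus one `matrix_exp`-style update amounts to per day. This is not the actual Stan model; the matrix, state and dimensions are placeholders.

```python
import numpy as np
from scipy.linalg import expm

def rk4_step(f, y, t, dt):
    # One classical rk4 step: four evaluations of the right-hand side and a
    # weighted sum of four state-sized vectors.
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt / 2 * k1)
    k3 = f(t + dt / 2, y + dt / 2 * k2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Toy stand-in for the compartmental system: a linear ODE y' = A y with a
# made-up stable 7x7 matrix. The real model differs; this only illustrates
# the per-day work of each approach.
rng = np.random.default_rng(1)
A = 0.1 * rng.normal(size=(7, 7)) - np.eye(7)
y0 = np.ones(7)
dt = 1.0  # one step per day

y_rk4 = rk4_step(lambda t, y: A @ y, y0, 0.0, dt)
# matrix_exp-style update: exact for a constant linear system; per day this
# is one dense 7x7 matrix-vector product once expm(A * dt) is precomputed.
y_exp = expm(A * dt) @ y0

print(np.max(np.abs(y_rk4 - y_exp)))  # small local error of the rk4 step
```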
However, the previous models were generally much slower than this one, and that with far fewer parameters. Before, we had 63 (or 2x63) parameters for beta; now we have 435+ parameters. The best model so far finished in roughly 8 minutes, but convergence was actually slightly finicky and depended on the choice of ansatz, regularization, prior and initialization. I also tried the previous models with one beta parameter per day, but that took even longer than 8 minutes and I'm quite impatient. On top of that, if you look at the `N_eff` of the previous models (both my model and the UNINOVE one), these were generally around or below 1k, with the worst `N_eff` (usually `lp__`) being only around 300. For this model, even the worst `N_eff` is usually above 2k, `Rhat(lp__)` is always on the order of 1.001, and the E-BFMI is around 1.
So this all points towards the main benefit actually arising from the reparametrization, but I’m no expert. Can someone more experienced than I weigh in?
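For reference, here is the kind of diagnostics check I mean, as a small ArviZ sketch. It runs on a bundled placeholder dataset rather than on the actual fits; for the real comparison one would load the CmdStan output instead.

```python
import arviz as az

# Placeholder InferenceData; for the real fits one would use e.g.
# az.from_cmdstan(...) on the CSV output instead.
idata = az.load_arviz_data("centered_eight")

# Bulk/tail effective sample sizes and Rhat per parameter.
summary = az.summary(idata)
print(summary[["ess_bulk", "ess_tail", "r_hat"]])

# E-BFMI per chain; values much below ~0.3 are usually considered problematic.
print(az.bfmi(idata))
```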
One further question:
Rt.live appears to have used a simple convolution model, yet still seems to have been much slower:
Sampling 4 chains for 700 tune and 200 draw iterations (2_800 + 800 draws total) took 4340 seconds.
However, they used pymc3 and an extended model, so maybe this is the reason? I haven’t actually looked at (or rather understood) their model.
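For what it's worth, my (possibly wrong) understanding of "convolution model" in this context is roughly the following, shown here as a plain NumPy sketch with made-up numbers and not as Rt.live's actual formulation: expected daily deaths are past daily infections convolved with an infection-to-death delay distribution.

```python
import numpy as np

def expected_deaths(daily_infections, delay_pmf, ifr):
    # Expected daily deaths as a convolution of past daily infections with an
    # infection-to-death delay distribution, scaled by the fatality rate.
    full = np.convolve(daily_infections, ifr * delay_pmf)
    return full[: len(daily_infections)]

# Made-up inputs, purely illustrative.
infections = np.full(100, 1000.0)
delay = np.exp(-0.5 * ((np.arange(40) - 18.0) / 7.0) ** 2)  # ~18 day mean delay
delay /= delay.sum()
print(expected_deaths(infections, delay, ifr=0.01)[:5])
```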
Edit with a small comment:
Actually, the `matrix_exp` method should stay much quicker than even the most efficient rk4 implementation, because the small 7x7 matrix can probably stay in cache, while the intermediate rk-steps depend on each other, probably leading to worse pipelining. But who knows in this age of compilers/processors.
Further edit: @tvladeck is actually here, maybe he could weigh in? Relevant backlink to Translating random walk to gaussian process (and pymc3 model to stan)