CoDatMo Liverpool & UNINOVE models (slow ODE implementation and Trapezoidal Solver)

This is interesting! Thanks! I will try the model with several beta_regularization values and also with:

  • beta_linear_discontinuous
  • beta_linear_continuous
  • beta_constant_discontinuous

Could the num_sub_steps inside the rk4 be interpreted as the weeks in time t at which the betas are used to integrate the ODE system?

Yes, I agree that it needs a major refactoring.

Hm, no, it specifies how many rk4 timesteps are performed per day. hh is 1 day, and the h used by the intermediate steps is hh/num_sub_steps.
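Roughly, the sub-stepping within one day looks like the following sketch (not the actual CoDatMo code; state is the 8-compartment state vector, and seir_rhs and beta_t are hypothetical names for the ODE right-hand side and the current day’s beta):

real hh = 1.0;               // one day
real h = hh / num_sub_steps; // rk4 step size within the day
for (s in 1:num_sub_steps) {
  vector[8] k1 = seir_rhs(state, beta_t);
  vector[8] k2 = seir_rhs(state + h / 2 * k1, beta_t);
  vector[8] k3 = seir_rhs(state + h / 2 * k2, beta_t);
  vector[8] k4 = seir_rhs(state + h * k3, beta_t);
  state += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4);
}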

Hm, okay everyone, forget every other model that has ever been mentioned in this thread. With the typical overreach I present to you the only SEIR model you’ll ever need. (I think) it allows any type of priors, regularization, ansatz or stochasticity for the infection rate, while converging quickly and reliably and probably being much more exact than any of the other models. Tagging the relevant people: @storopoli, @caesoma and @s.maskell and for good measure @bbbales2.

In less than two minutes we get > 4k N_eff for every one of the 485+ parameters (instead of 63).

So, while I don’t believe it’s applicable to many problems, here it fits perfectly. It’s essentially what @wds15 said:

The key idea is just that anything downstream of E1 is linear. Instead of parametrizing in terms of some initial condition and betas, we just parametrize in terms of new infections per day. Then, once per leapfrog iteration we can compute a fixed transition matrix using matrix_exp and just iteratively (per day) compute a single matrix-vector product.
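In symbols, with $A$ the constant 7x7 transition-rate matrix over the (E1, E2, I1, I2, T1, T2, D) compartments, $\nu_t$ the number of new infections on day $t$, and $e_1$ the first unit vector, the model below advances the state by

$$x_{t+1} = e^{A\,\Delta t}\,\big(x_t + \nu_t\, e_1\big), \qquad \Delta t = 1\ \text{day},$$

where $e^{A\,\Delta t}$ is computed once per leapfrog iteration with matrix_exp and reused for every day.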

For sake of simplicity I don’t pool the daily deaths in this model, but this and anything else is perfectly possible and should not impact convergence at all ™.

Enjoy:

data {
  int no_days;
  int population;
  //observed new deaths per day
  int new_deaths[no_days];
  real beta_regularization; //scale of the log-random-walk regularization on daily infections; 0 disables it
  real likelihood; //0 = sample from the priors only, nonzero = condition on the data
}
parameters {
  /*
    The elements of a simplex sum to one.
    The first no_days entries represent daily new infections as a fraction
    of the total population while the last entry represents the proportion that
    remains susceptible at the end.
  */
  simplex[no_days+1] unit_dS;
  real<lower=0> dL; //mean time spent in the two exposed compartments (E1+E2), in days
  real<lower=0> dI; //mean time spent in the two infectious compartments (I1+I2), in days
  real<lower=0> dT; //mean time spent in the two terminally-ill compartments (T1+T2), in days
  real<lower=0, upper=1> omega; //proportion of infections that become terminal (end in death)
  real<lower=0> reciprocal_phi_deaths; //reciprocal of the negative binomial overdispersion
}
transformed parameters {
  vector[no_days] daily_infections = population * unit_dS[:no_days];
  vector[no_days] daily_deaths;
  vector[no_days] beta;
  if(likelihood){
    vector[7] state = [
        0, 0,
        0, 0,
        0, 0,
        0
    ]';
    matrix[7, 7] transition_matrix = matrix_exp([
    //[E1   ,E2   ,I1   ,I2         ,T1   ,T2   ,D]
      [-2/dL,0    ,0    ,0          ,0    ,0    ,0],//E1
      [+2/dL,-2/dL,0    ,0          ,0    ,0    ,0],//E2
      [0    ,+2/dL,-2/dI,0          ,0    ,0    ,0],//I1
      [0    ,0    ,+2/dI,-2/dI      ,0    ,0    ,0],//I2
      [0    ,0    ,0    ,+2/dI*omega,-2/dT,0    ,0],//T1
      [0    ,0    ,0    ,0          ,+2/dT,-2/dT,0],//T2
      [0    ,0    ,0    ,0          ,0    ,+2/dT,0]//D
    ]);
    real S = population;
    real last_D;
    for(i in 1:no_days){
      last_D = state[7];
      S -= daily_infections[i];
      state[1] += daily_infections[i];
      state = transition_matrix * state;
      daily_deaths[i] = state[7] - last_D;
      beta[i] = daily_infections[i] * population / (S*(state[3]+state[4]));
    }
  }
}
model {
  //One possible regularization
  if(beta_regularization){
    unit_dS[2:no_days] ~ lognormal(log(unit_dS[:no_days-1]), beta_regularization);
  }
  //This imposes a very wide prior on the proportion of still susceptible people!
  unit_dS[no_days+1] ~ uniform(0,1);
  dL ~ normal(4., .2);
  dI ~ normal(3.06, .21);
  dT ~ normal(16, .71);
  omega ~ beta(100, 9803);
  reciprocal_phi_deaths ~ exponential(5);
  if(likelihood){
    new_deaths ~ neg_binomial_2(daily_deaths, 1/reciprocal_phi_deaths);
  }
}

Small caveat: I don’t know how exact matrix_exp is, but it is probably quite exact.

Edit: also, unit_S was very inappropriately named; I did not know how a simplex works.

Edit2: updated the model.

Edit3: fixed an off-by-one error in beta[i] = daily_infections[i] * population / (S*(state[3]+state[4])); ← this is correct.
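For reference, this back-calculation just inverts the standard SEIR infection term: with $N$ the population, $S_i$ the remaining susceptibles and $I_{1,i} + I_{2,i}$ the infectious compartments on day $i$,

$$\frac{dS}{dt} = -\beta\,\frac{S\,(I_1 + I_2)}{N} \quad\Longrightarrow\quad \beta_i = \frac{\mathrm{daily\_infections}_i \cdot N}{S_i\,\big(I_{1,i} + I_{2,i}\big)}.$$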


@Funko_Unko this is great! You took the whole ODE solver out!

  1. We are missing the generated quantities block:
    generated quantities {
       vector[no_days-1] growth_rate = (log(daily_infections[2:]) - log(daily_infections[:no_days-1]))*100;
       int pred_deaths[no_days] = neg_binomial_2_rng(daily_deaths, 1/reciprocal_phi_deaths);
    }
    
  2. We are also missing effective_reproduction_number:
    transformed parameters {
      vector[no_days] daily_infections = population * unit_dS[:no_days];
      vector[no_days] daily_deaths;
      vector[no_days] effective_reproduction_number;
      {
        vector[7] state = [
            0, 0,
            0, 0,
            0, 0,
            0
        ]';
        matrix[7, 7] transition_matrix = matrix_exp([
        //[E1   ,E2   ,I1   ,I2         ,T1   ,T2   ,D]
          [-2/dL,0    ,0    ,0          ,0    ,0    ,0],//E1
          [+2/dL,-2/dL,0    ,0          ,0    ,0    ,0],//E2
          [0    ,+2/dL,-2/dI,0          ,0    ,0    ,0],//I1
          [0    ,0    ,+2/dI,-2/dI      ,0    ,0    ,0],//I2
          [0    ,0    ,0    ,+2/dI*omega,-2/dT,0    ,0],//T1
          [0    ,0    ,0    ,0          ,+2/dT,-2/dT,0],//T2
          [0    ,0    ,0    ,0          ,0    ,+2/dT,0]//D
        ]);
        real last_D;
        for(i in 1:no_days){
          last_D = state[7];
          state[1] += daily_infections[i];
          state = transition_matrix * state;
          daily_deaths[i] = state[7] - last_D;
          effective_reproduction_number[i] = daily_infections[i] / (state[3] + state[4]) * dI;
        }
      }
    }

  3. Can we use target += neg_binomial_2_lpmf? Does it speed things up?

    target += neg_binomial_2_lpmf(deaths | daily_deaths, 1 / reciprocal_phi_deaths);

  4. deaths[no_days] is a cumulative sum (cumsum), just to be clear for the people watching this (see the sketch below).
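A minimal sketch of the conversion the model would need (hypothetical: it assumes the cumulative column is read as int deaths[no_days] and would replace reading new_deaths directly as data):

    transformed data {
      //Hypothetical: turn a cumulative death count into daily new deaths.
      int new_deaths[no_days];
      new_deaths[1] = deaths[1];
      for (i in 2:no_days)
        new_deaths[i] = deaths[i] - deaths[i - 1];
    }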

I am running it now. @Funko_Unko are 1 and 2 okay?

Thanks!

EDIT: I am calling this model deaths_matrix_exp.stan :)

EDIT2: Damn it is fast!

All 4 chains finished successfully.
Mean chain execution time: 117.2 seconds.
Total execution time: 127.5 seconds.

EDIT3: I am doing something wrong with the deaths (???). The gray band is the 90% range (q5 to q95):


This is absolutely awesome. Wow. We’ll certainly take a look. Thank you for the insight.
Simon

PS @remoore, @alphillips: please take a look with some urgency. This could well be transformational.


@storopoli Hm, this is what a very unpolished fan plot that I have lying around looks like for me, for varying beta_regularization:

Let me rename a few things in the above model and introduce some comments to clarify things. I’ll edit the above post.

  • Re 1: Looks good to me. daily_infections and daily_deaths should exactly mirror the quantities from the previous models.
  • Re 2: This also looks good to me, but you/the data will have to be the ultimate judge of this.
  • Re 3: I don’t think this makes a difference, but I do not know the Stan internals.
  • Re 4: The model expects the daily new deaths. I’ll rename the variable to bring it in line with the provided data, which has a column new_deaths.

@s.maskell Thanks, I was/am pretty excited about this myself. BTW, from what I have seen from the Liverpool model, this parametrization should also be applicable there.

Some further thoughts for anyone that’s listening:

There are two main differences from the previous models,

  • the change in how the ODE is (approximately) solved and
  • the changed parametrization

and I’m not sure which has the greater impact.

A (nonexistent) efficient implementation of the custom rk4 solver (with one step per day) boils down to the addition of 4 size-8 vectors, which are quickly computed. Contrast this with one 7x7 dense matrix-vector multiplication for the above model. Speed-wise, I would not be surprised if the rk4 solver could actually be quicker than, or not much slower than, the matrix_exp method for a single iteration. I did very shallowly investigate the impact that the accuracy of the solver has on the inference (hence the num_sub_steps variable) and found it to be essentially nonexistent, probably due to the sparse and delayed data.

However, the previous models were generally much slower than this one, and that with far fewer variables. Before, we had 63 (or 2x63) parameters for beta and now we have 435+ parameters. The best model so far finished in roughly 8 minutes, but convergence was actually slightly finicky and depended on the choice of ansatz, regularization, prior and initialization. I did also try the previous models with one beta parameter per day, but this took even longer than 8 minutes and I’m quite impatient. On top of that, if you look at the N_eff of the previous models (both my model and UNINOVE), these were generally around or less than 1k, with the worst N_eff (usually lp__) being only around 300. For this model, even the worst N_eff is usually above 2k, Rhat(lp__) is always on the order of 1.001 and E-BFMI is around 1.

So this all points towards the main benefit actually arising from the reparametrization, but I’m no expert. Can someone more experienced than I weigh in?

One further question:

Rt.live appears to have used a simple convolution model, and still appears to have been much slower:

Sampling 4 chains for 700 tune and 200 draw iterations (2_800 + 800 draws total) took 4340 seconds.
INFO:pymc3:Sampling 4 chains for 700 tune and 200 draw iterations (2_800 + 800 draws total) took 4340 seconds.

However, they used pymc3 and an extended model, so maybe this is the reason? I haven’t actually looked at (or rather understood) their model.

Edit with a small comment:
Actually, the matrix_exp method should stay much quicker than even the most efficient rk4 method, because the small 7x7 matrix can probably stay in cache and the intermediate rk-steps depend on each other, probably leading to worsened pipelining. But who knows in this age of compilers/processors.

Further edit: @tvladeck is actually here, maybe he could weigh in? Relevant backlink to Translating random walk to gaussian process (and pymc3 model to stan)

I’ve run the model using a couple of beta_regularization values. The predictions and 90% quantiles seem to behave better now. Here are some benchmarks using the UNINOVE data (435 days) on a 2018 Mac Mini with an i5-8500B (6 cores, one thread per core):

  • beta_regularization = 0

    All 4 chains finished successfully.
    Mean chain execution time: 55.6 seconds.
    Total execution time: 56.8 seconds.
    

  • beta_regularization = 0.01

    All 4 chains finished successfully.
    Mean chain execution time: 820.2 seconds.
    Total execution time: 837.2 seconds.
    1621 of 4000 (41.0%) transitions hit the maximum treedepth limit of 10 or 2^10-1 leapfrog steps.
    Trajectories that are prematurely terminated due to this limit will result in slow exploration.
    Increasing the max_treedepth limit can avoid this at the expense of more computation.
    If increasing max_treedepth does not remove warnings, try to reparameterize the model.
    

  • beta_regularization = 0.10

    All 4 chains finished successfully.
    Mean chain execution time: 163.7 seconds.
    Total execution time: 165.6 seconds.
    

  • beta_regularization = 0.20

    All 4 chains finished successfully.
    Mean chain execution time: 125.3 seconds.
    Total execution time: 132.5 seconds.
    

  • beta_regularization = 0.30

    All 4 chains finished successfully.
    Mean chain execution time: 107.7 seconds.
    Total execution time: 116.9 seconds.
    

  • beta_regularization = 0.5

    All 4 chains finished successfully.
    Mean chain execution time: 104.7 seconds.
    Total execution time: 110.0 seconds.
    

Also, here is the model in a .stan file with the generated quantities block and the beta regularization.
deaths_matrix_exp.stan (2.6 KB)


Could you share the code that you’ve used to generate this image?

Ah, right, it looks like too small a beta_regularization leads to beta being forced to be rather constant, and the model compensating by cranking up (or down?) reciprocal_phi_deaths. Observe that in my figure, in the left column (beta_regularization=0.01) and the lowest fan plot (daily_deaths), the lines do not agree with each other.
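For reference, the regularization statement in the model block is a random walk on the log of the daily infection fractions,

$$\log(\mathrm{unit\_dS}_i) \sim \mathcal{N}\!\big(\log(\mathrm{unit\_dS}_{i-1}),\ \mathrm{beta\_regularization}\big), \qquad i = 2,\dots,\mathrm{no\_days},$$

so a very small beta_regularization pins the daily infections (and hence beta) to a nearly constant sequence, leaving the negative binomial overdispersion to absorb the misfit.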

Sure.

Edit:

Also, due to the periodicity in the reporting of deaths, which has been masked by the pooling, reciprocal_phi_deaths probably gets under/overestimated anyway. This could probably be fixed either by pooling again or by modeling this periodicity explicitly.
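A minimal sketch of the pooling option, purely illustrative and assuming no_days is a multiple of 7 (this would replace the daily likelihood in the model block):

{
  int no_weeks = no_days / 7;
  vector[no_weeks] weekly_expected_deaths;
  int weekly_deaths[no_weeks];
  for (w in 1:no_weeks) {
    //Sum the expected and observed daily deaths within each week.
    weekly_expected_deaths[w] = sum(daily_deaths[(7 * w - 6):(7 * w)]);
    weekly_deaths[w] = sum(new_deaths[(7 * w - 6):(7 * w)]);
  }
  weekly_deaths ~ neg_binomial_2(weekly_expected_deaths, 1 / reciprocal_phi_deaths);
}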

Yeah, maybe we should work with weekly_deaths instead of daily_deaths?

Hi all,

Firstly, I want to apologise for being late for the conversation.

I’ve spent quite a lot of time working on this model, and it’s great to get your collective thoughts on how it could be made more computationally efficient. There are an awful lot of excellent ideas in this thread, most of which I’m still trying to digest. I intend to mull these over and contact you about them at a later date.

My initial comments relate to @Funko_Unko’s

only SEIR model you’ll ever need

I think the idea is a brilliant way of drastically reducing the time it takes to fit the model. Still, I’m slightly concerned that this performance improvement comes at the expense of discarding potentially important epidemiological understanding that the original model structure captured. This reparameterisation breaks the link between the size of the infectious population and the rate of new infections, making the distinctions between the meanings of the exposed, infectious, and terminally ill compartments less clear. My concern is that by going further down this route, we’ll end up with a black-box model that assumes the time series of observations to be a lagged version of the time series of daily infections, without capturing assumptions about the epidemiological processes that drive the rate of infection.

I’m more than happy to change my view and would welcome others’ thoughts on this.

Could you initialise the parameters in common between the two models using values from the simplified / fast one?

@hhau, initialising the beta parameterisation of the model with the outputs of the infection-rate parameterisation is undoubtedly a good idea, but I’m not sure how much of a performance improvement you would get.

Yes, you can. However, the initialization is not the only problem the original models have. To really save any time you’d also have to translate the metrics somehow, which should be possible in principle, but maybe somewhat tedious in practice. On top, even if you had a perfectly initialized metric, you might still get divergences or other issues.

Either that, which is reasonably straightforward. You could also introduce an additional vector parameter

real<lower=0, upper=1> reporting_probability[7];

which would quantify how probable reporting is on each weekday. One preliminary model would yield this:

Edit: whoops, forgot the figure:

  • Left: condition on weekly deaths only. No problems here.
  • Right: Use some reporting probability. Quite a few divergences here. So it’s not 100% straightforward.
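For concreteness, a purely hypothetical sketch of one way such a weekday-specific reporting probability could enter the likelihood (the preliminary model behind the figure may well differ):

for (i in 1:no_days) {
  //Hypothetical: scale the expected deaths by the reporting probability of that
  //day's weekday (assumes day 1 of the data is "weekday 1").
  int weekday = 1 + (i - 1) % 7;
  new_deaths[i] ~ neg_binomial_2(daily_deaths[i] * reporting_probability[weekday],
                                 1 / reciprocal_phi_deaths);
}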

Yes, this is most certainly a possibility/danger. Really, the simplex model is just a glorified convolution model. However, you can still impose the same priors on an (approximated) beta, so I think you should be able to inject all the domain knowledge you have/need.

Not much. Very little work is put into finding the typical set (I think 50 warmup iterations), while almost all of the work has to be put into estimating the metric. The only efficient reinitialization requires that you somehow transfer the metrics from one fit to the other, which is possible and will, if the geometry permits it, provide insane speedups.


I think you should be able to inject all the domain knowledge you have/need

It’s not immediately apparent to me how to incorporate the domain knowledge that a larger infectious population leads to a larger infection rate into the simplex model. Can you expand on how you would go about injecting this information?

So I think that you can in principle just add something like the following to your model block:

daily_infections ~ lognormal(log(factor*SI/population), sigma);

where factor and sigma would have to be determined somehow and SI=S*(I1+I2) as before. But I’m not quite sure.

Just caught up with and absorbed all of the above - thanks for all the brilliant suggestions.

This sounds reasonable to me (daily_infections is just a constant multiple of unit_dS, so no Jacobian adjustment is needed), but I think it conflicts with the regularisation in:

unit_dS[2:no_days] ~ lognormal(log(unit_dS[:no_days-1]), beta_regularization);

There’s probably a way of doing both at once but I’d have to think carefully about how to achieve that.

Hm, a quick PSA: I haven’t thought very much about the initial conditions, and they are probably not appropriate.

Also, I don’t see any predictive improvement using the Linear Chain Trick (S E1 E2 I1 I2 T1 T2 D) over a simple (S E I T D), at least on Brazil’s data.

Take a look at:

I am getting the same MAE for weekly predicted deaths vs real deaths:

# A tibble: 1 x 2
  MAE_median MAE_mean
       <dbl>    <dbl>
1       253.     255.

See the pictures below: