@flaxter, I would be happy to help get the improved code to the master branch ASAP. @wds15, ideally I guess we would want to combine our two Stan files and do a PR?
If I understand reduce_sum() correctly, it does not use any parallelism unless you explicitly enable threading at compile time? So the code would be good to go for single-core use as well?
I had some problems finding a good way to debug my code to assess that it gives the exact same result as the previous implementation. After debugging and discussions with @paul.buerkner, I chose to just debug it with print statements. Do you have any suggestions for a smarter approach to check that we get the exact same results with both model files?
Thanks! As I understand it, the best way to debug (i.e. to ensure that the models are exactly the same) is to do runs with fixed seeds and check that lp__ is identical.
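Complementary to the fixed-seed approach, a low-tech check is to print the accumulated log density from inside the model block itself and compare it across the two implementations on identical data and inits. A minimal sketch (any model-specific names aside, `target()` and `print()` are standard Stan functions):

```stan
model {
  // ... priors and likelihood exactly as in the model being checked ...

  // Debug only: print the log density accumulated so far, so it can be
  // compared line-for-line between the old and new implementations when
  // both are run on the same data with the same initial values.
  // Remove before production runs -- print() fires on every leapfrog
  // step and slows sampling considerably.
  print("lp = ", target());
}
```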
I have now updated the Stan model with the speed improvements @wds15 proposed, but without reduce_sum() for now. I have checked that it works with RStan 2.21.1.
I need to debug to be sure they both give the exact same result. As you note, we can run with a fixed seed and check lp__. Although, sometimes when the Stan file is changed, the compiler rearranges things so that the seed doesn't have the same effect for two different model codes. I don't know the inner workings well enough to know exactly why, but I expect the fact that we make alpha a vector instead of an array is an example. I think it should work by checking lp__ for specified data and parameter values. I will see if I can get those checks through RStan somehow.
So, I’ll try to debug the code now to check that it gives the exact same result.
I have now done explicit checks on the lp for specific parameters to assess that all implementations give the same lp__. When doing this I found a bug at lines 80-83:
When including the speedup improvement there, I get a different log_prob.
I have now commented out your previous code in that section @wds15, but maybe you can see what is wrong?
The current implementation without that final improvement gives roughly a 50% speedup, compared to 3-4x when including the part that is now commented out. Hence it would be really nice to get that final part to work.
Improved model (base_general_speed.stan - without improvement in E_deaths)
I think there are even more speedups on the table: The generated quantities block did not yet use the optimised expressions. So I did one more round of changes:
- corrected model (there was an off-by-one error; thanks to @MansMeg for nailing that one)
- modularised the model => now the generated quantities block also uses the optimised code with dot products. This makes the model again somewhat faster (but please have a look, as E_deaths[1] is now slightly different; to me this looks like only a numerical difference, which should not matter)
- all former model outputs are now part of the output (E_deaths, predictions, Rt, Rt_adj)
- the model is set up to be compatible with RStan and does not use the parallelisation offered by CmdStan 2.23 - but it can be easily switched as shown in the comments
- this time I checked that the log-lik value is exactly the same for a test draw
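For reference, the loop-to-dot-product rewrite looks roughly like this (a sketch only; `f`, `f_rev`, `prediction`, and `E_deaths` are illustrative names, not necessarily the exact ones in the model):

```stan
// Precompute the reversed kernel once, e.g. in transformed data:
vector[N] f_rev;
for (i in 1:N)
  f_rev[i] = f[N - i + 1];

// Before: inner accumulation loop, executed for every day n
E_deaths[n] = 0;
for (i in 1:(n - 1))
  E_deaths[n] += prediction[i] * f[n - i];

// After: the same sum as one dot product over contiguous slices,
// which the autodiff stack handles far more efficiently
E_deaths[n] = dot_product(head(prediction, n - 1), f_rev[(N - n + 2):N]);
```

Since the kernel is data, reversing it once up front costs nothing per iteration, and the contiguous slices are what let dot_product pay off.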
While I understand the desire to use RStan, I think it would be good to keep the model in this modularised form so that you can easily switch between RStan without parallelisation and CmdStan with parallelisation.
The extra speed should really give you guys a lot of room to improve the model in itself if that is possible.
This is great! Unfortunately I realized that we now have two different PRs to the original repository with more or less the same idea. I think we would like to merge them into one PR that handles both RStan and CmdStan and takes design matrices as input. How do we do this in the easiest way? Should I hack on your PR or should you hack on mine? =)
I opened the PR more or less to show how to use cmdstan via cmdstanr in a simple way. I am going to close mine and you can add the cmdstanr use if you want.
The differences are really a few lines. If you need help let me know.
Cool! I think @wds15 added some additional improvements in that PR. I can take a look later today and try to merge them together so we get the final improvements as well.
@flaxter, do you want me to add cmdstan as well for within-chain parallelism?
I am afraid we now have three places here for the code (@rok_cesnovar's, yours, mine). It would be great if you, @mans_magnusson, could coordinate what should finally be merged.
The model I wrote now tries to make it as simple as possible for the COVID-19 modelers at Imperial College to work with it - so all the same outputs are created, just a lot faster. As I sensed that RStan usability is a priority, the model can now be easily switched between running in parallel (requiring 2.23) and not.
The bits you added about generalizing the covariates being used are helpful as well, of course. It's just that these need additional data-prep steps, which I wanted to avoid dealing with myself.
So it would be great to settle all of this and I would personally, of course, be happy to see the uptake of reduce_sum coming with 2.23.
My contributions were minimal. I think @mans_magnusson should try merging the changes of your latest two models and we can then add the option of cmdstanr from my script. The latter is a piece of cake.
Now the proposed new model is included. I did not get much additional speedup from these last fixes, but they make within-chain parallelism easily available if interested. You can see the final PR with this here:
Please let me know if I missed anything.
So now I let @flaxter et al decide what to do with the PR.
Thanks all! We’re in the midst of revising our report (and the speedups helped immensely!); just need a few days to catch up with all of this great work.
We haven’t done any modeling with it beyond descriptive stuff, but were planning to at some point fit the data with this simplified model I’ve been working on:
We have coded up an age extension of Seth’s model.
Our code is on Github:
We are running 1 chain with 100 iterations and 50 warmup iterations with
`Rscript base-ages.r`
It takes 50 minutes on my HP machine with Ubuntu 18.04.4, R 3.6.3, and RStan 2.19.1.
Runtimes on a single core get quickly out of hand.
We would be very grateful if someone could point us to parts that can be sped up. We are keen to try to parallelise this over countries, using cmdstan 2.23.
You can do the same thing I did for Seth's model. So replace those inner for loops with dot products and use reduce_sum, as you can see from the speed flavors of the model which you have in your git already.
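A hedged sketch of what slicing the likelihood over countries with reduce_sum could look like (requires CmdStan 2.23+ compiled with STAN_THREADS; all names here are illustrative, and the age-structured model's actual data layout will differ):

```stan
functions {
  // Partial sum over a slice of countries. reduce_sum slices the first
  // array argument and passes the start/end indices of the slice.
  real country_lp(int[] m_slice, int start, int end,
                  int[,] deaths, vector[] E_deaths, real phi) {
    real lp = 0;
    for (i in 1:size(m_slice)) {
      int m = m_slice[i];  // actual country index
      lp += neg_binomial_2_lpmf(deaths[m] | E_deaths[m], phi);
    }
    return lp;
  }
}
model {
  int country_idx[M];
  for (m in 1:M) country_idx[m] = m;

  // ... priors ...

  // Serial version (works with RStan 2.21):
  // for (m in 1:M)
  //   target += neg_binomial_2_lpmf(deaths[m] | E_deaths[m], phi);

  // Parallel version (grainsize 1 lets the scheduler pick the split):
  target += reduce_sum(country_lp, country_idx, 1,
                       deaths, E_deaths, phi);
}
```

Because the per-country terms are independent, the partial sums add up to exactly the same target, so lp__ checks against the serial version still apply.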
I think Melodie has done those kinds of things. There may be one or two more tricks which would be great to incorporate (and we'd love to know of them!), but do you think parallelising the code and using CmdStan would give us good gains? And how should this best be done, as the Stan model is quite a bit different?