Convergence issues, or latest-version-of-Stan issues, with our COVID-19 model?

Our code is on GitHub here: https://github.com/ImperialCollegeLondon/covid19model/

I’m running base.r from Rscript (i.e. Rscript base.r; this runs 4 chains for 200 iterations, 100 warmup; takes < 4 minutes).

Everything runs out of the box on my MacBook Pro, with R version 3.5.1 and RStan version 2.18.2. 100 warmup iterations isn’t actually enough, so it hits the maximum treedepth, but I’m pretty sure 1000 would be fine:

Warning messages:
1: There were 3 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
2: Examine the pairs() plot to diagnose sampling problems

Here’s the issue: on an Ubuntu server I’ve just spun up in the cloud (R version 3.6.3, RStan version 2.19.3), I get these mystifying messages, but when I check ShinyStan everything looks fine in terms of R-hat, etc.:

Warning messages:
1: There were 1 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
2: Examine the pairs() plot to diagnose sampling problems

3: The largest R-hat is NA, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess

As you can imagine, debugging this is a priority, so if anyone is available to help in real time today or tomorrow, feel free to contact me off-site: s.flaxman@imperial.ac.uk

RData image from the run on the server is available here (100 MB): https://www.dropbox.com/s/zrmpltaic7kwm98/base-568665.Rdata?dl=0


Just did 400 iterations, 200 warmup with RStan 2.19.3. No warnings about treedepth-exceeding transitions this time, just the mystery warnings:

Warning messages:
1: The largest R-hat is NA, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
2: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
3: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess

I’m on Ubuntu and taking a look now…

One thing I just noticed: you’re thinning the samples. Any reason for this? My understanding is that this is generally not recommended these days.


Nope, no reason, just to keep file sizes manageable for longer runs, as we’re all working from home these days, some with spotty internet.
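
For reference, a hedged sketch of where thinning is set in the sampling call (compiled_model and stan_data are placeholder names): thin = 1, the default, keeps every post-warmup draw, while larger values keep only every k-th draw in the saved object.

# thin = 4 keeps every fourth post-warmup draw, shrinking the saved fit object
fit <- rstan::sampling(compiled_model, data = stan_data,
                       iter = 2000, warmup = 1000, chains = 4, thin = 4)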

Ok, so to see which parameters are causing trouble for the rhat computation, I did:

stats = summary(fit)$summary
stats[is.na(stats[, 10]), ]  # column 10 of the rstan summary is Rhat

Which seems to tag your generated quantities as having trouble. Looking at that section, you set E_deaths0[1, m] = 1e-9; so that quantity will be a constant and break the automatic rhat calculation. Ditto, I think, for lp1, which has entries that are constant thanks to the respective entry in E_deaths0 being constant. These are not issues with your model, nor with your use of the generated quantities section; they’re merely an unaccounted-for case in the automatic rhat computation.

Now, there are also some entries in lp0 that are constant, and I think that’s similarly from setting E_deaths[1, m] = 1e-9; in the transformed parameters section, though I don’t understand why E_deaths itself isn’t also showing an NA for rhat. In any event, I think this is ok too.
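
As a quick illustration of why a constant quantity comes back as NA (this assumes rstan >= 2.19, which exports the Rhat() helper; the matrix below is just a stand-in for one parameter’s draws):

# A draw that is identical across all iterations and chains has zero variance,
# so the split R-hat computation divides 0 by 0 and returns NaN, which the
# summary then reports as NA.
constant_draws <- matrix(1e-9, nrow = 100, ncol = 4)  # iterations x chains
rstan::Rhat(constant_draws)                           # NaN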

I just did a 2000-iteration, 6-chain run (took 20 minutes). Still getting the ESS warnings, but I think those might be thanks to the constant parameters too. Checking on that now…


Ok, when I re-code the model (attached: base.stan, 3.6 KB) to remove the generated quantities block, putting the contents of the transformed parameters section into a local environment in the model block instead (so those quantities are not saved and therefore don’t affect the automatic rhat & ess computations), and run that for 2000 iterations (half warmup) and 6 chains, it samples perfectly fine (taking about 20 minutes) with no warnings and clean HMC diagnostics. So I’d say you’re fine if you use the samples from the unmodified model. If you make any changes to the model down the road, you might consider doing a similar modification to double-check that no issues were induced by the new model structure, then revert to having generated quantities again if you don’t get warnings.

For the Stan devs, it would be useful to have the automatic rhat and ess computations ignore the generated quantities block. There’d still be the issue of constant transformed parameters, but I don’t see any fix for that other than my local environment hack.


Oh, and note that the model samples fine with the default values for adapt_delta and max_treedepth, so you don’t have to tweak those.
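
For reference, if they ever did need adjusting, both go through the control list of the sampling call; a hedged sketch with placeholder object names:

# Not needed for this model (defaults are adapt_delta = 0.8, max_treedepth = 10);
# shown only to illustrate where these settings live.
fit <- rstan::sampling(compiled_model, data = stan_data,
                       iter = 2000, warmup = 1000, chains = 6,
                       control = list(adapt_delta = 0.9, max_treedepth = 12))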

Thanks Mike! A solution that might avoid recompiling is to put this into the sampling call? Testing now:

include = FALSE, pars = c('E_deaths', 'lp0', 'E_deaths0', 'lp1')
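
In a full call this might look roughly like the following (compiled_model and stan_data are placeholder names; include = FALSE tells rstan to drop, rather than keep, the listed parameters from the returned draws, so they never enter the automatic R-hat/ESS checks):

fit <- rstan::sampling(
  compiled_model, data = stan_data,
  iter = 200, warmup = 100, chains = 4,
  pars = c('E_deaths', 'lp0', 'E_deaths0', 'lp1'),
  include = FALSE  # exclude, rather than retain, the constant-valued quantities
)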

Ah, of course. I’d forgotten about that feature.

Just noticed that the pairs plots for mu[x] by y_raw[x] are showing banana-shaped dependencies; this is visible both overall and when zoomed in on just mu[1] by y_raw[1].

Maybe some folks more expert than me (@Bob_Carpenter?) could advise on whether this geometry merely affects sampling efficiency, or whether you need to be concerned about the validity of inference as well.

@flaxter What version of rstan are you using on the server? I think the thing with generated quantities that have no variance should have been fixed in the latest version, but still, if you are going to set something to a constant, you could do it in the transformed data block.

Server has RStan 2.19.3.

OK. It looks like that is just a warning message rather than an error that prevents the stanfit object from being formed, correct?

Yep, just a warning and the solutions above should solve it…more or less. But it’s definitely not ideal because:

  1. If the model didn’t fit fine (e.g. Rhat is 2.0 for other parameters), we won’t know.
  2. Lots of people are now using our code and it took us the better part of 4 days to figure out what was going on, so I’m sure novice users will be worried by these warnings.

Yeah. I think the minimal change to the code would be to still set E_deaths0[1, m] = 1e-9 and then, at the bottom of the loop when you are done with it, do something like:

E_deaths0[1, m] += uniform_rng(1e-16, 1e-15); // tiny jitter so the quantity is no longer exactly constant

Just to say that my include = FALSE, pars = c('E_deaths', 'lp0', 'E_deaths0', 'lp1') still yields:

Warning messages:
1: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
2: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess 

I will try @bgoodri’s suggestion next.

(Also it should be possible to recode this…)

To warn yourself about Rhat even if RStan’s check fails (I have the same issue), try the following code:

get_sum <- summary(your_stan_model)
rhat <- get_sum$summary[, "Rhat"]

# na.rm = TRUE so parameters with NA Rhat (e.g. constants) don't break the check itself
if (any(rhat > 1.1, na.rm = TRUE)) {
  param_fail <- row.names(get_sum$summary)[which(rhat > 1.1)]
  warning(paste0("Rhat check failed for parameter: ", param_fail,
                 " with Rhat ", rhat[which(rhat > 1.1)], collapse = "\n"))
}
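
A companion check for the ESS warnings could work the same way; a hedged sketch, assuming rstan >= 2.19 for ess_bulk()/ess_tail() and using 400 as the usual rule-of-thumb threshold:

draws <- as.array(your_stan_model)          # iterations x chains x parameters
ess_b <- apply(draws, 3, rstan::ess_bulk)   # bulk ESS per parameter
ess_t <- apply(draws, 3, rstan::ess_tail)   # tail ESS per parameter
low_ess <- names(which(ess_b < 400 | ess_t < 400))
if (length(low_ess) > 0) {
  warning(paste0("Low bulk/tail ESS for: ", paste(low_ess, collapse = ", ")))
}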


Also, poor bulk/tail ESS is common for latent variable models in Stan, but it doesn’t necessarily mean recovery of the parameters is biased. Even HMC will produce autocorrelated chains when you have lots of latent parameters. I’d personally do some runs with more than 100 samples to be safe, given the higher-order dependencies. If it’s taking too long, consider parallelizing, if you haven’t already, either by chain or within chains using map_rect (happy to help if you need it).
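
For the by-chain side, a minimal hedged sketch (compiled_model and stan_data are placeholder names); rstan runs one chain per core when cores > 1 or when the mc.cores option is set:

options(mc.cores = parallel::detectCores())  # let rstan parallelize chains by default
fit <- rstan::sampling(compiled_model, data = stan_data,
                       iter = 2000, warmup = 1000, chains = 4,
                       cores = 4)             # one core per chain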


OK, looks like @bgoodri has the answer above. (@saudiwin – thanks for the code above, but just to say that I’m pretty sure bulk ESS/tail ESS is good for this model! But I guess RStan doesn’t report that correctly because of the constant value for E_deaths0 in the generated quantities?)

We’ll shortly have updated code on GitHub. Thanks all.