Bayesian hierarchical model - how to spot convergence issues early

Data are available on new daily age-specific COVID-19 mortality counts for a given country. The hierarchical model describes the age-specific latent transmission dynamics with an age-structured SEIR compartmental model. The estimated age-specific new daily infections are linked via some function to the age-specific expected daily deaths, which are in turn linked to the observed deaths via an over-dispersed count model. The age-specific force of infection is allowed to vary in time. See here and here for related models.
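
The post leaves the infection-to-death link unspecified ("some function"). Below is a minimal Python sketch of one common choice in related models. It is an assumption, not necessarily the author's model: expected deaths are taken to be an age-specific infection fatality ratio times infections convolved with an infection-to-death delay distribution, and the over-dispersed count model is taken to be a negative binomial in Stan's neg_binomial_2 parameterization. The names `expected_deaths`, `negbin_logpmf`, `ifr`, `delay_pmf` and `phi` are hypothetical.

```python
import math

import numpy as np


def expected_deaths(new_infections, ifr, delay_pmf):
    """Hypothetical link from infections to expected deaths.

    new_infections: array (T, A) of daily new infections per age group
    ifr:            array (A,) of assumed infection fatality ratios
    delay_pmf:      array (D,), delay_pmf[d] = P(death d days after infection)
    """
    T = new_infections.shape[0]
    mu = np.zeros_like(new_infections, dtype=float)
    for t in range(T):
        # Deaths on day t come from infections on days t, t-1, ..., t-D+1.
        for d in range(min(t + 1, len(delay_pmf))):
            mu[t] += delay_pmf[d] * new_infections[t - d]
    return mu * ifr  # broadcasts the age-specific IFR over days


def negbin_logpmf(y, mu, phi):
    """Log-pmf of a negative binomial with mean mu and variance mu + mu**2 / phi,
    the same parameterization as Stan's neg_binomial_2 (requires mu > 0, phi > 0)."""
    return (
        math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
        + phi * math.log(phi / (phi + mu))
        + y * math.log(mu / (phi + mu))
    )


# Toy example: 5 days, 2 age groups (all numbers are made up).
infections = np.full((5, 2), 100.0)
mu = expected_deaths(infections, ifr=np.array([0.01, 0.1]), delay_pmf=np.array([0.5, 0.5]))
loglik = sum(negbin_logpmf(2, m, phi=10.0) for m in mu[:, 1])
```

The over-dispersion parameter phi lets the variance exceed the mean; as phi grows, the distribution approaches a Poisson.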

I have successfully fitted the hierarchical model with RStan on a Windows desktop machine to 7 months of data for a given country, assuming that the population is split into three age groups (see image below).

The 7-month analysis takes about 1.5 days to run (6 chains, each with 500 warmup and 500 sampling iterations). I use the trapezoidal rule to solve the system of ODEs that represents the SEIR compartmental model, which involves A^2 + 1 parameters (A being the number of age groups), each given an appropriate prior distribution.
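
For readers unfamiliar with the solver choice, here is a minimal Python sketch of an implicit trapezoidal step for an age-structured SEIR system, with the implicit equation solved by fixed-point iteration. It is an illustration under assumed choices (a force of infection of the form beta * contact @ (I / pop), constant rates, a daily step), not the author's Stan implementation; `seir_rhs`, `trapezoidal_step`, `beta`, `contact`, `sigma` and `gamma` are hypothetical names.

```python
import numpy as np


def seir_rhs(y, beta, contact, sigma, gamma, pop):
    """Derivatives of an age-structured SEIR model; y has shape (4, A) = (S, E, I, R)."""
    S, E, I, R = y
    foi = beta * contact @ (I / pop)  # force of infection for each age group
    new_inf = foi * S
    return np.stack([
        -new_inf,               # dS/dt
        new_inf - sigma * E,    # dE/dt
        sigma * E - gamma * I,  # dI/dt
        gamma * I,              # dR/dt
    ])


def trapezoidal_step(y, dt, rhs, max_iter=50, tol=1e-12):
    """One implicit trapezoidal step: y1 = y + dt/2 * (f(y) + f(y1)).

    The implicit equation is solved by fixed-point iteration, which converges
    when dt is small relative to the time scales of the system.
    """
    f0 = rhs(y)
    y1 = y + dt * f0  # explicit Euler predictor as the starting guess
    for _ in range(max_iter):
        y1_new = y + 0.5 * dt * (f0 + rhs(y1))
        # Relative stopping rule, since compartment sizes can be in the millions.
        converged = np.max(np.abs(y1_new - y1)) < tol * (1.0 + np.max(np.abs(y1_new)))
        y1 = y1_new
        if converged:
            break
    return y1


# Toy run: 3 age groups, 210 daily steps (all numbers are made up).
pop = np.array([3e6, 4e6, 2e6])
contact = np.array([[8.0, 3.0, 1.0], [3.0, 6.0, 2.0], [1.0, 2.0, 3.0]])
I0 = np.array([10.0, 10.0, 5.0])
y = np.stack([pop - I0, np.zeros(3), I0, np.zeros(3)])


def rhs(z):
    return seir_rhs(z, beta=0.05, contact=contact, sigma=1 / 5.2, gamma=1 / 4.0, pop=pop)


for _ in range(210):
    y = trapezoidal_step(y, 1.0, rhs)
```

Because the four derivatives sum to zero in every age group, each trapezoidal update keeps S + E + I + R equal to the group's population up to rounding error. The cost grows linearly with the number of time steps, and the solution is recomputed at every gradient evaluation, so a longer time series makes each iteration more expensive.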

For external validation, I wish to rerun the analysis with a further 3 months of data (10 months in total). The 10-month model takes about 3 days and, unfortunately, fails to converge: the traceplots are flat, the \hat{R} values are unreasonably large, and n_eff is 1.5, 3 or 5 for all parameters.
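
To see why flat, non-mixing chains produce very large \hat{R} values, the following minimal Python sketch computes the classic split-\hat{R} for one scalar quantity. Current versions of Stan report a rank-normalized variant (Vehtari et al., 2021), so the numbers will not match Stan's output exactly; `split_rhat` and the simulated chains are hypothetical.

```python
import numpy as np


def split_rhat(draws):
    """Classic split-R-hat for one scalar quantity.

    draws: array of shape (n_chains, n_iterations). Each chain is cut in half and
    the halves are treated as separate chains, so within-chain drift also shows up.
    """
    draws = np.asarray(draws, dtype=float)
    half = draws.shape[1] // 2
    splits = np.concatenate([draws[:, :half], draws[:, half:2 * half]], axis=0)
    n = splits.shape[1]
    within = splits.var(axis=1, ddof=1).mean()      # W: mean within-chain variance
    between = n * splits.mean(axis=1).var(ddof=1)  # B: between-chain variance
    if within == 0:
        # Perfectly flat chains: R-hat is infinite if they sit at different values.
        return float("inf") if between > 0 else float("nan")
    var_plus = (n - 1) / n * within + between / n
    return float(np.sqrt(var_plus / within))


rng = np.random.default_rng(1)
mixed = rng.normal(size=(6, 500))  # 6 well-mixed chains targeting N(0, 1)
# 6 chains each stuck near a different value, mimicking flat traceplots
stuck = np.arange(6)[:, None] + rng.normal(scale=0.01, size=(6, 500))
print(split_rhat(mixed))  # close to 1
print(split_rhat(stuck))  # far above the usual 1.01 threshold
```

When the between-chain variance dwarfs the within-chain variance, as with chains stuck in different regions, the ratio explodes; the same situation leaves almost no independent information across chains, which is consistent with effective sample sizes of only a few draws.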

Over the past 2 months I have tried to fine-tune the model, changing prior distributions where appropriate. I use the point estimates from maximizing the joint posterior (the optimizing() function) as starting values for the sampler. Unfortunately, all my attempts have led to non-convergence, and the execution times are prohibitively long.
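
Starting every chain from the same optimizing() point removes the overdispersed starting values that the between-chain comparison in \hat{R} is designed around. A minimal Python sketch of one alternative, assuming all parameters are positive scalars, jitters the point estimate on the log scale so that each chain gets its own start. `jittered_inits`, `radius` and the parameter names are hypothetical. In RStan, a list with one named list per chain can be passed as the init argument of stan().

```python
import math
import random


def jittered_inits(point_estimate, n_chains, radius=1.0, seed=None):
    """Return one init dict per chain, jittering a point estimate on the log scale.

    Each positive value v becomes v * exp(u) with u ~ Uniform(-radius, radius),
    so the starts are spread around the optimum but stay positive.
    """
    rng = random.Random(seed)
    return [
        {name: value * math.exp(rng.uniform(-radius, radius))
         for name, value in point_estimate.items()}
        for _ in range(n_chains)
    ]


# Hypothetical point estimates from an optimizer.
optimum = {"beta": 0.05, "phi": 12.0, "rho": 0.8}
inits = jittered_inits(optimum, n_chains=6, seed=2021)
```

If chains started this way settle in clearly different places, that points to multimodality or weak identification rather than to a sampler that merely needs more iterations.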

What are the best practices for spotting convergence issues as early as possible during sampling, so that the run can be terminated without waiting so many days?
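
One practical approach is to have the sampler write draws to disk as it runs and to inspect the warmup draws every so often, rather than waiting for the fit to return. The Python sketch below assumes CmdStan-style CSV output written as sampling progresses (comment lines starting with `#`, a header row, one draw per line, and the NUTS diagnostic columns lp__, stepsize__, treedepth__ and divergent__), such as CmdStan produces or RStan produces when stan() is given a sample_file argument. The helper names and thresholds are hypothetical. Early warning signs include a collapsing step size, most iterations hitting the maximum tree depth (10 by default), divergences, and chains whose lp__ values sit far apart relative to their within-chain spread.

```python
import csv
import statistics


def read_complete_rows(path):
    """Parse a CmdStan-style CSV that may still be growing; return one dict per draw."""
    header, rows = None, []
    with open(path, newline="") as fh:
        for line in fh:
            if not line.endswith("\n"):
                break  # the last line may be half-written while sampling runs
            if line.startswith("#") or not line.strip():
                continue  # comment lines carry config and adaptation info
            fields = next(csv.reader([line]))
            if header is None:
                header = fields
            elif len(fields) == len(header):
                rows.append({k: float(v) for k, v in zip(header, fields)})
    return rows


def chain_health(rows, max_treedepth=10, window=50):
    """Summarize the most recent `window` draws of one chain (needs at least one draw)."""
    tail = rows[-window:]
    lp = [r["lp__"] for r in tail]
    return {
        "iterations": len(rows),
        "stepsize": tail[-1]["stepsize__"],
        "frac_max_treedepth": statistics.fmean(r["treedepth__"] >= max_treedepth for r in tail),
        # Some divergences early in warmup are common; a steady stream is not.
        "divergences": sum(r["divergent__"] for r in rows),
        "lp_mean": statistics.fmean(lp),
        "lp_sd": statistics.pstdev(lp),
    }


def lp_disagreement(healths):
    """Spread of the chains' recent lp__ means, in units of the average within-chain sd.

    Values far above 1 suggest the chains are exploring different regions.
    """
    means = [h["lp_mean"] for h in healths]
    spread = max(means) - min(means)
    typical_sd = statistics.fmean(h["lp_sd"] for h in healths)
    if typical_sd == 0:
        return 0.0 if spread == 0 else float("inf")
    return spread / typical_sd


# Example usage (directory and file pattern are hypothetical):
# from pathlib import Path
# all_rows = [read_complete_rows(p) for p in sorted(Path("fits").glob("*.csv"))]
# healths = [chain_health(r) for r in all_rows if r]
# print(lp_disagreement(healths), [h["stepsize"] for h in healths])
```

Running such a check every hour or so during warmup, ideally on a shorter pilot run with fewer iterations, can reveal stuck or disagreeing chains long before the full run would finish.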


GR_Experiment63_ModelFit.pdf (70.9 KB)

Pathfinder has been claimed to help with “spurious” modes and with quick model development:

Pathfinder can be part of a more effective computational workflow, starting with fast multivariate optimization, moving to Pathfinder’s distributional approximation, and then if necessary moving to fully stochastic MCMC algorithms. Even biased approximate inference can be useful if it can produce one reasonable draw quickly or several in parallel. If such draws are unreasonable, there is a good reason to believe the model is misspecified or has an error in its code. This allows us to fail fast during model development, a perspective from software engineering (Shore, 2004) that we recommend applying to statistical workflow (Gelman et al., 2020; Gabry et al., 2019).

I should also be able to help, but that might take a while. Could you share your Stan code and, if possible, the data?