I am working on a model of high dimensionality (a hierarchical model with ~50000 parameters at the lowest level). Initializing with Stan's default (uniform [-2, 2] on the unconstrained scale) causes numerical problems. Seeding with random draws from the priors kind of works, but it takes a very long time for chains to converge. Sometimes 2000 iterations aren't enough, and that takes around 48 h to compute. I want to give Stan a hand by starting the chains at reasonable values, and, for at least some parameters, I can calculate decent guesses based on the data. Is there a good practice for how to over-disperse initial values around my data-based guesses? Alternatively, is there a diagnostic tool to check whether the initialization was over-dispersed (as opposed to checking whether the chains have converged, assuming they were seeded with over-dispersion)?
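For concreteness, this is roughly what I have in mind: a minimal sketch (using cmdstanpy; the parameter name `theta`, the jitter scale, and the file names are placeholders, not my actual model) that jitters per-chain initial values around data-based guesses and writes one init file per chain:

```python
import numpy as np
from cmdstanpy import CmdStanModel, write_stan_json

rng = np.random.default_rng(1234)
n_chains = 4

# Data-based point guesses for (some of) the parameters; placeholder values here.
theta_guess = np.zeros(50_000)  # in practice, computed from the data

# One init file per chain: jitter the guesses so the chains start apart.
# The jitter scale (0.5 here) is itself just a guess.
init_files = []
for c in range(n_chains):
    inits = {"theta": theta_guess + rng.normal(0.0, 0.5, theta_guess.size)}
    path = f"inits_chain_{c}.json"
    write_stan_json(path, inits)
    init_files.append(path)

model = CmdStanModel(stan_file="model.stan")
fit = model.sample(data="data.json", chains=n_chains, inits=init_files)
```

But I don't know how large the jitter scale should be for the starting points to count as over-dispersed, hence the question.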
The recommendation to use over-dispersed initial values is there to make the \widehat{R} diagnostic more conservative. In high dimensions, it is very difficult to generate over-dispersed initial values that would at the same time be near the interesting posterior region (sometimes called the typical set). It is likely that over-dispersion is not a strict requirement even in theory. Section 5.1 of
has a recent theoretical result. The worry about under-dispersed initial points was bigger in the 1990s, when only a few short chains were run (computers were much slower than today), so it was possible just by chance to observe a deceptively small Rhat early on (the threshold was also much higher at that time). With long efficient chains (like 1000 iterations of HMC/NUTS in Stan), or with a large number of short but efficient chains (like 100+ chains of a few iterations of HMC/NUTS each), there is less worry of getting a small Rhat value just by chance. You could examine the behavior of Rhat or, even easier to interpret, the effective sample size with an increasing number of iterations, as in Rank-Normalization, Folding, and Localization: An Improved \widehat{R} for Assessing Convergence of MCMC (with Discussion).
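As a concrete illustration (a minimal sketch using ArviZ on a cmdstanpy fit; the parameter name `theta` is a placeholder for whichever quantities you want to monitor), you could compute the rank-normalized Rhat and bulk ESS on increasing prefixes of the draws and watch how they evolve:

```python
import arviz as az

idata = az.from_cmdstanpy(fit)  # or any InferenceData object

# Rank-normalized Rhat and bulk ESS on increasing prefixes of the chains.
# Rhat drifting toward 1 and ESS growing roughly linearly with the number
# of iterations is the behavior you would hope to see.
for n in (250, 500, 1000, 2000):
    sub = idata.posterior.isel(draw=slice(0, n))
    rhat = az.rhat(sub, method="rank")
    ess = az.ess(sub, method="bulk")
    print(n, float(rhat["theta"].max()), float(ess["theta"].min()))
```

Looking at the worst Rhat and the smallest ESS across parameters, as above, is the conservative summary for a model with many parameters.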