Imagine I have 6 chains running in HMC algorithm (with Stan). After warm-up, which took 2000 iterations, three chains have converged (I know the ‘true’ parameter values), but the latter three have failed to do so. Sampling in both cases took 500 iterations.
Then I’ve used the method suggested on the site and run warm-up for the failed three chains again - say, 2000 more iterations. All of them finally converged. My question is rather general: Is it methodologically fine to use different warm-up (burn-in) sizes for different chains? Or should I also add some warm-up and resample the successful chains as well (which, I belive will not add to much to the results)?
Oh, thank you for reply. What are the ways to investigate those fails? I thought that it just can happen because of unfortunate starting points (usually I take them at random).
My example is complex enough (mixed logit aggregated demand model).
Stan optimization procedure (i.e. newton) finds the global optimum reliably, but HMC usually fails (variational algorithm usually also fails).
HMC only converges if I take initial values not far from optimization parameters.
I’ve tested it even with 10_000 warmups.
Typically we want to make sure that there are no divergent transitions reported. That’s the usual problem with non-convergence due to sub-optimal parameterization. The usual fix is to reparameterize. The most common cause is centered parameterizations of hierarchical models.
The second quote here says that HMC succeeds. Was the first just referring to our defaults failing to run for enough iterations?
If you have something like a high-dimensional regression, you can step down the initialization from uniform(-2, 2) to something like uniform(-1, 1) or even tighter. That tends to put you less far into the tail.
You also want to parameterize the model as much as you can so the posterior is centered and roughly unit scale (in the unconstrained parameters). Then the default initialization will be roughly correct.