Convergence within chains, but not across chains

Hello everyone,

I am running a Hidden Markov Model in Stan in which some covariates drive the transitions between the latent states. When I run the model on larger samples, most of my parameters still converge poorly, as indicated by Rhat. In another post, the non-centered parameterization was recommended, but unfortunately it didn’t solve the problem.
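The model structure described above can be sketched as follows: the likelihood of a 2-state HMM with covariate-driven transitions, computed with the forward algorithm in log space. This is Python/NumPy rather than Stan, and everything in it is an assumption for illustration only: Gaussian emissions with state-dependent means `mu`, a logistic link for the probability of staying in each state (`alpha`, `beta`), and all function names, which are hypothetical and not taken from the original model. A brute-force sum over all state paths is included as a correctness check.

```python
import itertools
import numpy as np

def log_normal_pdf(y, mean, sd):
    # Elementwise log density of Normal(mean, sd) evaluated at y
    return -0.5 * np.log(2.0 * np.pi * sd**2) - 0.5 * ((y - mean) / sd) ** 2

def log_sum_exp(a, axis=None):
    # Numerically stable log(sum(exp(a))) along the given axis
    a_max = np.max(a, axis=axis, keepdims=True)
    out = a_max + np.log(np.sum(np.exp(a - a_max), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)

def transition_matrix(x_t, alpha, beta):
    # Hypothetical covariate link: P(stay in state k) = logistic(alpha[k] + beta[k] * x_t)
    stay = 1.0 / (1.0 + np.exp(-(alpha + beta * x_t)))
    return np.array([[stay[0], 1.0 - stay[0]],
                     [1.0 - stay[1], stay[1]]])

def hmm_log_lik(y, x, mu, sigma, alpha, beta, init):
    # Forward algorithm; x[t] drives the transition from time t-1 to time t
    log_a = np.log(init) + log_normal_pdf(y[0], mu, sigma)
    for t in range(1, len(y)):
        log_gamma = np.log(transition_matrix(x[t], alpha, beta))
        # log_a_new[j] = log sum_i exp(log_a[i] + log_gamma[i, j]) + log p(y_t | state j)
        log_a = log_sum_exp(log_a[:, None] + log_gamma, axis=0) + log_normal_pdf(y[t], mu, sigma)
    return float(log_sum_exp(log_a))

def brute_force_log_lik(y, x, mu, sigma, alpha, beta, init):
    # Sums over all 2**T state paths; only feasible for tiny T, used as a check
    total = 0.0
    for path in itertools.product([0, 1], repeat=len(y)):
        logp = np.log(init[path[0]]) + log_normal_pdf(y[0], mu[path[0]], sigma[path[0]])
        for t in range(1, len(y)):
            logp += np.log(transition_matrix(x[t], alpha, beta)[path[t - 1], path[t]])
            logp += log_normal_pdf(y[t], mu[path[t]], sigma[path[t]])
        total += np.exp(logp)
    return float(np.log(total))

rng = np.random.default_rng(0)
x = rng.normal(size=6)           # hypothetical covariate series
y = rng.normal(size=6)           # hypothetical observations
mu = np.array([-1.0, 1.0])       # hypothetical state-dependent intercepts
sigma = np.array([1.0, 0.5])
alpha = np.array([1.5, 1.0])
beta = np.array([0.5, -0.5])
init = np.array([0.5, 0.5])      # initial state distribution
print(hmm_log_lik(y, x, mu, sigma, alpha, beta, init))
print(brute_force_log_lik(y, x, mu, sigma, alpha, beta, init))
```

The two printed values should agree. The forward recursion costs O(T·K²) for K states, whereas the enumeration costs O(K^T), which is why the latter only works as a check on tiny inputs.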

The summary output of the two chains I ran (each with adapt_delta=0.90, max_depth=12, num_samples=2000) suggests that convergence is quite poor:


Looking at the summary of each chain separately, the picture is very different:

Chain 1:

Chain 2:

The traceplots of the two chains show what’s going on: most of the parameters converge within each chain, but the two chains converge to slightly different values. The difference is real, but it does not change the substantive conclusions I want to draw from this model.
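Why chains that each look converged can still produce a bad Rhat can be illustrated with the classic split-Rhat computation, which compares between-chain and within-chain variance. This is a minimal NumPy sketch under stated assumptions: current Stan versions report a rank-normalized variant, so the numbers will not match Stan’s output exactly, and the chains below are simulated draws, not output from the model above.

```python
import numpy as np

def split_rhat(chains):
    # chains: array of shape (n_chains, n_draws) for a single parameter
    chains = np.asarray(chains, dtype=float)
    half = chains.shape[1] // 2
    # Split each chain in half so that drift within a chain also inflates Rhat
    split = np.concatenate([chains[:, :half], chains[:, half:2 * half]], axis=0)
    n = split.shape[1]
    W = split.var(axis=1, ddof=1).mean()          # mean within-chain variance
    B = n * split.mean(axis=1).var(ddof=1)        # between-chain variance
    var_plus = (n - 1) / n * W + B / n            # pooled variance estimate
    return float(np.sqrt(var_plus / W))

rng = np.random.default_rng(42)
n_draws = 2000
# Both chains mix well around the SAME value
same_mode = np.stack([rng.normal(0.0, 0.1, n_draws), rng.normal(0.0, 0.1, n_draws)])
# Each chain mixes well on its own, but around a slightly DIFFERENT value
diff_mode = np.stack([rng.normal(0.0, 0.1, n_draws), rng.normal(0.3, 0.1, n_draws)])
print(split_rhat(same_mode))   # close to 1
print(split_rhat(diff_mode))   # far above 1
```

In the second case each chain on its own is well behaved, yet the offset between them is large relative to the within-chain spread, so Rhat is far from 1. Running the sampler longer would not shrink that offset.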

My questions are:

  • Am I correct in assuming that running this model again with adjustments to the computational parameters (iterations, adapt_delta, etc.) would do no good at all?
  • Is the convergence problem really as big as it seems if it doesn’t change the insights I want to generate with the model?
  • Any other recommendations on what I should try?

You have encountered a common pathology in which the posterior is multimodal and each chain gets “stuck” exploring only one mode. Often this occurs when two or more parameters in the model are “non-identified”, meaning that an increase in one can be offset by a decrease in the other, yielding the same likelihood as if neither had changed. Take a look at the pairs plots of the posterior samples; non-identified parameters will typically show a strong correlation.
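The additive non-identification described above can be made concrete with a toy Gaussian model whose posterior is available in closed form. This is a hypothetical example, not the HMM in question: y_i ~ Normal(a + b, 1) with independent Normal(0, 10²) priors on a and b, so the data inform only the sum a + b.

```python
import numpy as np

# Hypothetical model: y_i ~ Normal(a + b, 1), priors a, b ~ Normal(0, prior_sd^2).
# The likelihood depends only on a + b, so a and b are additively non-identified.
n, prior_sd = 50, 10.0
# Posterior precision = prior precision + likelihood precision n * [1, 1]^T [1, 1].
# In this conjugate linear model it depends on n but not on the y values.
post_prec = np.eye(2) / prior_sd**2 + n * np.ones((2, 2))
post_cov = np.linalg.inv(post_prec)

corr = post_cov[0, 1] / np.sqrt(post_cov[0, 0] * post_cov[1, 1])
sum_dir = np.array([1.0, 1.0])
diff_dir = np.array([1.0, -1.0])
var_sum = sum_dir @ post_cov @ sum_dir     # posterior variance of a + b
var_diff = diff_dir @ post_cov @ diff_dir  # posterior variance of a - b
print(f"corr(a, b) = {corr:.4f}")
print(f"var(a + b) = {var_sum:.4f}, var(a - b) = {var_diff:.1f} (prior: {2 * prior_sd**2:.0f})")

# Draws from this posterior form a thin ridge in a pairs plot
# (centered at 0 here for illustration; the real center depends on the data)
rng = np.random.default_rng(0)
draws = rng.multivariate_normal(np.zeros(2), post_cov, size=2000)
```

With these assumed numbers the correlation is about -0.9998: a + b is pinned down (variance about 0.02), while the variance of a - b stays at roughly the prior’s 200, i.e. only the prior constrains that direction.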


Thank you. I have a follow-up question: Let’s take these two parameters mu and nu (state-dependent intercepts of two equations in the 2-state model) as an example:

Is it necessarily problematic that mu[1] is correlated with mu[2], and nu[1] with nu[2]? In my specific example, it would only mean that if the intercept of state 1 in that equation is higher, the intercept of state 2 is higher, too.

Or is the correlation between two parameters an issue per se?

I’ve been wondering the same thing, and I don’t know whether I have heard a good answer yet.

I guess the simplest example would be a two-parameter problem where the prior and the posterior are both multivariate Gaussians, but the prior looks like a circle while the posterior looks like an extreme ellipse.

But even then, the posterior may either have shrunk considerably in all directions, which I guess would make the correlation unproblematic, or it may have contracted in only one direction, which would tell you that you have no information whatsoever about the other direction?
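The two cases in this thought experiment can be told apart by comparing posterior to prior variance along each axis of the ellipse, rather than looking at the correlation alone. A minimal sketch, assuming an isotropic prior N(0, I) (the “circle”) and two hypothetical posterior covariances that share the same correlation of 0.9; with a non-isotropic prior this simple eigenvalue comparison would not be enough.

```python
import numpy as np

def correlation(cov):
    # Correlation implied by a 2x2 covariance matrix
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

def contraction(post_cov, prior_sd):
    # 1 - posterior variance / prior variance along each principal axis of the
    # posterior ellipse; valid in this form only because the prior is a circle
    eigvals = np.linalg.eigvalsh(post_cov)   # ascending: short axis first
    return 1.0 - eigvals / prior_sd**2

prior_sd = 1.0                               # prior N(0, I): the "circle"
rho = 0.9
base = np.array([[1.0, rho], [rho, 1.0]])    # eigenvalues 1 - rho and 1 + rho
shrunk_everywhere = 0.01 * base              # thin ellipse far inside the circle
shrunk_one_way = 0.5 * base                  # long axis nearly as wide as the prior

for name, cov in [("shrunk everywhere", shrunk_everywhere),
                  ("shrunk one way", shrunk_one_way)]:
    print(name, round(correlation(cov), 3), np.round(contraction(cov, prior_sd), 3))
```

Both posteriors report a correlation of 0.9, but the contraction along the long axis is roughly 0.98 in the first case and only about 0.05 in the second, which is the “no information about the other direction” situation described above.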
