Strange convergence behaviour

Bob, those are definite possibilities, but another issue can simply be very long tails: a chain that ends up far out in a tail only sometimes finds its way back to the mode in a computationally feasible amount of time. At least, that’s been my experience. Often, when I see this kind of behaviour, it’s because some part of my model is t-distributed or Cauchy, or otherwise has a long tail or strong skewness.

Using VB to get a sample and then starting from that sample works well in some cases. But usually I don’t start all the chains at the same place (say, the VB mean); I start each chain at its own randomly selected draw from the VB sample. I think that, so long as you don’t have multi-modality / identifiability problems, this should work well to avoid getting stuck far out in a long tail. But you’re right that with identifiability issues, you do have the concern that Rhat might be fooled.
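
In case it helps, here is roughly what I mean by giving each chain its own starting point, as a minimal Python sketch. The function name pick_chain_inits and the dict-of-arrays layout are made up for illustration, and the "VB sample" below is faked with random numbers. How you get real VB draws, and how you hand the resulting per-chain inits to the sampler, depends on your interface.

```python
import numpy as np


def pick_chain_inits(vb_draws, n_chains, seed=None):
    """Pick one randomly selected VB draw per chain as that chain's start.

    vb_draws: dict mapping parameter name -> array whose first axis
              indexes draws (hypothetical layout, assumed for this sketch).
    Returns a list of n_chains dicts mapping parameter name -> value.
    """
    rng = np.random.default_rng(seed)

    # Every parameter must come from the same set of draws.
    draw_counts = {np.shape(v)[0] for v in vb_draws.values()}
    if len(draw_counts) != 1:
        raise ValueError("all parameters must have the same number of draws")
    n_draws = draw_counts.pop()
    if n_chains > n_draws:
        raise ValueError("need at least as many VB draws as chains")

    # Distinct draw indices, so no two chains start at the same place.
    idx = rng.choice(n_draws, size=n_chains, replace=False)

    # Use the same index for every parameter, so each chain starts
    # at one joint draw rather than a mix of unrelated draws.
    return [
        {name: np.asarray(values)[i] for name, values in vb_draws.items()}
        for i in idx
    ]


# Fake VB sample, for illustration only.
fake_rng = np.random.default_rng(0)
vb_draws = {
    "mu": fake_rng.normal(size=1000),
    "beta": fake_rng.normal(size=(1000, 3)),
}
inits = pick_chain_inits(vb_draws, n_chains=4, seed=1)
```

Each element of inits is then the starting point for one chain. The contrast is with building a single dict from the VB mean and reusing it for every chain.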