Hi all, I fitted a vector autoregression (VAR) model and the chains are not mixing properly. In particular, the 4th chain behaves differently from the rest for several parameters. Don’t the chains start from random locations? Why does only the 4th chain diverge from the rest? Is there a clear explanation for this?
Chains failing to mix can indicate that your posterior is multi-modal and a particular chain is trapped in a separate mode within parameter space. In practice, this may mean that your model is not specified well enough, either through the model structure or the prior information, to efficiently explore parameter space and give proper inferences.
How do the various diagnostics look? Do you encounter any divergences? From the trace plots, it looks to me like some of the chains might be getting stuck in regions of problematic geometry. It’s hard to diagnose this issue just from the trace plots, but I would imagine your model needs to be modified or stronger prior information needs to be provided to resolve these issues.
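As a rough illustration of one such diagnostic, here is a minimal sketch of the basic split R-hat for a single parameter, computed with NumPy from an array of draws. The function name `split_rhat` and the simulated draws are made up for this example, and this is the plain split version rather than the rank-normalized variant that current Stan tooling reports, so treat it as a sanity check rather than a replacement for the built-in summaries.

```python
import numpy as np

def split_rhat(draws):
    """Basic split R-hat for one parameter; draws has shape (n_chains, n_draws)."""
    draws = np.asarray(draws, dtype=float)
    n_chains, n_draws = draws.shape
    half = n_draws // 2
    # Split every chain in half so that drift within a chain also shows up.
    halves = np.concatenate([draws[:, :half], draws[:, half:2 * half]], axis=0)
    n = halves.shape[1]
    chain_means = halves.mean(axis=1)
    chain_vars = halves.var(axis=1, ddof=1)
    between = n * chain_means.var(ddof=1)  # between-chain variance B
    within = chain_vars.mean()             # within-chain variance W
    var_plus = (n - 1) / n * within + between / n
    return float(np.sqrt(var_plus / within))

# Simulated draws (not from the model in this thread): four chains that agree,
# and a copy in which the last chain sits in a different region.
rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))
stuck = mixed.copy()
stuck[3] += 5.0

rhat_mixed = split_rhat(mixed)
rhat_stuck = split_rhat(stuck)
print(rhat_mixed, rhat_stuck)
```

Values close to 1 suggest the chains agree; the copy with the shifted chain gives a value well above 1, which is the numerical counterpart of what the trace plots show.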
I also thought multimodality could be the reason, but then is it just a coincidence that only the 4th chain explores a different mode than the rest? If the initial locations are random, then why is it always the 4th chain that is trapped somewhere different from the rest?
I suppose it depends on how often this coincidence of the 4th chain finding a different mode occurs. Stan’s default initialization is random, so one wouldn’t expect any correlation between chain ID and any particular behavior.
In any case, the underlying problem here seems to be a model issue, so I wouldn’t worry too much about the particular chain IDs.
What have you done to confirm this is the case?
There is also a problem with chain 1 or 2 (can’t see the exact color). I would assume there could be something wrong with the model.
I think the issue is caused by the parameters being dependent and the model not being well defined.
I mean how do you know that it’s just the 4th chain, as opposed to some arbitrary subset of the chains? You say it’s always the 4th chain, but how have you confirmed this to be true?
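One way to check this rather than eyeball it is to refit several times with different seeds and record which chains get flagged each time. The sketch below assumes the draws for one parameter are available as a NumPy array of shape (chains, draws); `outlier_chains` and its `threshold` are hypothetical names and values chosen for illustration, and the "fits" are simulated so the example runs on its own.

```python
import numpy as np

def outlier_chains(draws, threshold=3.0):
    """Return zero-based indices of chains whose mean is far from the others.

    draws: array of shape (n_chains, n_draws) for one parameter.
    threshold: hypothetical cutoff, in units of the median within-chain SD.
    """
    draws = np.asarray(draws, dtype=float)
    chain_means = draws.mean(axis=1)
    center = np.median(chain_means)
    scale = np.median(draws.std(axis=1, ddof=1))
    distance = np.abs(chain_means - center) / scale
    return [int(i) for i in np.flatnonzero(distance > threshold)]

# Simulated stand-ins for three separate fits, each with a different stuck chain.
rng = np.random.default_rng(1)
fits = []
for stuck_chain in (3, 0, 2):
    draws = rng.normal(size=(4, 500))
    draws[stuck_chain] += 5.0
    fits.append(draws)

flagged = [outlier_chains(d) for d in fits]
print(flagged)
```

Indices are zero-based here, so "the 4th chain" corresponds to index 3. If the flagged index changes from run to run, the chain ID is incidental.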
I checked further and realized that it is not always the 4th chain that is problematic. I also found the source of the convergence problem: large chunks of missing data in my time series (imputing them before sampling makes the parameters converge perfectly). I am not sure how to fix the missing value problem without naive imputation before sampling.
There are varying degrees of sophistication that you could use to address the missing data problem. You can handle it entirely within Stan if you model the missing data itself: the missing values would become parameters that are estimated as part of the joint distribution of your variables. There’s a lot of potential depth there, but it’s entirely within the realm of possibility to handle missing data problems like this.
Actually, I defined a parameter for the missing values (I didn’t put a prior on them, so I guess Stan assumes a very vague prior itself from negative infinity to infinity), and with that the model parameters don’t converge. I don’t really care about imputing the missing values, but since I am dealing with time series data I can’t simply remove them before doing the inference. I have to deal with them in a way that doesn’t affect my other parameters and lets the model converge.
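To make the idea concrete, here is a minimal sketch in Python rather than Stan, using a univariate AR(1) series as a simplified stand-in for the VAR in this thread. The function `ar1_log_joint` and all the numbers are made up for illustration. The point is only the structure: the observed and missing values are merged into one full series, and the model density is evaluated on that full series, so the missing values enter as unknowns alongside `phi` and `sigma`.

```python
import numpy as np

def ar1_log_joint(phi, sigma, y_obs, idx_obs, y_mis, idx_mis):
    """Log density of an AR(1) series, conditional on its first value.

    y_obs holds the data at positions idx_obs; y_mis holds candidate values
    for the missing entries at positions idx_mis. A sampler would treat
    phi, sigma and y_mis as unknowns.
    """
    n = len(idx_obs) + len(idx_mis)
    y = np.empty(n)
    y[idx_obs] = y_obs  # known values
    y[idx_mis] = y_mis  # unknown values, filled in as parameters
    resid = y[1:] - phi * y[:-1]
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                        - 0.5 * (resid / sigma) ** 2))

# Toy series of length 4 with the value at position 2 missing.
y_obs = np.array([0.0, 0.5, 0.2])
idx_obs = np.array([0, 1, 3])
idx_mis = np.array([2])

good = ar1_log_joint(0.5, 1.0, y_obs, idx_obs, np.array([0.3]), idx_mis)
bad = ar1_log_joint(0.5, 1.0, y_obs, idx_obs, np.array([10.0]), idx_mis)
print(good, bad)
```

In this sketch a missing value is constrained by the autoregressive terms linking it to its neighbors, which is why a candidate value close to the neighboring observations scores higher than an extreme one.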