Hi all,
I’m sorry if everyone already knows this, but I’m struggling to find information on how to troubleshoot chain mixing when certain parameter draws are moderately autocorrelated, even though the model otherwise fits fine - R-hats & divergences are fine & ESS is reasonable even for the problem chains.
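For reference, these are the kind of per-parameter checks I’m running - a minimal sketch with ArviZ (using one of its bundled example fits just so the snippet runs; in my case `idata` would be my own fit converted to an InferenceData object, and the parameter name is a placeholder):

```python
import arviz as az

# Bundled example fit just so the snippet runs; in practice 'idata' is my own fit.
idata = az.load_arviz_data("centered_eight")

summ = az.summary(idata, var_names=["tau"])
print(summ[["ess_bulk", "ess_tail", "r_hat"]])   # per-parameter ESS & R-hat

az.plot_autocorr(idata, var_names=["tau"])       # autocorrelation of the draws, per chain
az.plot_trace(idata, var_names=["tau"])          # trace plot for the suspect parameter
```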
Motivation - many models in my field are quite tough to fit because of their hierarchical structure (with many levels at L1) & a requirement for a fair bit of flexibility. Improving chain mixing would help me fit these models faster, as well as reducing the small, but non-negligible, probability that the chain finds a nonsensical solution in a different mode.
When I see individual parameter draws mixing poorly, but without a restrictive prior, my gut feeling is that there’s too much collinearity between variables, & more extreme values of (combinations of) the other collinear variables are moderately restricting movement in the ‘problem’ dimension. Is this a reasonable intuition? I’d try addressing it by tightening the priors on the other variables, but perhaps I’m talking rubbish here.
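The only way I’ve found to eyeball this so far is a pairs plot of the suspect parameter against the others - again just a sketch, with ArviZ’s bundled example data standing in for my fit and placeholder parameter names:

```python
import arviz as az

# Example fit just so the snippet runs; the real check would pair the 'problem'
# parameter against the variables I suspect of being collinear with it.
idata = az.load_arviz_data("centered_eight")

az.plot_pair(
    idata,
    var_names=["mu", "tau"],   # placeholders for the problem parameter & a suspect
    kind="scatter",
    divergences=True,          # overlay divergent transitions to see where the sampler struggles
)
```

Tight ridges or funnel shapes in these plots seem to be where the sampler slows down, but I don’t know how far to trust that as a diagnosis.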
What’s your favourite strategy to investigate this type of issue?
Thanks.
Never mind. It looks like the answer to the above question is something like:
‘check to see if you’re including sufficient domain knowledge in the prior’
i.e. if the model can find modes which are nonsensical to the human, then the human hasn’t given the model sufficient information to determine what is, & isn’t, a stupid value.
I guess I was approaching this by trying to keep my priors only mildly informative, but there’s a counterargument that doing this in a situation where large amounts of parameter space are nonsensical is itself (negatively?) informative. I guess this turned into a more complex version of the ‘uninformative vs mildly informative’ debate.
Sounds like I should be less lazy & do more prior predictive checking.
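For my own notes, even a crude simulation from the priors would have flagged this - a minimal sketch with numpy, where the toy normal model & its priors are placeholders for the real thing:

```python
import numpy as np

rng = np.random.default_rng(1)
n_sims, n_obs = 1000, 50

# Draw parameters from the priors (toy normal model; these stand in for the real priors).
mu = rng.normal(0.0, 10.0, size=n_sims)
sigma = np.abs(rng.normal(0.0, 5.0, size=n_sims))   # half-normal scale

# Simulate datasets & ask whether they look remotely like plausible data for the domain.
y_sim = rng.normal(mu[:, None], sigma[:, None], size=(n_sims, n_obs))
print("range of simulated data:", y_sim.min().round(1), y_sim.max().round(1))
print("share of datasets with any |y| > 100:", (np.abs(y_sim) > 100).any(axis=1).mean())
```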
I’d still be interested in hearing if anyone has thoughts on how to examine how multicollinearity might be affecting chain mixing, short of regressing all variables on each other.
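The cheapest thing I’ve come up with so far is just the correlation matrix of the posterior draws - a sketch, again with ArviZ’s example data standing in for my fit, and assuming the parameters of interest are scalars:

```python
import arviz as az
import numpy as np

# Example fit; in practice these would be the parameters I suspect of being collinear.
idata = az.load_arviz_data("centered_eight")
names = ["mu", "tau"]

# Pool chains & draws into a (n_samples, n_params) matrix and look at the correlations;
# large off-diagonal entries flag the collinear pairs.
stacked = idata.posterior[names].stack(sample=("chain", "draw"))
draws = np.column_stack([stacked[v].values for v in names])
print(np.round(np.corrcoef(draws, rowvar=False), 2))
```

A strong posterior correlation is at least consistent with the collinearity story, though I realise it doesn’t prove that’s what is slowing the chain down.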
I think your general considerations make sense.
I don’t think there’s much of an argument here at all. If the support for one or more parameters includes values that are nonsensical but still in a region of high probability, it is absolutely reasonable (and necessary, actually) to constrain the support using priors or hard constraints (the lower and upper values in the variable declaration) – e.g. if you keep getting errors because the sampler finds a negative variance parameter, there’s no option but to restrict that value to be positive (the same goes for the diagonal values of covariance matrices, or any other simple example). The harder case is when it’s not obvious what the unrealistic value is.
I can’t know exactly what your models look like, but since you asked about favorite strategies I’ll give a somewhat personal opinion – there are several possible approaches.
- If there are similar models or a standard formulation in the literature, try them out (if they have Bayesian implementations with code available, great; if not, you may find shortcomings in published results that relied on MLE or, more generally, skipped proper diagnostics). Put another way, you may be having trouble getting proper results because others made it look easy by doing it crudely (and possibly getting it wrong).
- From what you mention your models are potentially large and very flexible (which is a great way to get lack of identifiability/multiple modes); maybe there’s a good reason for that or a “gold standard” in the field, or maybe there’s neither. Instead of many levels and a lot of flexibility, you could try building a model with only a few levels and less flexibility – this will usually work very well, but not be very useful in practice – then scale up to see where the problems start appearing. By identifying those you may be able to formulate them differently and scale up your model without scaling up the problems.
In sum: if there are good models out there, see how people in the field have done it previously (and properly); if not, start from scratch and see whether the field has overlooked some basic rule (there are plenty of examples of inference that looked alright until Bayesian diagnostics or better samplers showed it wasn’t right).