It seems to me that when folks on the forum have identifiability issues, they usually show up as mixing problems or poor Rhats, induced by multiple chains starting in different places and never coming together. I imagine lots of identifiability problems are due to symmetries and therefore have fairly straightforward-to-detect degeneracies that present themselves within a few runs. However, am I right in saying that good mixing and convergence is not a necessary and sufficient condition for identifiability?
If so, and setting aside complex and time-consuming model-specific analytical proofs, are there empirical ways to make sure that a model is identified and won’t change its tune had you run it N more times, or if the data were just a little bit different?
Thinking about this, I guess I don’t quite know the definition of identifiability. If it means something like figuring out whether a parameter is vaguely in some range, then if you don’t collect the right data, you won’t be able to identify your parameters.
If you collect no data, for instance, you might get good mixing and convergence and not identify the parameters you want.
We usually talk about correlated parameters being non-identified, but if that correlation lies on a line you might still be able to sample the posterior efficiently with a dense metric, for instance.
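To make the line-degeneracy idea concrete, here’s a tiny sketch (the model and numbers are made up for illustration): if y ~ Normal(a + b, 1), the data only constrain the sum a + b, so any two parameter pairs with the same sum have exactly the same likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-identified model: y ~ Normal(a + b, 1).
# Only the sum a + b is constrained by the data, so the
# likelihood is constant along any line a + b = const.
y = rng.normal(loc=3.0, scale=1.0, size=100)

def log_lik(a, b, y):
    mu = a + b
    return -0.5 * np.sum((y - mu) ** 2)  # up to a constant

# Two very different parameter pairs with the same sum give
# identical likelihoods -- the posterior is a ridge, not a point.
print(np.isclose(log_lik(1.0, 2.0, y), log_lik(-10.0, 13.0, y)))  # True
```

Priors on a and b are what keep the posterior proper here, and since the ridge is linear, a dense metric can cope with it.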
I think fake-data simulations. So generate data from your model, fit it, and see if it behaves like you want. This’ll sort of be an upper bound on how it behaves in the wild (cuz presumably your real data doesn’t come exactly from the model you’re fitting).
And if you have a model and don’t know what to set some parameters to for your fake-data simulation, you can always run a preliminary fit to figure out a vague range to pick parameters from. You wouldn’t need this fit to pass all the diagnostic checks or anything – you’d just be trying to figure out numbers to plug into your model.
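A hedged sketch of the fake-data loop, on a toy model y ~ Normal(mu, sigma) (closed-form estimates stand in for refitting your actual model with MCMC):

```python
import numpy as np

rng = np.random.default_rng(1)

# Fake-data check: draw "true" parameters, simulate data from the
# model, fit, and ask whether the fit recovers the truth.
n_sims, n_obs = 200, 50
covered = 0
for _ in range(n_sims):
    mu_true = rng.normal(0.0, 5.0)
    sigma_true = rng.uniform(0.5, 2.0)
    y = rng.normal(mu_true, sigma_true, size=n_obs)

    mu_hat = y.mean()
    se = y.std(ddof=1) / np.sqrt(n_obs)
    # Did a rough 95% interval for mu cover the true value?
    covered += abs(mu_hat - mu_true) < 1.96 * se

print(covered / n_sims)  # should land near 0.95 if mu is recovered
```

If the parameters were badly identified, you’d see the intervals failing to concentrate around the truth even on data the model itself generated.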
Just that there is a unique “best” solution to the fitting problem, i.e. a single most likely set of parameters. This makes it possible to interpret the parameters; it wouldn’t be if there were multiple equivalent solutions.
If there are multiple sets of parameters that yield the same posterior, simulating data can’t reveal this.
I think the crux of my question is, if convergence and mixing are not necessary and sufficient criteria for identifiability, and if we don’t know a priori that our model is identifiable, what should the diligent Bayesian do?
PS: Here is a problem of this kind on the forum. The model works fine 4/5 times. What if it were 999/1000 times? Would he ever have spotted it? In his case the model fits generated data, but what if the data weren’t generated? What guarantee is there that the most-frequently-fitting parameters are the true ones?
Hmm, well the posterior itself is probably the best probabilistic solution under some assumptions, but that’s beyond what I know. Presumably I unwittingly assume this a lot.
I guess “most likely” reminds me of having an MLE. That doesn’t have to exist, cuz things can go off to infinity or whatnot.
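A classic demo of that, sketched with a toy mixture (data and numbers made up): pin one normal component’s mean to a data point and shrink its scale, and the likelihood climbs without bound, so no maximizer exists:

```python
import numpy as np

# Toy illustration: a 2-component normal mixture where one
# component's mean is pinned to the first data point. As that
# component's scale shrinks, the likelihood grows without bound,
# so a "most likely" parameter value doesn't exist.
y = np.array([-1.5, -0.8, -0.3, 0.0, 0.4, 0.9, 1.6])

def mixture_log_lik(sigma, y):
    # Component 1: Normal(y[0], sigma); component 2: Normal(0, 1).
    comp1 = np.exp(-0.5 * ((y - y[0]) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    comp2 = np.exp(-0.5 * y ** 2) / np.sqrt(2 * np.pi)
    return np.sum(np.log(0.5 * comp1 + 0.5 * comp2))

for sigma in [1.0, 0.1, 0.01, 0.001]:
    print(sigma, mixture_log_lik(sigma, y))  # climbs as sigma shrinks
```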
I think the way I colloquially talk about identifiability around the forums is something like: the marginal distribution of a parameter needs to concentrate in a small range.
But I guess this isn’t something we’re optimizing for, or whatever.
Good point, but presumably fitting them a bunch of times would reveal it eventually.
And yeah I guess nothing is guaranteed here.
Interesting link. I don’t know how to think about it when algorithms sometimes fail/sometimes work (other than being annoyed, at least).
Like, this isn’t just an MCMC thing – it happens with anything. I guess short of changing algorithms, you’re changing random initializations and tightening up priors.
This can look like you’re carefully regularizing your problem (and you can find other justifications for this after the fact), or it can look like cheating (whatever that means). Either way you’re just trying not to get tricked by your inferences (or rather, trick yourself into tricking the calculations).
Maybe the best you can do is try to understand how often calculations fail when they are repeated and work from there.
Like, what fraction of chains fail? Does this fraction change with smaller initializations? Is it specific to one set of generated data?
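A toy version of that bookkeeping (everything here is made up for illustration – plain gradient descent on a double-well objective stands in for running chains, and “failing” means landing in the worse mode):

```python
import numpy as np

# Double-well objective f(x) = (x^2 - 1)^2 + 0.3x: global minimum
# near x = -1, worse local minimum near x = +1. Count how often
# random restarts end up in the bad mode, and whether tightening
# the initialization changes that fraction.
def grad(x):
    return 4.0 * x * (x ** 2 - 1.0) + 0.3

def fit(x0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def fail_fraction(init_width, n_runs=200, seed=0):
    rng = np.random.default_rng(seed)
    inits = rng.uniform(-init_width, init_width, size=n_runs)
    finals = np.array([fit(x0) for x0 in inits])
    return np.mean(finals > 0.5)  # ended up in the bad mode

print(fail_fraction(2.0))  # wide inits: fails roughly half the time
print(fail_fraction(0.1))  # tight inits: fails much less often
```

Same spirit as rerunning a model many times: the failure fraction itself becomes the diagnostic, and watching how it moves with initialization tells you something about the geometry.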
And then there’s Andrew’s folk theorem: if a model fits badly, there’s probably something wrong with the model (rather than the computation).
Thx @bbbales2 for the musings! I found “or rather, trick yourself into tricking the calculations” particularly enlightening. I guess, like with other complex things, it’s probably good advice to stick to well-studied models with theoretical guarantees. At least then, if there is a problem, there’s a good chance someone else experienced it first.
We gotta pick the right level of abstraction to think about a problem. It’s not clear how to do that. But it might be easier to think about it in terms of constraints.
Like in linear regression, your constraint is you gotta do AX + b, and sorta you engineer around that.
ML people might point out that Bayesian inference is just another loss function, or limited in some way. And that’s probably true, but it’s a constraint we use to think about complicated problems.
Can make the same argument for generative modeling – there are certainly more not-necessarily-generative models, but generative-models-only is a useful boundary condition to regularize our thinking in model-space.