I’d like to share a preprint titled “Opaque prior distributions in Bayesian latent variable models”, which describes situations where prior distributions behave in ways that many users would not expect. These situations can cause problems, especially for reproducibility and for some model assessment metrics.
We are currently working on revising the paper, so we would appreciate comments, criticisms, relevant literature that we missed, etc.
We could instead place priors on individual parameters within the covariance matrix, but when we assemble a covariance matrix from these parameters, the resulting matrix is sometimes not positive definite. We elaborate on these issues below.
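To make the positive-definiteness problem concrete, here is a minimal Python sketch (an illustration under stated assumptions, not code from the preprint). For simplicity it uses a 3 x 3 correlation matrix rather than a general covariance matrix, assembles it from three correlations that are each chosen separately, and checks positive definiteness with Sylvester's criterion. The helper names `corr_from_params`, `det3`, and `is_pd_3x3` are hypothetical, and the independent uniform(-1, 1) priors are only an example choice.

```python
import random


def corr_from_params(r12, r13, r23):
    """Assemble a 3 x 3 correlation matrix from three separate correlations."""
    return [[1.0, r12, r13],
            [r12, 1.0, r23],
            [r13, r23, 1.0]]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def is_pd_3x3(m):
    """Sylvester's criterion: every leading principal minor must be positive."""
    minor2 = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return m[0][0] > 0 and minor2 > 0 and det3(m) > 0


# Each correlation is individually valid (inside [-1, 1]) ...
bad = corr_from_params(0.9, 0.9, -0.9)
print(is_pd_3x3(bad))  # ... but the assembled matrix is not positive definite

# Hypothetical setup: independent uniform(-1, 1) priors on each correlation.
random.seed(1)
draws = 100_000
n_not_pd = sum(
    not is_pd_3x3(corr_from_params(*(random.uniform(-1.0, 1.0) for _ in range(3))))
    for _ in range(draws)
)
print(n_not_pd / draws)  # fraction of prior draws that are not positive definite
```

The Monte Carlo fraction shows that a sizable share of draws from independent element-wise priors land outside the set of valid correlation matrices, which is why such priors cannot simply be combined entry by entry.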
The way Stan works, when you declare
```stan
parameters {
  cov_matrix[K] Sigma;
}
```
then Sigma will always be positive definite (up to numerical precision) and, absent any other prior statement, has an improper uniform prior over positive-definite matrices. Under the hood, Stan parameterizes Sigma through a Cholesky factor whose diagonal entries are log-transformed, so the underlying parameters are unconstrained. By the time you get to the model block, you already have a positive-definite Sigma on which you can place an additional prior. You can even add a Wishart prior and then further prior terms on top of it: Stan only needs to know the prior up to a normalizing constant, so it's OK to stack several prior statements on the same Sigma.
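The Cholesky-based transform described above can be sketched in plain Python. This is a minimal illustration, not Stan's actual source: the ordering in which the unconstrained vector fills the factor is an assumption, and the names `unconstrained_to_cov` and `is_positive_definite` are hypothetical. The point it shows is that exponentiating the diagonal of a lower-triangular factor L and forming Sigma = L L^T yields a positive-definite matrix for any unconstrained input.

```python
import math
import random


def unconstrained_to_cov(x, K):
    """Map an unconstrained vector x to a K x K covariance matrix.

    Sketch of a Cholesky-based transform (element ordering is an assumption):
    off-diagonal entries of the lower-triangular factor L are taken as-is,
    diagonal entries are exp() of an unconstrained value, and Sigma = L L^T.
    """
    if len(x) != K + K * (K - 1) // 2:
        raise ValueError("x must have K + K(K-1)/2 elements")
    L = [[0.0] * K for _ in range(K)]
    values = iter(x)
    for i in range(K):
        for j in range(i):
            L[i][j] = next(values)          # unconstrained off-diagonal
        L[i][i] = math.exp(next(values))    # strictly positive diagonal
    # Sigma[i][j] = sum_k L[i][k] * L[j][k]
    return [[sum(L[i][k] * L[j][k] for k in range(K)) for j in range(K)]
            for i in range(K)]


def is_positive_definite(S):
    """Return True if a plain Cholesky decomposition of S succeeds."""
    K = len(S)
    C = [[0.0] * K for _ in range(K)]
    for i in range(K):
        for j in range(i + 1):
            s = S[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                C[i][i] = math.sqrt(s)
            else:
                C[i][j] = s / C[j][j]
    return True


# Random unconstrained inputs always map to a positive-definite Sigma.
random.seed(2024)
K = 4
n_free = K + K * (K - 1) // 2
all_pd = all(
    is_positive_definite(unconstrained_to_cov(
        [random.gauss(0.0, 1.0) for _ in range(n_free)], K))
    for _ in range(1000)
)
print(all_pd)
```

Because positive definiteness is guaranteed by construction, any extra prior terms in the model block only reweight matrices that are already valid; they never need to reject invalid ones.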
Thanks, we had overlooked that cov_matrix already starts you out with a positive definite matrix!
I worry a bit about replacing a hard constraint with a soft one. With enough data, the likelihood dominates, so the normal(0, 0.1) prior becomes relatively weaker and you may end up with a posterior that is far from the equality constraint you intended. One possible solution is to scale the prior with the sample size, e.g., normal(0, 0.1/sqrt(n)).
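The sample-size argument can be checked with a conjugate normal calculation. This is a minimal sketch under a loud simplifying assumption: the data are taken to inform the constrained difference delta only through an estimate d_hat with standard error sigma / sqrt(n), which is far simpler than a real latent variable model. The function `posterior_mean` and the values of d_hat and sigma are hypothetical.

```python
import math


def posterior_mean(d_hat, n, sigma, prior_sd):
    """Posterior mean of a difference delta in a conjugate normal model.

    Hypothetical simplification: the data inform delta only through an
    estimate d_hat with standard error sigma / sqrt(n), and the soft
    equality constraint is the prior delta ~ normal(0, prior_sd).
    """
    data_precision = n / sigma ** 2
    prior_precision = 1.0 / prior_sd ** 2
    return data_precision * d_hat / (data_precision + prior_precision)


d_hat, sigma = 0.3, 1.0  # hypothetical estimate and per-observation sd
for n in (100, 10_000, 1_000_000):
    fixed = posterior_mean(d_hat, n, sigma, 0.1)
    scaled = posterior_mean(d_hat, n, sigma, 0.1 / math.sqrt(n))
    print(f"n={n:>9}: normal(0, 0.1) -> {fixed:.4f}, "
          f"normal(0, 0.1/sqrt(n)) -> {scaled:.4f}")
```

With the fixed prior, the posterior mean moves from 0.15 at n = 100 toward d_hat = 0.3 as n grows. With the scaled prior, the prior precision grows at the same rate as the data precision, so the posterior mean stays at d_hat / (1 + 100 sigma^2), about 0.003 here, for every n.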
Of course, if the posterior is far from 0, it indicates that the equality constraint was not reasonable in the first place. But the traditional psychometric approach is to fit one model with equality constraints, one model without equality constraints, and compare them.
Thank you for sharing your preprint! We are eager to hear what others think of it. In particular, we would love to know how they view the implications of opaque prior distributions for Bayesian latent variable models and what methods they recommend to address this issue. We also invite people to suggest ways to improve the paper and to point us to any relevant literature we may have missed. We very much appreciate all feedback!