Hi, I’m trying to set priors on a regression model. The model includes income, wealth, and occupational prestige as predictors. All three of these are closely related, so a lot of information is probably shared between the predictors. Because of that, I feel like an independent prior on the coefficients is unlikely to be a good choice. (For instance, in the limit where I include two perfectly-collinear predictors, the prior on them should have a correlation of -1.) Is there any advice on dealing with situations like this?

Our Bayesian workflow paper recommends using prior predictive checks to evaluate priors (and cites Gabry et al.’s visualization paper, which is the thing to read about prior predictive checks). So you might be able to demonstrate that sensible-looking independent priors on the coefficients lead to nonsensical prior predictive simulations. The question will then be how to set the correlations in your prior if you don’t have good prior information.
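A prior predictive check along these lines might be sketched as follows in Python with NumPy. Everything specific here is an assumption for illustration rather than something from the question: three standardized predictors with pairwise correlation 0.9 standing in for income, wealth, and prestige, and independent normal(0, 1) priors on the coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_draws = 200, 4000

# Hypothetical standardized predictors with pairwise correlation 0.9,
# standing in for income, wealth, and occupational prestige.
rho = 0.9
pred_corr = np.full((3, 3), rho) + (1 - rho) * np.eye(3)
X = rng.multivariate_normal(np.zeros(3), pred_corr, size=n)

# Independent normal(0, 1) priors on the three coefficients (an assumption).
beta = rng.normal(0.0, 1.0, size=(n_draws, 3))

# Prior predictive draws of the linear predictor, one row per prior draw.
mu = beta @ X.T  # shape (n_draws, n)

# Summarize each simulated data set by the spread of its linear predictor.
sd_per_draw = mu.std(axis=1)
print("median sd of linear predictor:", np.median(sd_per_draw))
print("95% quantile of that sd:", np.quantile(sd_per_draw, 0.95))
```

Whether the resulting spread is sensible depends on the scale of your outcome. The comparison to make is between these simulated values and what you know about plausible outcomes.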

One thing to do is fit the model and measure the correlation in the posterior. Just because the prior doesn’t model correlation doesn’t mean it won’t show up in the posterior. Correlated priors have much stronger effects when the data set is small. With large data sets, the data will swamp the prior and you’ll get the posterior correlation you want.
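Here is a minimal sketch of that point, assuming a linear regression with two predictors, a known noise scale, and zero-mean normal priors, so the posterior covariance has a closed form and does not depend on the outcomes. The predictor correlation of 0.9, the prior correlation of -0.5, and the sample sizes are made up for illustration.

```python
import numpy as np

def posterior_corr(X, sigma, prior_cov):
    """Posterior correlation of two coefficients (normal prior, known noise sd)."""
    # Conjugate result: posterior precision = X'X / sigma^2 + prior precision.
    precision = X.T @ X / sigma**2 + np.linalg.inv(prior_cov)
    cov = np.linalg.inv(precision)
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

rng = np.random.default_rng(1)
pred_cov = np.array([[1.0, 0.9], [0.9, 1.0]])             # correlated predictors
independent_prior = np.eye(2)                             # no prior correlation
correlated_prior = np.array([[1.0, -0.5], [-0.5, 1.0]])   # prior correlation -0.5

results = {}
for n in (20, 2000):
    X = rng.multivariate_normal(np.zeros(2), pred_cov, size=n)
    results[n] = (posterior_corr(X, 1.0, independent_prior),
                  posterior_corr(X, 1.0, correlated_prior))
    print(n, results[n])
```

In this setup the independent prior already yields a negative posterior correlation, and the gap between the two priors shrinks as the sample size grows.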

When we have groups of these parameters, we can fit with a hierarchical model. For example, if we use hierarchical models across time or across areas, we’ll get sets of coefficients that are naturally modeled with hierarchical priors. We discuss multivariate hierarchical priors in the regression chapter of the user’s guide.

In the perfectly collinear limit you mention, a prior that only encodes the -1 correlation, without any shrinkage, leaves you with an infinite “ridge” in the posterior. You need some source of shrinkage, too. The whole point of “ridge regression” (L2 penalty) is to get rid of the ridge.
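The ridge and the effect of shrinkage can be seen in a small numerical sketch. It assumes two perfectly collinear predictors, a known noise scale `sigma`, and independent normal(0, `tau`) priors as the source of shrinkage. The values of `n`, `sigma`, and `tau` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, tau = 100, 1.0, 2.0

x = rng.normal(size=n)
X = np.column_stack([x, x])  # two perfectly collinear predictors

# Precision contributed by the likelihood alone (flat prior on coefficients).
lik_precision = X.T @ X / sigma**2
eig_flat, vec_flat = np.linalg.eigh(lik_precision)

# Adding independent normal(0, tau) priors adds 1 / tau^2 to the diagonal,
# which is the L2 penalty of ridge regression.
ridge_precision = lik_precision + np.eye(2) / tau**2
eig_ridge = np.linalg.eigvalsh(ridge_precision)

# The smallest eigenvalue is (numerically) zero under the flat prior: the
# posterior is flat along its eigenvector, the direction beta1 = -beta2.
print("flat-prior eigenvalues:", eig_flat)
print("ridge direction:", vec_flat[:, 0])

# With shrinkage the smallest eigenvalue is 1 / tau^2, so the posterior is
# proper and its sd along the former ridge is just the prior sd tau.
print("shrinkage eigenvalues:", eig_ridge)
```

The data only inform the sum of the two coefficients, so whatever happens along the direction where one goes up and the other goes down is determined entirely by the prior.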

We step through identification through the prior in the problematic posteriors chapter of the *User’s Guide*, though not with the -1 correlation (which just leans into the same effect you get with improper uniform priors).