In what way are Gaussian priors "not robust" for logistic regression coefficients?

Ye Official Prior Choice Recommendations dictate that when specifying weakly informative priors for logistic regression coefficients, “[n]ormal distribution is not recommended as a weakly informative prior, because it is not robust”. Student-t distributions with df from 3 to 7 are recommended instead.

The wiki then goes on to say that the Gaussian distribution is fine if the prior is intended to be informative rather than merely weakly informative, which I find equally mystifying.

I do understand how the Gaussian distribution is “less robust” than a Student-t one when y is continuous. For example, suppose you’re using a Gaussian distribution to model male heights in centimeters (with true values of, say, \mu = 180 and \sigma = 6) and your sample includes an outlier with severe gigantism at 300 cm. A Gaussian model will shift the estimated location and scale much farther up in response to the outlier than a Student-t model, whose heavier tails absorb the outlier while leaving the estimated location and scale nearly unaffected.
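This intuition is easy to check numerically. Below is a minimal sketch of my own (the df of 4, the sample size of 50, and the seed are arbitrary choices, not from the wiki), fitting both models by maximum likelihood to simulated heights plus one 300 cm outlier:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated male heights (cm): true mu = 180, sigma = 6, plus one extreme outlier.
heights = rng.normal(loc=180, scale=6, size=50)
heights = np.append(heights, 300.0)  # severe gigantism

# Maximum-likelihood fits of a Gaussian model and a Student-t model (df fixed at 4).
mu_norm, sigma_norm = stats.norm.fit(heights)
df_t, loc_t, scale_t = stats.t.fit(heights, f0=4, loc=180, scale=6)

# The Gaussian fit gets dragged toward the outlier (location up, scale way up);
# the Student-t fit stays close to the bulk of the data.
print(f"Gaussian:  mu  = {mu_norm:.1f}, sigma = {sigma_norm:.1f}")
print(f"Student-t: loc = {loc_t:.1f}, scale = {scale_t:.1f}")
```

The t fit’s location stays near 180 and its scale near 6, while the Gaussian scale in particular inflates badly.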

But I don’t understand how this applies to prior specifications for logistic regression coefficients. When I fit binomial models to mock data with a wide range of logits for the \beta's, a Gaussian prior with a given scale always imposes more shrinkage on large logits than a Student-t prior with the same scale. So I don’t see in what sense the Student-t prior can be regarded as “more robust” for these parameters. The mind boggles.
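A minimal sketch of the kind of comparison described above (my own mock example, not the questioner's actual code: an intercept-only model with 99 successes in 100 trials, a prior scale of 1, and a t prior with 4 df are arbitrary illustrative choices), computing the MAP estimate of the logit under each prior:

```python
import numpy as np
from scipy import stats, optimize

# Mock binomial data with a large empirical logit: 99 successes in 100 trials
# gives logit(0.99) ~ 4.6.  Compare the MAP estimate of the intercept under a
# Normal(0, 1) prior vs. a Student-t(df=4, loc=0, scale=1) prior.
n, k = 100, 99

def neg_log_post(beta, log_prior):
    # Negative log posterior (up to a constant): binomial log likelihood + log prior.
    p = 1.0 / (1.0 + np.exp(-beta))
    loglik = k * np.log(p) + (n - k) * np.log1p(-p)
    return -(loglik + log_prior(beta))

map_norm = optimize.minimize_scalar(
    neg_log_post, args=(lambda b: stats.norm.logpdf(b, scale=1.0),),
    bounds=(-10, 10), method="bounded").x
map_t = optimize.minimize_scalar(
    neg_log_post, args=(lambda b: stats.t.logpdf(b, df=4, scale=1.0),),
    bounds=(-10, 10), method="bounded").x
mle = np.log(k / (n - k))  # unpenalized estimate, ~ 4.6

print(f"MLE = {mle:.2f}, MAP (normal) = {map_norm:.2f}, MAP (t) = {map_t:.2f}")
```

The normal prior pulls the large logit noticeably farther toward zero than the t prior with the same scale, which is exactly the behavior described above.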


Ultimately, this is a semantic question about what “robust” means. Presumably, calling a normal prior “not robust” as a choice of weakly informative prior reflects the idea that normal priors tend to be more informative than intended: they strongly suppress large coefficient values, perhaps more so than the modeler realizes. They are not robust in the event that the true coefficient is quite large. You’re right that normal priors shrink large values more aggressively; that is exactly why they may not be a robust choice for a weakly informative prior, which is generally intended not to shrink values too aggressively.

On the other hand, if the prior is intended to be informative, then it’s easy to imagine using the word “robust” to mean the exact opposite, as in *normal priors are robust to atypical datasets and provide sufficient regularization of the coefficients, whereas t priors do not*. Whether or not that italicized statement is true depends entirely on which distribution (normal or t-with-finite-df) accurately captures the prior information that the modeler wishes to encode.

The recommendations are for the weakly informative case, and the point is that when we say “weakly informative”, we usually mean something with more mass in the tails than Gaussian.
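That tail-mass difference is easy to quantify (a quick illustration of my own; the standard scale and the cutoff of 5 are arbitrary choices):

```python
from scipy import stats

# Prior mass on coefficients more than 5 scale units from zero,
# under a standard normal vs. a Student-t with 4 df.
tail_norm = 2 * stats.norm.sf(5)
tail_t4 = 2 * stats.t.sf(5, df=4)

print(f"Normal tail mass beyond |5|: {tail_norm:.2e}")
print(f"t(4)   tail mass beyond |5|: {tail_t4:.2e}")
```

The t(4) prior puts orders of magnitude more mass on very large coefficients, so when the data insist on a large value, the t prior gets out of the way far more readily than the normal.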