Why StudentT(3,0,1) for prior?

Yes. The idea is that the thick tail reflects our uncertainty in the prior scale: if we have underestimated the prior scale, the thick tail makes it easier to detect prior-likelihood conflict. I had good experience with t_3 or t_4 when working a lot with GPs, where a thick tail sometimes really described our prior information for some covariance function parameters well. See O’Hagan (1979), On Outlier Rejection Phenomena in Bayes Inference, JRSSB, 41(3):358-367, for more on the benefits of the thick tail in the case of Student’s t.
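As a rough illustration of why the thick tail helps (a sketch using scipy; the value 5 is just an arbitrary "far from the prior mean" point, not from the original post): the t_3 log density penalizes a value several prior scales away much less than a normal does, so the likelihood can more easily pull the posterior away from a misspecified prior.

```python
from scipy import stats

# Log prior density at a value 5 prior scales away from the mean:
# the normal penalizes it heavily, the t_3 much less, so in a
# prior-likelihood conflict the likelihood wins under the t_3 prior.
normal_lp = stats.norm.logpdf(5.0, loc=0.0, scale=1.0)
t3_lp = stats.t.logpdf(5.0, df=3, loc=0.0, scale=1.0)
print(normal_lp)  # roughly -13.4
print(t3_lp)      # roughly -5.5
```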

A possible complication that I know of (or remember) is computational issues. A heavy-tailed prior combined with weak information from the likelihood (due to weak data or weak identifiability in the parameterization) can lead to a heavy-tailed posterior. For example, the dynamic HMC used in Stan is much better than many other MCMC algorithms at sampling from thick-tailed distributions, but it still has problems, at least in the case of the Cauchy. The nice property that the thick-tailed prior is robust in case of a misspecified scale can also lead to multimodality, which can also cause computational problems. Because of these computational issues I also recommend normals (and half-normals). This is especially recommended when you have good prior information on the scale, or if you know that the result is not going to be sensitive if you set the scale to a much larger value. Using a normal prior changes a bit how to diagnose a misspecified prior scale, but that is not a big issue.
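A quick way to see the computational issue (a scipy sketch, not Stan code; the cutoff of 10 scales is an arbitrary illustration): the half-Cauchy keeps substantial probability mass very far out in the tail, so a heavy-tailed posterior forces the sampler to explore an enormous region, whereas the half-normal's tail mass is negligible.

```python
from scipy import stats

# Probability mass beyond 10 prior scales: the half-Cauchy leaves
# about 6% of its mass out there, while the half-normal leaves
# essentially nothing, which is why Cauchy-tailed posteriors are
# hard to sample even for dynamic HMC.
hc_tail = stats.halfcauchy.sf(10.0)
hn_tail = stats.halfnorm.sf(10.0)
print(hc_tail)  # about 0.063
print(hn_tail)  # astronomically small
```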

And sometimes we use even thicker-tailed distributions than t_3; see the Bayes Sparse Regression case study.
