Why do rstanarm and brms support putting priors on the standard deviation (in linear regression models) rather than on the variance? I would like to understand the reasoning, since in older or non-Stan literature the variance or precision appeared to be preferred. Any thoughts or references are appreciated. Thank you.
This is because the normal distribution in Stan is parameterised using the mean and standard deviation (i.e., N(\mu,\sigma)), whereas in BUGS/JAGS it is parameterised using the mean and precision (i.e., N(\mu,1/\sigma^2)).
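A minimal sketch of the two parameterisations, in plain Python rather than Stan or JAGS code. The function names and the numbers are made up for illustration; the only point is that both conventions describe the same density once the precision is set to 1/\sigma^2.

```python
import math

def normal_pdf_sd(x, mu, sigma):
    """Normal density written in terms of mean and standard deviation (Stan's convention)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_pdf_precision(x, mu, tau):
    """Normal density written in terms of mean and precision tau = 1 / sigma^2 (BUGS/JAGS convention)."""
    return math.sqrt(tau / (2 * math.pi)) * math.exp(-0.5 * tau * (x - mu) ** 2)

sigma = 2.0              # illustrative standard deviation
tau = 1.0 / sigma ** 2   # matching precision, here 0.25

# Both parameterisations give the same density at any point.
same = math.isclose(normal_pdf_sd(1.3, 0.5, sigma), normal_pdf_precision(1.3, 0.5, tau))
```

So a prior "on the scale" means a prior on `sigma` in Stan-based tools and a prior on `tau` in BUGS/JAGS, even though the likelihood is identical.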
BUGS/JAGS take this approach because the gamma distribution is a conjugate prior for the precision, and this conjugacy can improve performance with Gibbs sampling. However, Hamiltonian Monte Carlo (which Stan uses) is more concerned with the posterior being continuously differentiable than with it being conjugate, so there is greater flexibility in the choice of prior distributions.
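To make the conjugacy point concrete, here is a small sketch assuming a normal likelihood with a known mean and a Gamma(shape, rate) prior on the precision. Under those assumptions the posterior for the precision is again a gamma distribution, which is what lets a Gibbs sampler draw from it directly. The function name and the data are hypothetical.

```python
def gamma_precision_posterior(data, mu, shape, rate):
    """Conjugate update for the precision of a normal with known mean mu.

    Prior:      tau ~ Gamma(shape, rate)
    Likelihood: x_i ~ Normal(mu, precision = tau)
    Posterior:  tau ~ Gamma(shape + n / 2, rate + sum((x_i - mu)^2) / 2)
    """
    n = len(data)
    sum_sq = sum((x - mu) ** 2 for x in data)
    return shape + n / 2.0, rate + sum_sq / 2.0

# Illustrative data and prior, not taken from any real analysis.
post_shape, post_rate = gamma_precision_posterior(
    [1.0, 3.0, 2.0, 4.0], mu=2.5, shape=2.0, rate=1.0
)
# post_shape is 4.0 and post_rate is 3.5
```

Hamiltonian Monte Carlo never uses this closed form; it only needs gradients of the log posterior, so a non-conjugate prior on the standard deviation costs nothing extra.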
Thanks Andrew. Correct me if I am wrong, but if the model is continuously differentiable with a prior on the standard deviation, isn't it still continuously differentiable when I put the prior on the variance or precision instead? So Stan could have decided to stay compatible with BUGS/JAGS…?
I’d guess interpretation. The standard deviation has the same units as the outcome, whereas the variance is in squared units, and it’s easier to think about intervals of a normal in terms of standard deviations.
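A quick sketch of the "intervals in standard deviations" idea, using Python's standard library. The mean and standard deviation are arbitrary illustrative values.

```python
from statistics import NormalDist

mu, sigma = 10.0, 2.0          # illustrative values, in the outcome's units
dist = NormalDist(mu, sigma)   # NormalDist is parameterised by mean and standard deviation

# Probability mass within one and two standard deviations of the mean.
within_1sd = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)          # roughly 0.68
within_2sd = dist.cdf(mu + 2 * sigma) - dist.cdf(mu - 2 * sigma)  # roughly 0.95

# The variance is in squared units, so it cannot be added to the mean directly.
variance = sigma ** 2
```

With a standard deviation of 2, one can read off directly that about 95% of outcomes fall between 6 and 14; the variance of 4 does not give that interval without first taking a square root.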
Yeah, it’s basically the combination of what @andrjohns said and what @bbbales2 said. Like @bbbales2 said, the standard deviation is much more intuitive, being in the same units as the outcome variable. And since, like @andrjohns said, there’s no advantage to conjugate priors in Stan, there’s no reason to bother with things like variance and precision that are much less intuitive.