Suggestion for Prior Choice Recommendations wiki

The Prior Choice Recommendations wiki came up in another thread, and this reminded me of something I’ve wanted to discuss here for a while. I often see folks using a peaked-at-zero prior (e.g., normal(0,1)) for variability parameters (standard deviations, variances). This can be useful for imposing shrinkage in hierarchical models, but I’ve even seen peaked-at-zero priors on things like measurement error, where zero is surely a rather incredible value. For example, here’s a trivial model:

data{
    int<lower=0> N ;
    array[N] real Y ; //model assumes data has been scaled to mean=0, sd=1 (array syntax needs Stan 2.26+; older versions use real Y[N])
}
parameters{
    real mu ;
    real<lower=0> sigma ;
}
model{
    mu ~ normal(0,1) ; 
    sigma ~ normal(0,1) ; //arguably unreasonable peaked-at-zero prior!
    Y ~ normal(mu,sigma) ;
}

where sigma is given a peaked-at-zero prior, implying that the single most probable value of the measurement noise is zero, i.e., that the measurement was most likely made with perfect accuracy. Instead, I’ve been recommending folks use something like:

sigma ~ weibull(2,1) ; //zero-as-incredible prior

Possibly a prior based on gamma() would also work; I am just more familiar with the weibull() distribution.
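
As a rough numerical check on the contrast above, the sketch below evaluates the candidate prior densities at and near zero. It is written in Python rather than Stan purely so that it can be run standalone with no dependencies. The function names are illustrative, the Weibull and gamma parameterizations are meant to mirror Stan's (shape, scale) and (shape, rate) conventions, and gamma(2,2) is only an assumed example of a gamma-based alternative, not something proposed in this thread.

```python
import math


def half_normal_pdf(x, scale=1.0):
    """Density of normal(0, scale) restricted to x >= 0 (what a
    lower=0 constraint plus a normal(0, scale) prior amounts to)."""
    if x < 0:
        return 0.0
    return math.sqrt(2.0 / math.pi) / scale * math.exp(-0.5 * (x / scale) ** 2)


def weibull_pdf(x, shape=2.0, scale=1.0):
    """Weibull density, parameterized by (shape, scale)."""
    if x < 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))


def gamma_pdf(x, shape=2.0, rate=2.0):
    """Gamma density, parameterized by (shape, rate); assumes shape >= 1."""
    if x < 0:
        return 0.0
    return rate ** shape / math.gamma(shape) * x ** (shape - 1.0) * math.exp(-rate * x)


# Tabulate the three densities at and near sigma = 0.
print(f"{'sigma':>6} {'half-normal':>12} {'weibull(2,1)':>13} {'gamma(2,2)':>11}")
for sigma in (0.0, 0.01, 0.1, 0.5, 1.0, 2.0):
    print(
        f"{sigma:>6.2f} {half_normal_pdf(sigma):>12.4f} "
        f"{weibull_pdf(sigma):>13.4f} {gamma_pdf(sigma):>11.4f}"
    )
```

The half-normal density is largest at sigma = 0, while the weibull(2,1) and gamma(2,2) densities are exactly zero there and rise to an interior mode, which is the "zero-as-incredible" behaviour being suggested.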

I don’t see any content on the Prior Choice Recommendations page related to this topic (though maybe the last bullet from this section counts?), so what does everyone think of adding an explicit mini-section on it?

@Bob_Carpenter used to say something about peaked-at-zero priors. Don’t know if he can point to some material on this. Perhaps this is also of interest.

We’ve used the gamma (not inverse-gamma) prior for its zero-avoiding properties in marginal posterior mode estimation; see this article: http://www.stat.columbia.edu/~gelman/research/published/chung_etal_Pmetrika2013.pdf. We’ve also used the Wishart (not inverse-Wishart) to avoid degenerate estimates for covariance matrices; see here: http://www.stat.columbia.edu/~gelman/research/published/chung_cov_matrices.pdf

For full Bayes, I don’t see zero-avoidance as necessary from a statistical standpoint, but it can help with computation because it allows us to avoid funnel behavior in posteriors.

Regarding your “peaked at zero” comment: that’s not necessarily the right way of thinking about it. On the scale of log(sigma) there is no longer a peak at zero.
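
To make the log-scale point concrete, here is a short derivation. It assumes the prior in question is the normal(0,1) on a positive-constrained sigma from the example model at the top of the thread (i.e., a half-normal), and it writes theta for log(sigma), a symbol introduced here only for illustration.

$$
p(\sigma) = \sqrt{\tfrac{2}{\pi}}\, \exp\!\left(-\tfrac{\sigma^{2}}{2}\right), \quad \sigma > 0,
\qquad \theta = \log \sigma, \quad \left|\frac{d\sigma}{d\theta}\right| = e^{\theta}
$$

$$
p(\theta) = \sqrt{\tfrac{2}{\pi}}\, \exp\!\left(-\tfrac{e^{2\theta}}{2} + \theta\right),
\qquad
\frac{d}{d\theta} \log p(\theta) = 1 - e^{2\theta} = 0 \;\Longrightarrow\; \theta = 0 \;\; (\sigma = 1)
$$

So on the log scale the same prior has its mode at sigma = 1, and its density goes to zero like e^theta as theta goes to minus infinity (sigma going to 0). Where the peak sits depends on the parameterization.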

@paul.buerkner what is the default prior for the measurement noise term in brms models with family=Gaussian?

Can you specify exactly which model you have in mind? In any case, the default priors of brms will likely be too wide for me to reasonably recommend them. In fact, I would say this is one of the aspects of brms that most needs improvement.