I’m running a beta regression with 10 continuous predictors. I’m fairly new to Bayesian modeling (about a week in) and am trying to set weakly informative priors. After standardizing my predictors, I specified the following:
Intercept: Normal(-0.5, 1.2) — based on the mean and SD from a previous study
Beta coefficients: Normal(0, 1)
Phi: gamma(0.1, 0.1) (the brms default)
When I simulate outcome data from these priors, most of the simulated y values are very close to 0 or 1. I assume this is because the contributions of ten fairly loose beta priors add up in the linear predictor.
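For concreteness, here’s a minimal sketch of the kind of prior predictive simulation I mean (standardized predictors, logit link; all names are illustrative, not my actual variables):

```r
set.seed(1)
n <- 1000   # number of prior predictive draws
K <- 10     # number of predictors
X <- matrix(rnorm(n * K), n, K)            # standardized predictors

alpha <- rnorm(n, -0.5, 1.2)               # intercept ~ Normal(-0.5, 1.2)
beta  <- matrix(rnorm(n * K, 0, 1), n, K)  # coefficients ~ Normal(0, 1)
phi   <- rgamma(n, 0.1, 0.1)               # precision ~ gamma(0.1, 0.1)

eta <- alpha + rowSums(beta * X)           # linear predictor, one draw per row
mu  <- plogis(eta)                         # inverse logit link
y   <- rbeta(n, mu * phi, (1 - mu) * phi)  # beta outcomes, mean-precision form

hist(y, breaks = 50)                       # mass piles up near 0 and 1
```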
I tried tightening the beta priors, but I worry that makes them too informative. I’m unsure how to proceed: how do I keep the priors weakly informative while also keeping the prior predictive distribution reasonable? I’d be really grateful for any advice :)
Good on you for checking the prior predictive distribution. Shrinkage priors were developed for exactly this situation. I’d recommend looking into the R2D2 prior, which is nicely implemented in brms and offers an elegant way to decompose the explained variance among your predictors.
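As a rough sketch of the setup in brms (the hyperparameters are placeholders, and `dat`, `y`, and `x1`–`x10` stand in for your actual data), it might look like:

```r
library(brms)

priors <- c(
  prior(R2D2(mean_R2 = 0.3, prec_R2 = 3), class = b),  # shrinkage on all slopes
  prior(normal(-0.5, 1.2), class = Intercept),
  prior(gamma(0.1, 0.1), class = phi)
)

fit_prior <- brm(
  y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10,
  family = Beta(),
  prior  = priors,
  sample_prior = "only",   # draw from the prior predictive only
  data   = dat
)
pp_check(fit_prior)        # inspect the prior predictive distribution
```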
The problem you’re running into is that independent weakly informative priors do not add up to a weakly informative joint prior. This was the point of the (arguably misleadingly named) paper that led us to take prior predictive checks more seriously.
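To make that concrete: with 10 standardized predictors and independent Normal(0, 1) slopes, the prior variances add up in the linear predictor,

$$
\operatorname{Var}(\eta) = \operatorname{Var}(\alpha) + \sum_{k=1}^{10} \operatorname{Var}(\beta_k)\, x_k^2 \approx 1.2^2 + 10 \approx 11.4,
$$

so $\operatorname{sd}(\eta) \approx 3.4$ on the logit scale, and $\operatorname{logit}^{-1}(\eta)$ lands near 0 or 1 for most draws, even though each marginal prior looks harmless on its own.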
As @mhollanders pointed out, you probably need a better weakly informative joint prior from which to simulate. I don’t know if the R2D2 prior easily admits simulation, but @bgoodri will know.
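That said, drawing directly from one common normal-scale-mixture variant of the R2D2 construction (roughly following Zhang et al.; treat this as an untested sketch with illustrative hyperparameters) seems straightforward:

```r
# Untested sketch: simulate slopes from an R2D2-style hierarchy.
r2d2_coef_draw <- function(K, mean_R2 = 0.3, prec_R2 = 3, cons_D2 = 1) {
  R2   <- rbeta(1, mean_R2 * prec_R2, (1 - mean_R2) * prec_R2)  # prior on R^2
  tau2 <- R2 / (1 - R2)                  # total signal-to-noise scale
  psi  <- rgamma(K, cons_D2, 1)
  psi  <- psi / sum(psi)                 # Dirichlet(cons_D2) variance weights
  rnorm(K, 0, sqrt(tau2 * psi))          # per-coefficient draws
}
betas <- r2d2_coef_draw(10)
```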
Also, it’s OK to simulate from a more constrained prior than the one you’ll use to fit, as long as there’s enough data to resolve the model. You can’t do strictly proper simulation-based calibration that way, but I believe it’s what most of us do in practice.
Thank you so much for the reply and explanation, @mhollanders and @Bob_Carpenter :) I’ll try working it out with the R2D2 prior! @Bob_Carpenter, I’m not sure I understood your last point correctly: if the data are strong enough, the posterior will be dominated by them, so it’s OK to use a slightly different (more constrained) prior for the prior predictive checks? I might be misunderstanding.