Dear Stan experts,
I was wondering how to properly specify different priors and then decide which one is more consistent with the data.
Let’s say there are two groups, treatment and control, and I want to fit the same model to each. Besides being interested in the difference between the groups’ posteriors for the same parameter, I may also assume that the treatment and control groups have different priors. E.g., the control group may hold a belief of normal pain levels, while the treatment group may believe its pain is reduced.
The parameter is between 0 and 1, and what I have in mind is a grid search with a step of 0.01: center the prior at each grid point, giving 100 prior specifications; fit 100 Stan models to the same data (treatment or control), one per prior; and lastly do model comparison to find out which prior value is most consistent with the data.
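For concreteness, each grid fit might reuse a single Stan program that takes the grid center as data (the data names `N` and `y`, the Bernoulli likelihood, and the prior scale 0.05 are placeholders of mine, not part of any actual model):

```stan
// One grid fit: theta in (0,1) with a prior centered at prior_mu.
// prior_mu is passed as data, so the same program serves all 100 fits.
data {
  int<lower=0> N;                    // number of observations (placeholder)
  array[N] int<lower=0, upper=1> y;  // binary outcomes (placeholder)
  real<lower=0, upper=1> prior_mu;   // grid center: 0.00, 0.01, ..., 0.99
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
  theta ~ normal(prior_mu, 0.05);    // prior centered at the grid point,
                                     // truncated to (0,1) by the constraint
  y ~ bernoulli(theta);
}
```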
Does the above approach sound valid, or is it completely wrong? I imagine this type of question is fairly generic, and I would highly appreciate it if someone could give a hint on how to solve it more properly.
Many thanks in advance,
If the treatment parameter is actually between 0 and 1 by construction, then I would estimate the model once with a beta hyperprior on the prior mean, although that is not much different from just putting a standard uniform prior on the treatment parameter in the first place. Fishing for the prior that is most consistent with past data is not that important for posterior inference. It does matter if you were going to do model comparisons via Bayes factors, but if you were going to do that, gridding a whole bunch of possibilities does not make much sense either.
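Something along these lines, as a sketch (the data names, the Bernoulli likelihood, and the particular hyperprior values are placeholder choices, not recommendations):

```stan
// Single hierarchical fit: a beta hyperprior on the prior mean mu,
// instead of refitting under 100 fixed prior centers.
data {
  int<lower=0> N;                    // placeholder data
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real<lower=0, upper=1> mu;         // prior mean, given a beta hyperprior
  real<lower=0, upper=1> theta;      // treatment parameter
}
model {
  mu ~ beta(2, 2);                   // illustrative hyperprior choice
  theta ~ beta_proportion(mu, 10);   // kappa fixed at 10, also arbitrary
  y ~ bernoulli(theta);
}
```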
Thanks for the reply!
I was thinking of imposing a hyperprior over the parameter as well. But if I use a beta prior, I would still struggle with the choice of priors for a and b, or mu and kappa, etc. And if the parameter is not bounded, should I then use a normal prior?
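For concreteness, what I mean by the mu/kappa version is something like this (the hyperprior choices here are just my own placeholders, which is exactly what I am unsure about):

```stan
parameters {
  real<lower=0, upper=1> mu;         // prior mean of theta
  real<lower=0> kappa;               // prior concentration
  real<lower=0, upper=1> theta;
}
model {
  mu ~ beta(2, 2);                   // how should I choose these
  kappa ~ exponential(0.1);          // hyperpriors in a principled way?
  theta ~ beta_proportion(mu, kappa);
  // ... likelihood for theta goes here
}
```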
Regarding the second point: if I were to use Bayes factors, what approach would you suggest that would make better sense?
Many thanks again,
The process you suggest is an example of empirical Bayes and will, in general, lead to poor performance. By tuning the prior to the observed data you ultimately overfit your inferences to those data; i.e., the inferences will be accurate for that one observation but significantly less accurate for other observations that you were just as likely to see.
A prior distribution is not something to be learned from the data; it is a way to introduce relevant domain expertise, independent of the data, into the analysis. That domain expertise can be personal or collective, elicited from others, theoretical or heuristic. For more details, and a discussion of how to evaluate the utility of your prior in a given analysis, see https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html.