Hi Pia!
Generally, I’d say you are on the right track. When I started out learning Bayesian stats and Stan, the prior was fairly mysterious to me. The more I worked on real applications, the more comfortable I felt applying reasonable priors – and the more I realized just how ridiculous those seemingly uninformative priors often are.
And yet, when I look at published articles using Bayes in my area (education and psychology), most authors argue that they did not have much prior information about the parameters, and therefore used very vague priors such as N(0, 10000).
I think people a) are too lazy to think about their prior expectations for the model parameters, b) don’t realize that they do have quite a lot of expertise (domain knowledge) with which to formulate reasonable priors, c) fear that others might deem their analysis too “subjective” – so better be as vague as possible, d) lack the ability to translate their intuitions into a prior distribution, and so on… While writing this list I realized that there are a ton of potential reasons not to use reasonable priors… For me it was mainly d) in the beginning.
My question is, whether the actual response scales that we use in the study do not already provide a lot of information that we can use.
This is a really good start. You might also want to check out this paper. Generally, you’d want to think about the response variable’s scale, but also about the scale of the predictor variable: changing the units of the predictor variable should also change your prior belief about its regression coefficient – at least if your prior is fairly informative.
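To make the point about predictor units concrete, here is a minimal sketch in base R. The data are simulated and the variable names (x_m, x_cm) are made up for illustration. It only shows that rescaling a predictor rescales its coefficient by the same factor, which is why a prior that is sensible for one unit can be far too wide or too narrow for another.

```r
set.seed(1)

n   <- 200
x_m <- rnorm(n, mean = 1.7, sd = 0.1)      # hypothetical predictor, in metres
y   <- 2 + 3 * x_m + rnorm(n, sd = 0.5)    # hypothetical response

x_cm <- x_m * 100                           # same predictor, in centimetres

# Ordinary least squares slopes for both codings of the predictor
b_m  <- coef(lm(y ~ x_m))[["x_m"]]
b_cm <- coef(lm(y ~ x_cm))[["x_cm"]]

# The slope per metre is 100 times the slope per centimetre
c(per_metre = b_m, per_centimetre = b_cm, ratio = b_m / b_cm)
```

So a prior with a standard deviation of 4 on the per-metre slope would correspond to a standard deviation of 0.04 on the per-centimetre slope.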
So based on this, I would think that a prior of the form N(0,4) should be a useful prior? Would this be considered a very informative prior?
I think it is definitely much more useful than an N(0, 10000) prior. I still don’t think that it is very informative – at least not for a regression coefficient. With N(0,4), i.e. a standard deviation of 4, you imply that you would not be that surprised to see a regression coefficient of 5 or greater:
> pnorm(5, 0, 4, lower.tail = FALSE)
[1] 0.1056498
That means that you assume there is roughly a 10% chance that a one-unit change in that particular predictor variable could have a positive “effect” that basically spans at least the whole range of your response variable. And you give the same a priori chance to a negative effect of the same size for that predictor. Well, that could be true if the units of the predictor are very small – or if it really is a “that could change everything” type of predictor. But it’s not really informative, I’d say.
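A small sketch that extends the pnorm() call above to both tails and to the central 95% interval of the prior. The names prior_sd and threshold are just placeholders for illustration.

```r
prior_sd  <- 4   # standard deviation of the N(0, 4) prior
threshold <- 5   # coefficient size that would span the response range

# Prior probability of a coefficient at or above the threshold
p_upper <- pnorm(threshold, mean = 0, sd = prior_sd, lower.tail = FALSE)

# By symmetry, the probability of an effect that large in either direction
p_either <- 2 * p_upper

round(c(upper = p_upper, either_direction = p_either), 3)

# Central 95% prior interval for the coefficient
prior_interval <- qnorm(c(0.025, 0.975), mean = 0, sd = prior_sd)
round(prior_interval, 2)
```

The two-sided probability is about 0.21, and the central 95% interval runs to roughly ±7.84, so the prior still considers coefficients larger than the whole response range quite plausible.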
Also, you could just run rnorm(1, 0, 4) a bunch of times and ask yourself: “Would I be surprised if I saw this number in the regression output table? As a mean/median, or as the lower/upper bound of a CI?” Or just do hist(rnorm(1000, 0, 4)). Now, this is all a bit silly for the Normal, but I think it is quite useful if you are dealing with non-normal priors (think of priors on the scale of your response variable…).
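For a non-normal prior, the same “draw and look” idea could be sketched like this. The half-Student-t(3, 0, 2.5) prior on a residual standard deviation is only an example I picked for illustration, not something taken from the model in question.

```r
set.seed(42)

n_draws <- 10000

# Draws from a half-Student-t(3, 0, 2.5) prior (assumed example):
# scale a t(3) draw by 2.5 and fold it onto the positive axis
sigma_draws <- abs(2.5 * rt(n_draws, df = 3))

# Where does the prior put most of its mass?
quantile(sigma_draws, probs = c(0.05, 0.5, 0.95))

# How much prior mass lies above a residual sd of 4, which would be
# about as large as the whole response range discussed above?
mean(sigma_draws > 4)
```

A call like hist(pmin(sigma_draws, 20)) gives the visual version; the pmin() only trims the long tail so that the bulk of the distribution stays visible.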
Now, the really proper way of doing all this is via “prior predictive checks”. And for this, you might want to check this paper. There are a lot of functions in bayesplot and brms that can help you develop a workflow that lets you choose reasonable priors.
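To give a rough idea of what a prior predictive check does, here is a hand-rolled sketch in base R. Everything in it is an assumption for illustration: a response on a 1-to-5 scale, a standardised predictor, an N(3, 1) prior on the intercept and a half-normal prior on the residual sd. In brms you would normally not do this by hand; as far as I know, fitting with sample_prior = "only" and then calling pp_check() gives you the equivalent.

```r
set.seed(123)

# Share of simulated responses outside a hypothetical 1-to-5 scale,
# given a N(0, slope_sd) prior on the slope
prior_predictive_outside <- function(slope_sd, n_draws = 1000) {
  x <- seq(-2, 2, length.out = 50)             # standardised predictor (assumed)

  b0    <- rnorm(n_draws, mean = 3, sd = 1)    # intercept prior (assumed)
  b1    <- rnorm(n_draws, mean = 0, sd = slope_sd)
  sigma <- abs(rnorm(n_draws, mean = 0, sd = 1))  # half-normal residual sd (assumed)

  # One simulated data set per prior draw: a 50 x n_draws matrix
  y_sim <- sapply(seq_len(n_draws), function(i) {
    rnorm(length(x), mean = b0[i] + b1[i] * x, sd = sigma[i])
  })

  mean(y_sim < 1 | y_sim > 5)
}

prop_wide   <- prior_predictive_outside(slope_sd = 4)    # the N(0, 4) prior
prop_narrow <- prior_predictive_outside(slope_sd = 0.5)  # a tighter alternative

c(N_0_4 = prop_wide, N_0_0.5 = prop_narrow)
```

If a large share of the simulated responses falls outside the range the scale can actually produce, the prior is saying things you probably don’t believe.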
PS: I’ve posted a lot of questions during the last weeks, but I’m afraid there is still no end to it since I’m trying to get familiar with bayesian analysis and brms more specifically having this forum (and of course the www) as my only help… Hence, many thanks for all your help, especially Paul’s!
I always recommend Statistical Rethinking by @richard_mcelreath. Also check out his lectures on youtube.
Hope this helps.