I would like to use a standard weakly informative prior in my model (i.e., normal(0, 1)).
I believe that I would scale this to the mean and sd of my dependent variable. For example, if my DV is reaction time, with a mean of 500 and a sd of 100, my prior would be normal(500, 100).
Am I right about that?
My question is: let's say I have a continuous predictor and a categorical predictor. The continuous predictor has been z-transformed. The categorical predictor is dichotomous and sum coded (i.e., -.5 and .5). Can I just slap the same prior (i.e., normal(500, 100)) on both of these variables? Really I have two questions:
Do I have to do anything to bring the continuous and categorical predictors onto the same scale?
Does it make sense to use the same prior for a continuous and a categorical predictor?
I don’t have it at my fingertips, but I seem to recall reading one of Gelman’s blog posts where he advocated for putting continuous data on a normal(0, 0.5) scale because it matched reasonably well with binary data. From that perspective, it seems like a normal(500, 100) on the continuous predictor would be roughly equivalent to a normal(500, 200) on the binary one.
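The scale part of that comparison can be checked with a minimal sketch. It assumes a balanced design (equal numbers of -0.5 and 0.5) and a made-up continuous variable; the names `binary`, `raw`, and `z` are illustrative only. It shows only that a sum-coded dichotomous predictor has a standard deviation of 0.5 while a z-transformed one has a standard deviation of 1, so a "per standard deviation" effect corresponds to a coefficient twice as large on the binary predictor. It says nothing about whether 500 is a sensible prior mean.

```python
import statistics

# Balanced dichotomous predictor, sum coded as -0.5 / +0.5 (assumption: equal group sizes).
binary = [-0.5, 0.5] * 50

# Hypothetical continuous predictor, z-transformed.
raw = [float(i) for i in range(100)]
mu = statistics.fmean(raw)
sd = statistics.pstdev(raw)
z = [(v - mu) / sd for v in raw]

sd_binary = statistics.pstdev(binary)  # standard deviation of the sum-coded predictor
sd_z = statistics.pstdev(z)            # standard deviation of the z-scored predictor

# Ratio of the two predictor scales: a coefficient prior with scale s on z
# matches a prior with scale (scale_ratio * s) on the binary predictor
# when both are expressed per standard deviation of the predictor.
scale_ratio = sd_z / sd_binary

print(f"sd(binary) = {sd_binary:.3f}, sd(z) = {sd_z:.3f}, ratio = {scale_ratio:.3f}")
```

With unbalanced groups the standard deviation of the sum-coded predictor drops below 0.5, so the factor of two is only approximate in that case.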
Hi Lisa, I do think people often answer questions like this, but we get a lot of brms questions and sometimes it really does take a long time to get a response, so I understand the frustration. It's totally okay to ask for private consulting here. I remember a concern being raised in one post about asking for free tutoring (I think the concern was that we try to keep free stuff public for everyone to benefit from), but you are definitely welcome to ask for private consulting here.
I don’t think that your general procedure for thinking about priors is advisable (if I understand correctly).
First, it is a problem to allow your data to inform your prior. This issue is actually really subtle (it is technically ok if your data prompts you to reexamine your domain expertise and leads you to realize that your previous prior mischaracterized it). But the procedure that you describe sounds problematic. I think I see the underlying logic: in an intercept-only model, the distribution of the data provides a very weak guess about the location of the mean. But this doesn’t actually stand up to scrutiny. The idea that the distribution of the data provides a weak guess about the location of the mean already presupposes a sufficiently flat prior, and then allows a “first look” at the data to tighten the initial flat prior. If your “initial prior” (aka “the prior”) is truly close to flat, you should just use that as your prior in the analysis without tightening it around the sample mean.
Second, the logic breaks down even further when you’re picking a prior for the regression coefficients rather than the intercept of an intercept-only model. I think a toy example is the best way to see this. Imagine you’re fitting a linear regression of the form y = a + bx + Normal(0, sigma). Imagine that in reality x has no effect on y, and this lack-of-effect is consistent with our domain expertise. Imagine further that the distribution of y in our data is approximately normal, centered at mean(y) = 100 and with standard deviation sd(y) = 1.
If I understand your procedure correctly, you are suggesting putting a prior of Normal(100, 1) not only on a but also on b, which implies that you are as certain as can be that x has a huge positive impact on y.
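To make the toy example concrete, here is a minimal prior-simulation sketch. It assumes x is standardized, so that comparing x = -1 with x = +1 is a comparison of one standard deviation below versus above the mean; the example above does not state the scale of x, so that choice and the variable names are illustrative assumptions.

```python
import random
import statistics

random.seed(1)

n_draws = 4000
# Assumption: x is standardized; compare 1 SD below vs 1 SD above its mean.
x_low, x_high = -1.0, 1.0

effects = []
for _ in range(n_draws):
    a = random.gauss(100, 1)  # prior draw for the intercept, Normal(100, 1)
    b = random.gauss(100, 1)  # prior draw for the slope, the same Normal(100, 1)
    # Implied difference in expected y between x_high and x_low.
    effects.append((a + b * x_high) - (a + b * x_low))

mean_effect = statistics.fmean(effects)
share_positive = sum(e > 0 for e in effects) / n_draws

print(f"prior mean of the implied difference in y: {mean_effect:.1f}")
print(f"share of prior draws with a positive effect: {share_positive:.3f}")
```

Under these assumptions the prior expects a difference in y of roughly 200 across that range of x, while the data have sd(y) = 1, and essentially no prior mass sits near the true effect of zero.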
The main advice that you’ll find on this forum regarding choice of priors is to do your best to choose priors that are consistent with your domain expertise. This doesn’t mean you have to write down priors that precisely encode your current state of belief. If you want weak priors that are consistent with your domain knowledge, choose them so that they cover the full range of values that are remotely conceivable to you, that is, the range outside of which you would assume that something had gone catastrophically wrong with your data collection. If you really don’t want to do this, you can use truly flat (improper) priors, but this is a generally unpopular choice among Stan users (and some have convincingly argued that these flat priors are NOT non-informative). But what you must not do is use priors that are directly informed by the data you are modeling.
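One way to check whether a candidate prior covers the conceivable range is to compute how much prior mass falls inside it. The sketch below is only an illustration under stated assumptions: the ±300 ms "remotely conceivable" range for a slope in a reaction-time model is hypothetical, as is the Normal(0, 100) candidate and the helper `prior_coverage`. The Normal(500, 100) prior is the one proposed in the question.

```python
from statistics import NormalDist

def prior_coverage(mu, sigma, low, high):
    """Prior probability that a Normal(mu, sigma) coefficient lies in [low, high]."""
    prior = NormalDist(mu, sigma)
    return prior.cdf(high) - prior.cdf(low)

# Hypothetical range of slopes (in ms) considered remotely conceivable.
low, high = -300.0, 300.0

# Hypothetical weakly informative candidate, centered on "no effect".
coverage_weak = prior_coverage(0.0, 100.0, low, high)

# The data-scaled prior from the question, applied to a slope.
coverage_data_scaled = prior_coverage(500.0, 100.0, low, high)

print(f"Normal(0, 100):   {coverage_weak:.4f} of prior mass in range")
print(f"Normal(500, 100): {coverage_data_scaled:.4f} of prior mass in range")
```

With these made-up numbers the zero-centered prior places nearly all of its mass inside the conceivable range, while the data-scaled prior places only a few percent there. The same check works for any range you are willing to defend from domain knowledge.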