Checking if my train of thought in determining a prior makes sense

I am analyzing the results from a reaction time experiment.

I am using rstanarm in R to run linear mixed effects analyses, examining the data at the trial level.

I have a dichotomous predictor of interest.

The dependent variable was log-transformed, which makes it a bit tricky for me to come up with the prior. I would love it if someone could let me know whether I am doing this right.

  • Based on the field, the largest difference I would expect between the two levels of my predictor of interest is 50 milliseconds.
  • Because my dependent variable is log-transformed, I can’t simply base the prior on a raw difference of 50. So I took the average of the condition I would expect to be slower in my data (let’s say 1200) and subtracted 50 from it (giving 1150). I log-transformed these two values and took the difference. This is now the maximum difference I would expect on the log-transformed scale. (I know using the mean of the data is a bit fishy here. So perhaps what I will do instead is think about what I would have predicted these values to be based on previous literature. But the general approach isn’t changed, I believe. I now have a maximum expected effect size on the log scale.)
  • I now want to make a prior that reflects this. I will use a normal distribution centred at 0, with half of the maximum expected effect size as the SD. I believe this means that roughly 95% of the probability will lie between the maximum expected effect size and its negative value. (Granted, things are a bit fishy here as well. Should I just use a half-normal distribution and ignore negative values [i.e., ignore the possibility that the effect might go in the opposite direction to the one I expect]?)
  • Now, my prior is scaled to my dependent variable automatically by rstanarm (i.e., a prior with a width of 1 is actually equal to a width of 1 SD of my dependent variable). So I work out how many SDs of my dependent variable correspond to my expected effect size.
  • I end up with this formula:
    ((log(1200) - log(1150))/sd(dat$DV))/2 to determine the width of my prior. Note that this is then going to be scaled to my DV automatically by rstanarm.
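
The arithmetic in the bullets above can be sketched in base R as follows. This is only an illustration under stated assumptions: the 1200 ms and 50 ms figures are the example values from the post, and `log_rt` is simulated, hypothetical data standing in for `dat$DV`. Whether rstanarm rescales a user-specified prior depends on the `autoscale` argument of `normal()` (and its default differs between rstanarm versions), so it is worth checking what was actually used with `prior_summary()` on the fitted model.

```r
# Example values from the post; ideally these come from previous literature.
slow_ms     <- 1200  # expected mean RT in the slower condition
max_diff_ms <- 50    # largest plausible difference between conditions

# Largest plausible effect on the log scale:
# log(1200) - log(1150) = log(1200 / 1150), about 0.0426
max_log_effect <- log(slow_ms) - log(slow_ms - max_diff_ms)

# Prior SD chosen so that +/- 2 SD spans the largest plausible effect
prior_sd_log <- max_log_effect / 2

# Prior mass between -max_log_effect and +max_log_effect (about 0.954)
coverage <- pnorm(max_log_effect, mean = 0, sd = prior_sd_log) -
  pnorm(-max_log_effect, mean = 0, sd = prior_sd_log)

# Hypothetical log-RT data standing in for dat$DV
set.seed(1)
log_rt <- rnorm(500, mean = log(1100), sd = 0.3)

# The same SD expressed in units of sd(DV); only relevant if the prior
# is going to be multiplied by sd(DV) through autoscaling.
prior_sd_in_sd_units <- prior_sd_log / sd(log_rt)

# With rstanarm, one of these would then be passed as the coefficient prior:
#   prior = normal(0, prior_sd_log,         autoscale = FALSE)
#   prior = normal(0, prior_sd_in_sd_units, autoscale = TRUE)
```

Setting `autoscale = FALSE` and giving the SD directly on the log scale avoids having to reason about the rescaling at all.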

Your approach of imagining the outcomes on the raw scale, then log-transforming to see what they become, makes sense to me, as does thinking in terms of the size of effect you can expect. One pointer: since you are using reaction times, I imagine this is something like a psychology experiment. Unless you are timing some highly specific physical process where you really do know the maximum or minimum expected values, I would not make the priors overly specific - there is a lot of uncertainty before seeing the data, even if you base the prior on previous research.

In particular, for your prior on the 50 ms effect, I would advise against using a half-normal. It means that, before seeing the data, you are literally saying it is impossible for the effect to go in the other direction. For most reaction time experiments, I don’t know of any effect where I would be confident in saying that. It also wouldn’t be convincing to people reading the paper - in interpreting this parameter, you would basically have to say “Assuming it is impossible for the effect to go in the other direction, the posterior distribution for this effect is bla”. If people have any doubt about whether your effect is legit, then ruling out the possibility of it going the other way is probably not the best idea - if it is a convincing effect, the posterior updating will rule that out for you, and that will be more convincing to both you and the reader!
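
One way to see what each prior implies is to simulate from it and convert the draws back to milliseconds. The sketch below is illustrative only: it assumes the 1200 ms baseline and the prior SD derived from the 50 ms example in the question, and it adopts the (arbitrary) convention that a positive coefficient is the expected direction.

```r
set.seed(123)

baseline_ms  <- 1200                   # example baseline from the question
prior_sd_log <- log(1200 / 1150) / 2   # half the largest plausible log effect

# Draws from the symmetric normal(0, prior_sd_log) prior
beta <- rnorm(1e5, mean = 0, sd = prior_sd_log)

# Implied difference in ms at the baseline: baseline * (exp(beta) - 1)
diff_ms <- baseline_ms * (exp(beta) - 1)
q <- quantile(diff_ms, c(0.025, 0.5, 0.975))
print(round(q, 1))

# Share of prior mass on the unexpected direction (beta < 0)
prop_opposite_normal <- mean(beta < 0)

# A half-normal is the absolute value of the same draws,
# so it puts no mass at all on the unexpected direction.
beta_half <- abs(beta)
prop_opposite_half <- mean(beta_half < 0)
```

Under the symmetric prior about half of the mass sits on each direction and the central 95% of implied differences stays within roughly ±50 ms, whereas the half-normal assigns zero probability to the opposite direction no matter what the data say.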

For your final two bullet points, I’m not 100% sure how Stan (or rstanarm) handles things like the automatic scaling, so I can’t comment on that. Hopefully the above is of some use though.

Log-transforming the dependent variable makes sense precisely when independent variables influence the dependent variable multiplicatively, rather than additively (an additive unit change in log(y) corresponds to a multiplicative e-fold change in y). Thus, you might find it more intuitive to reason about the prior in terms of the magnitude of the multiplicative (or percentage) change in y that you expect should correspond to a unit change in the predictor.
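
As a rough illustration of the multiplicative reading, a coefficient `beta` on the log scale corresponds to a percentage change of `100 * (exp(beta) - 1)` in y. The sketch below converts in both directions; the 5% figure is a hypothetical value chosen only for the example, not something from the thread.

```r
# Coefficients on the log scale and the percentage change in y they imply
beta <- c(-0.10, -0.05, 0, 0.05, 0.10)
pct_change <- 100 * (exp(beta) - 1)
print(round(pct_change, 2))

# The 50 ms example from the question as a percentage of 1200 ms
pct_50ms <- 100 * 50 / 1200   # about 4.2%

# Going the other way: start from a largest plausible percentage change
# (hypothetical 5%) and derive a prior SD on the log scale.
max_pct        <- 5
max_log_effect <- log(1 + max_pct / 100)   # about 0.0488
prior_sd_log   <- max_log_effect / 2
```

Specifying the prior this way does not require picking a baseline reaction time, since a percentage change is the same on the log scale regardless of the baseline.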