How to set priors for intervention design

Model design: A randomized controlled intervention study with two time points (pre-intervention and post-intervention) and two groups (intervention group and active controls). The goal is to estimate cognitive gains that can be attributed to the intervention (i.e., we hypothesize a significant group × time interaction on a cognitive task). The primary measures on this task are reaction time and accuracy. For this particular question, I’ll focus on accuracy. My question is how to set priors (I am a novice at brms and got introduced to it because a reviewer on my paper suggested I use it!).

There are two “methods” I was able to arrive at for setting priors specific to the model design just described. One was obtained from a reference page with a similar intervention design (but the measures used in that study were different, i.e., the author did not use reaction time and accuracy). The other was obtained from the get_prior() function in the brms package. Both are listed below:

Descriptive Statistics for Accuracy:

Group         Time   N    Min    Mean   Median   Max    SD
Intervention  Pre    56   0.25   0.82   0.92     1.00   0.19
Intervention  Post   55   0.08   0.88   0.92     1.00   0.16
Control       Pre    49   0.00   0.86   0.92     1.00   0.21
Control       Post   50   0.50   0.90   0.92     1.00   0.13

##  "METHOD" 1:  Reference: https://rpubs.com/lindeloev/bayes_factors

priors_mixed = c(
  # expected to be about 1 SD lower for the patient group;
  # an 80% CI from roughly 76 to 94 seems reasonable
  set_prior('normal(85, 7)', class = 'Intercept'),

  # none-to-moderate a priori difference between groups
  set_prior('normal(0, 8)', coef = 'groupIDUAA'),

  # some gain expected due to retest and non-specific effects
  set_prior('normal(3, 2)', coef = 'timeIDpre'),

  # a priori, groups A and B are expected to improve equally
  set_prior('normal(0, 2)', coef = 'groupIDUAA:timeIDpre'),

  # between-subject SD at baseline around 15
  set_prior('gamma(30, 2)', class = 'sd', coef = 'Intercept', group = 'subID')
)


##  "METHOD 2": 

get_prior(Accuracy ~ groupID * timeID + (1+groupID*timeID | subID), 
                data = AY_corr)

priors_mixed = c(prior(student_t(3, 1, 2.5), class = Intercept),
                prior(student_t(3, 0, 2.5), class = sd),
                prior(student_t(3, 0, 2.5), class = sigma))

## brms model:
full_brms_AY_corr = brm(Accuracy ~ groupID * timeID + (1 + groupID*timeID | subID),
                        data = AY_corr,
                        prior = priors_mixed,
                        save_all_pars = TRUE,  # needed for bayes_factor() / bridge sampling
                        chains = 4,
                        iter = 10000)
# null model: drops the fixed effects of group, time, and their interaction
null_brms_AY_corr = update(full_brms_AY_corr, formula. = ~ . - groupID*timeID)
BF_brms_bridge_AY_corr = bayes_factor(full_brms_AY_corr, null_brms_AY_corr)
BF_brms_bridge_AY_corr

Any help on how to accurately set priors would be greatly appreciated.

I doubt that priors from a completely different measurement context will make much sense for your data. The default priors built into brms tend to be well considered and weakly informative. I’m not a brms expert, but as far as I understand, for accuracies the priors are on the log-odds scale, where those student-t’s are pretty broad and heavy-tailed. When brms generates priors for continuous data like RT, I think they are on a “z” scale, whereby the data is divided by its overall SD. You should nonetheless use prior predictive checks to make sure that these defaults don’t wildly diverge from your domain expertise. (N.B. for RT, you may want a log transform or a more nuanced process model; otherwise the prior predictive checks will likely yield negative RTs.)
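For concreteness, here is a minimal base-R sketch of what a “broad” student_t(3, 0, 2.5) prior implies on the probability scale, assuming a bernoulli/binomial model with a logit link (my reading of the accuracy case; adjust if your model differs):

# draws from student_t(3, 0, 2.5) on the log-odds scale,
# pushed through the inverse-logit
set.seed(1)
logodds_draws = 2.5 * rt(1e5, df = 3)
hist(plogis(logodds_draws), breaks = 50,
     main = "Implied prior on the probability scale", xlab = "Accuracy")
# much of the mass piles up near 0 and 1: "broad" on the log-odds
# scale can still be opinionated on the outcome scale, which is
# exactly what a prior predictive check makes visible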

Hey @AR_LearningStats,
I get the feeling that you might be asking 3 questions at the same time; please correct me if I am mistaken:

  1. How do I know which parameters I can set priors on?
  2. How do I set a prior distribution for a specific parameter (or parameter group)?
  3. What are good prior distributions to use for my specific example?

Questions 1 and 2 are the easier ones, as you have mostly figured them out with the code you present.
For question 1, you can define a model in brms and use the get_prior() or prior_summary() functions to get, respectively, a data.frame of the model’s priors (the defaults, if you didn’t specify anything explicitly) or a summary printout.
Both will tell you which parameters you can set a prior on.
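For example (a minimal sketch; if I remember correctly, chains = 0 compiles the model without sampling, so you can inspect the defaults cheaply):

library(brms)
# before fitting: every prior you can set, with its default, as a data.frame
get_prior(Accuracy ~ groupID * timeID + (1 + groupID*timeID | subID),
          data = AY_corr)
# on a (compiled or fitted) model object: the priors actually in use
fit0 = brm(Accuracy ~ groupID * timeID + (1 + groupID*timeID | subID),
           data = AY_corr, chains = 0)
prior_summary(fit0)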

The answer to question 2 is what you do with method 1 in your code.

The really hard part is question number 3. I don’t think I can give you a definitive answer here, but here are a few things that have helped me:

  • Plot your priors. You can run your model with sample_prior = "only" and then use the pp_check() function to see what your priors look like on the outcome scale (see the sketch after this list). Especially as your models get more complex, this is very helpful for seeing what your priors actually “mean”.
  • Use past experience. Recently I was able to set a prior for an intercept based on some work I did in the past. It is important to transform the value onto the same scale as any link function you might have (log, logit, …); the sketch below includes an example of this, too.
  • Go with some kind of sensible default. The Stan documentation could be a starting point for this.
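
A minimal sketch of the first two bullets, reusing the names from your model (illustrative only, not a recipe):

# 1) prior predictive check: sample from the priors alone, then compare
#    the implied outcomes with the observed Accuracy distribution
prior_only_fit = brm(Accuracy ~ groupID * timeID + (1 + groupID*timeID | subID),
                     data = AY_corr,
                     prior = priors_mixed,
                     sample_prior = "only",
                     chains = 4, iter = 2000)
pp_check(prior_only_fit)

# 2) moving past experience onto the link scale: if you expect roughly
#    85% accuracy and the model uses a logit link, centre the intercept
#    prior near qlogis(0.85), not near 85 or 0.85
qlogis(0.85)  # ~1.73 on the log-odds scale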

Thank you so much, this is incredibly helpful. I think you correctly identified my queries, and I now have a better sense of what I need to do. I will follow through with your suggestions re: 3.