brms hypothesis() function with category-specific effects

Hi everyone!

I’m analyzing a study in which participants received training on 20 items. Participants were randomly assigned to the experimental (Train = Y) or the control (Train = N) group. The 20 items were tested 3 times: pretest (Time = T1), immediate posttest (Time = T2), and delayed posttest (Time = T3). Participants’ performance on each item was coded as “Incorrect”, “Partial”, or “Correct”. Person covariates included participants’ gender, age, and ability. Person-by-item covariates included participants’ familiarity with and bias against each item.

> str(testdata)
tibble [3,960 × 11] (S3: tbl_df/tbl/data.frame)
 $ ID          : chr [1:3960] "k14" "k14" "k14" "k14" ...
 $ Train       : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
 $ Gender      : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
 $ Item        : chr [1:3960] "Item1" "Item2" "Item3" "Item4" ...
 $ Time        : Factor w/ 3 levels "T1","T2","T3": 1 1 1 1 1 1 1 1 1 1 ...
 $ Age         : num [1:3960] 6.07 6.07 6.07 6.07 6.07 6.07 6.07 6.07 6.07 6.07 ...
 $ Ability     : num [1:3960] 6 6 6 6 6 6 6 6 6 6 ...
 $ Familiarity : Factor w/ 2 levels "N","Y": 2 2 1 2 2 2 2 2 1 2 ...
 $ Practice    : Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 2 1 ...
 $ Bias        : Factor w/ 2 levels "N","Y": 1 1 1 2 1 1 1 2 1 1 ...
 $ Performance : Ord.factor w/ 3 levels "Incorrect"<"Partial"<"Correct": 3 1 ...

I ran an adjacent-category model with category-specific effects as follows:

prior.test2pl <- prior("normal(0,3)", class = "sd", group = "ID") +
  prior("normal(0,3)", class = "sd", group = "Item") +
  prior("normal(0,1)", class = "sd", group = "Item", dpar = "disc")
bf.test2pl.1 <- bf(
  Performance ~ Train*Gender*Time + Age + Ability + Familiarity + cs(Bias) +
    (0 + Time | ID) + (1 | i | Item),
  disc ~ 1 + (1 | i | Item)
)
bay.test2pl.1 <- brm(
  formula = bf.test2pl.1,
  data = testdata,
  family = brmsfamily("acat", "logit"),
  prior = prior.test2pl
)
> parnames(bay.test2pl.1)
  [1] "b_Intercept[1]"                      "b_Intercept[2]"                     
  [3] "b_disc_Intercept"                    "b_TrainY"                    
  [5] "b_GenderM"                           "b_TimeT2"                           
  [7] "b_TimeT3"                            "b_Age"                              
  [9] "b_Ability"                           "b_FamiliarityY"                     
 [11] "b_TrainY:GenderM"             "b_TrainY:TimeT2"             
 [13] "b_TrainY:TimeT3"              "b_GenderM:TimeT2"                   
 [15] "b_GenderM:TimeT3"                    "b_TrainY:GenderM:TimeT2"     
 [17] "b_TrainY:GenderM:TimeT3"      "bcs_BiasY[1]"                       
 [19] "bcs_BiasY[2]"                        "sd_ID__TimeT1"                      
 [21] ...                     

I have some questions about conducting post hoc analyses with this model:

Q1: Can I use the hypothesis() function to test whether Familiarity increases the probability of Correct Performance at T2 in the Train group? If so, should I include only the related variables (i.e. TrainY, TimeT2, and FamiliarityY; hpF1), should I add the intercepts (hpF2), or should I include all variables except TimeT3 (hpF3)? Which is correct (if any), and what is the difference among them?

hpF<-c("hpF1"="plogis(b_TrainY+b_TimeT2+b_TrainY:TimeT2+b_FamiliarityY)>
       plogis(b_TrainY+b_TimeT2+b_TrainY:TimeT2)",
       "hpF2"="plogis(b_Intercept[1]+b_Intercept[2]+b_TrainY+b_TimeT2+b_TrainY:TimeT2+b_FamiliarityY)>
       plogis(b_Intercept[1]+b_Intercept[2]+b_TrainY+b_TimeT2+b_TrainY:TimeT2)",
       "hpF3"="plogis(b_Intercept[1]+b_Intercept[2]+b_disc_Intercept+b_TrainY+b_GenderM+b_TimeT2+b_Age+b_Ability+b_TrainY:GenderM+b_TrainY:TimeT2+b_GenderM:TimeT2+b_TrainY:GenderM:TimeT2+bcs_BiasY[1]+bcs_BiasY[2]+b_FamiliarityY)>
plogis(b_Intercept[1]+b_Intercept[2]+b_disc_Intercept+b_TrainY+b_GenderM+b_TimeT2+b_Age+b_Ability+b_TrainY:GenderM+b_TrainY:TimeT2+b_GenderM:TimeT2+b_TrainY:GenderM:TimeT2+bcs_BiasY[1]+bcs_BiasY[2])")
print(hypothesis(bay.test2pl.1, hpF, class = NULL),digits=3)
Hypothesis Tests for class :
  Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1       hpF1    0.000     0.000        0        0   1141.857     0.999    *
2       hpF2    0.000     0.000        0        0      0.186     0.157     
3       hpF3    0.005     0.045        0        0      2.548     0.718     

Q2: Can I test whether Bias (which has category-specific effects) reduced the probability of Correct rather than Partial Performance? Should I include only Intercept[2] (which represents the threshold between Partial and Correct; hpB1), or both Intercept[1] and Intercept[2] (hpB2)? I can do both with the hypothesis() function, but I’m confused about the meaning of these combinations.

hpB<-c("hpB1"="plogis(b_Intercept[2]+bcs_BiasY[2])<
       plogis(b_Intercept[2])",
       "hpB2"="plogis(b_Intercept[1]+b_Intercept[2]+bcs_BiasY[2])<
       plogis(b_Intercept[1]+b_Intercept[2])",
       "hpB3"="plogis(b_Intercept[1]+b_Intercept[2]+b_disc_Intercept+b_TrainY+b_GenderM+b_TimeT2+b_TimeT3+b_Age+b_Ability+b_FamiliarityY+b_TrainY:GenderM+b_TrainY:TimeT2+b_TrainY:TimeT3+b_GenderM:TimeT2+b_GenderM:TimeT3+b_TrainY:GenderM:TimeT2+b_TrainY:GenderM:TimeT3+bcs_BiasY[2])<
plogis(b_Intercept[1]+b_Intercept[2]+b_disc_Intercept+b_TrainY+b_GenderM+b_TimeT2+b_TimeT3+b_Age+b_Ability+b_FamiliarityY+b_TrainY:GenderM+b_TrainY:TimeT2+b_TrainY:TimeT3+b_GenderM:TimeT2+b_GenderM:TimeT3+b_TrainY:GenderM:TimeT2+b_TrainY:GenderM:TimeT3)")
print(hypothesis(bay.test2pl.1, hpB, class = NULL),digits=3)
Hypothesis Tests for class :
  Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1       hpB1   -0.468     0.414   -0.996        0        Inf     1.000    *
2       hpB2   -0.033     0.157   -0.109        0     11.270     0.918     
3       hpB3    0.000     0.002    0.000        0      0.345     0.256     

Q3: How can I test whether the effect of Ability (a continuous variable) differs between the (Train = Y) and (Train = N) groups?

Thanks for your attention.


Hi,
sorry for not getting to you earlier; your question is relevant and well written.

First, I don’t think plogis is doing anything useful here: for the purposes of hypothesis, you should get the same results with and without the plogis transform, since it preserves ordering. This also implies that the first two hypotheses should be equivalent (you just add the same value to both sides). Since they did not give equivalent results, I assume that either

a) There is some bug in the code,
b) brms does something weird with the plogis transform and the code does something other than what I would naively expect, or
c) There are numerical issues: the values of the b_Intercept[] terms make plogis overflow/underflow to 1/0, so for a lot of samples the two sides are exactly equal (if that’s the case, you should see a big difference when changing the > in the hypothesis to >=; a quick check is sketched below).
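
For (c), a minimal sketch of how you could check this directly on the draws (assuming the parameter names from your parnames() output; uses the posterior package):

library(posterior)
draws <- as_draws_df(bay.test2pl.1)
lp1 <- with(draws, b_TrainY + b_TimeT2 + `b_TrainY:TimeT2`)    # hpF1 terms
lp2 <- lp1 + with(draws, `b_Intercept[1]` + `b_Intercept[2]`)  # hpF2 terms
# Share of draws where adding b_FamiliarityY changes nothing after plogis,
# i.e. draws where ">" and ">=" would disagree due to saturation at 0/1:
mean(plogis(lp1 + draws$b_FamiliarityY) == plogis(lp1))
mean(plogis(lp2 + draws$b_FamiliarityY) == plogis(lp2))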

In any case, I think you are misunderstanding some core aspects of the model. brms does a lot for you, and that can unfortunately hide a lot of important details. It IMHO makes little sense to use plogis to transform the coefficients or their sums. plogis can be used to transform the whole linear predictor into a probability, but when inspecting only a subset of the coefficients, it is IMHO more reasonable to think of them as changes in log odds for the adjacent categories (which are themselves a bit weird; I’ve personally never quite understood when the adjacent-category model is preferable to cumulative/sequential models). Additionally, since the intercepts correspond to different category boundaries, it IMHO does not make any sense to combine them in a single hypothesis. This holds even more for the disc predictors: I cannot think of a scenario where a sum of coefficients from the main linear predictor and coefficients from the disc predictor is a useful quantity.
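
For reference, this is how I understand the brms parameterization of the adjacent-category model with a disc parameter (a hedged sketch from memory; please double-check against the tutorial linked below):

$$\log \frac{P(Y = k + 1)}{P(Y = k)} = \mathrm{disc} \cdot (\eta_k - \tau_k)$$

where $\tau_k$ is b_Intercept[k] and $\eta_k$ is the linear predictor at boundary $k$, including the category-specific terms (here bcs_BiasY[k]). Each $k$ refers to a different pair of adjacent categories, so a sum like $\tau_1 + \tau_2$ never appears anywhere in the model and has no direct interpretation.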

I invite you to (re-)check the tutorial at https://journals.sagepub.com/doi/epub/10.1177/2515245918823199, especially the appendix, as that might clarify some of the issues. Reviewing some basics of linear models (e.g. via the resources at Understanding basics of Bayesian statistics and modelling) might also be sensible.

If you are not inclined to get deep into the model math, you need to be extremely careful when drawing inferences from the coefficients of such a complex model alone. A relatively safe approach is to rethink the inference as a prediction task. E.g. for a model response ~ Condition, the posterior for the b_Condition coefficient corresponds to the predicted difference in means between hypothetical future replications without any noise (or, alternatively, with noise but infinite sample size).
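
To illustrate on a toy example (simulated data, nothing to do with your model; just showing the equivalence):

library(brms)

# Simulated two-condition data
d <- data.frame(Condition = rep(c("A", "B"), each = 50))
d$response <- rnorm(100, mean = ifelse(d$Condition == "B", 1, 0))
fit <- brm(response ~ Condition, data = d)

# Noise-free predictions for one observation per condition
nd <- data.frame(Condition = c("A", "B"))
ep <- posterior_epred(fit, newdata = nd)

# The per-draw difference in predicted means equals the coefficient draws
all.equal(unname(ep[, 2] - ep[, 1]), as_draws_df(fit)$b_ConditionB)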

So instead of using hypothesis (which requires you to understand what the coefficients are actually doing in the model), you might want to create a dataset representing a hypothetical scenario you are interested in and use posterior_predict / posterior_epred / posterior_linpred to make appropriate predictions, then compare the predictions across scenarios. There are different ways to set up the dataset for such a hypothetical scenario, corresponding to different scientific questions (e.g. do you predict for subjects with the same covariates that you observed, or for some new “typical” subject? Do you want to include observation noise?).
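
For Q1, a minimal sketch of what that could look like (the specific covariate values below are my arbitrary choice of a “typical” scenario, not something implied by your data; adapt them to the question you care about):

# Two rows that differ only in Familiarity; everything else held fixed
newdat <- data.frame(
  Train       = factor("Y", levels = c("N", "Y")),
  Gender      = factor("F", levels = c("F", "M")),
  Time        = factor("T2", levels = c("T1", "T2", "T3")),
  Age         = 6,
  Ability     = 6,
  Familiarity = factor(c("N", "Y"), levels = c("N", "Y")),
  Bias        = factor("N", levels = c("N", "Y"))
)

# Array of draws x rows x response categories; re_formula = NA drops the
# ID and Item terms, i.e. predicts for a new "typical" subject and item
ep <- posterior_epred(bay.test2pl.1, newdata = newdat, re_formula = NA)

# Posterior probability that Familiarity = Y increases P(Correct) at T2
p_diff <- ep[, 2, 3] - ep[, 1, 3]  # 3rd category = "Correct"
mean(p_diff > 0)

Here mean(p_diff > 0) plays the role of the posterior probability that hypothesis was estimating, but it is now a statement about actual outcome probabilities implied by the full model.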

Similar recommendations apply to Q2.
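
E.g., reusing the scenario above but varying Bias instead of Familiarity, and reading “Correct rather than Partial” as a conditional probability (again just a hedged sketch):

newdat_b <- newdat
newdat_b$Familiarity <- factor("N", levels = c("N", "Y"))
newdat_b$Bias        <- factor(c("N", "Y"), levels = c("N", "Y"))
ep_b <- posterior_epred(bay.test2pl.1, newdata = newdat_b, re_formula = NA)

# P(Correct | Partial or Correct) under each Bias level
cond_correct <- ep_b[, , 3] / (ep_b[, , 2] + ep_b[, , 3])
# Posterior probability that Bias = Y reduces this conditional probability
mean(cond_correct[, 2] < cond_correct[, 1])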

Does that make sense?

Your model assumes that the effect of Ability on the logit scale is the same in both groups (as you don’t model an Ability:Train interaction). You could relax this assumption by adding the interaction to the model and inspecting whether the posterior for the coefficient concentrates around zero. You could also use model comparison techniques to compare the model with the interaction to the model without it (see e.g. Hypothesis testing, model selection, model comparison - some thoughts for some additional discussion of the options).
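
A sketch of the interaction model (reusing your priors; the exact name of the interaction coefficient may differ, so check rownames(fixef(bay.test2pl.2)) after fitting):

bf.test2pl.2 <- bf(
  Performance ~ Train*Gender*Time + Age + Ability + Ability:Train +
    Familiarity + cs(Bias) + (0 + Time | ID) + (1 | i | Item),
  disc ~ 1 + (1 | i | Item)
)
bay.test2pl.2 <- brm(
  formula = bf.test2pl.2,
  data = testdata,
  family = brmsfamily("acat", "logit"),
  prior = prior.test2pl
)

# Inspect the interaction coefficient directly ...
fixef(bay.test2pl.2)["Ability:TrainY", ]
# ... and/or compare the two models via approximate leave-one-out CV
loo(bay.test2pl.1, bay.test2pl.2)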

Best of luck with your model!