Interpreting predicted ordinal data in brms

Hi everybody, I am a beginner at brms and Bayesian statistics, but I like to dig a bit deeper.
I have a brms model which I trained on 3 continuous predictor variables (named “Metric1”, “Metric2”, “Metric3”) and an ordinal response variable (a Likert scale from 1 to 7, named “LikertRating”). I checked the tutorial on ordinal data (Bürkner and Vuorre, 2019), but I still have some questions. This is the setup:

# Data structure
data <- data.frame(
  Group = factor(Data$GroupID), # 2 levels
  Exercise = factor(Data$ExerciseID), # 3 levels
  Participant = factor(Data$ParticipantID), # 24 levels
  Time = factor(Data$TimeID), # 2 levels
  LikertRating = factor(Data$Rating, ordered = TRUE), # 7 levels
  Metric1 = as.numeric(Data$Metric1), # continuous data
  Metric2 = as.numeric(Data$Metric2), # continuous data
  Metric3 = as.numeric(Data$Metric3) # continuous data
)

# Model formula
ModelFormula1 <- bf(LikertRating ~ Metric1 + Metric2 + Metric3 + (1|Participant:Exercise)) 

# Brms model
Model1 <- brm(
  data = data,
  formula = ModelFormula1,
  family = cumulative("logit")
)

After training the model, I use the model to predict LikertRatings of unseen data, using only the continuous metrics. The final goal is to assess Group:Time interactions of these unseen data. However, I am a bit lost on how to proceed. Should I sample the ratings, and report median and IQR? Or is there a better way? And how can I assess significance of these predicted Group:Time interactions?

Thanks for your input!
Adriaan

  • Operating System: Mac OS Sonoma 14
  • brms Version: version 2.20.4

Hi @adriaancampo, I’m a bit unclear on exactly what you mean here. When you say unseen data, are you referring to some type of formalized predictive procedure using actual external validation data? Or to summarizing your results from this analysis in terms of predicted responses at specific values of the predictors? Unless I’m missing something, the Group and Time predictors aren’t actually included in your model.

There are several approaches to summarizing effects from these models, either on the latent scale or the manifest ordinal scale, so it depends on what sort of information you’re after.
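
To make the two scales concrete, here is a minimal base-R sketch of how a cumulative("logit") model maps a latent linear predictor onto the seven response categories. The thresholds and the linear predictor value are made-up numbers for illustration, not estimates from Model1.

```r
# Hypothetical values for illustration only (not estimates from Model1)
thresholds <- c(-2.5, -1.5, -0.5, 0.5, 1.5, 2.5)  # 6 cut points for 7 categories
eta <- 0.8                                        # latent-scale linear predictor

# Cumulative logit: P(Y <= k) = plogis(threshold_k - eta)
cum_prob <- plogis(thresholds - eta)

# Category probabilities are successive differences of the cumulative ones
cat_prob <- diff(c(0, cum_prob, 1))
names(cat_prob) <- 1:7

# Manifest-scale summaries
expected_rating <- sum(1:7 * cat_prob)  # treats the categories as equally spaced
modal_rating <- which.max(cat_prob)     # most probable category
```

With a fitted brms model, posterior_linpred() returns draws on the latent scale, while posterior_epred() returns the per-category probabilities for each draw and observation, so either kind of summary can be built from the same fit.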

Hi @AWoodward , thanks for your reply.

I apologize if I was unclear.

So, I trained the model on a dataset containing the metrics and the Likert rating. The model looks as follows: LikertRating ~ Metric1 + Metric2 + Metric3 + (1|Participant:Exercise)

Later, I gathered an additional dataset (let’s call it “NewData”), with a known Group:Time interaction. From this dataset, I have only the metrics, not the Likert rating. I would like to predict the Likert rating (“PredictedRating”) from this additional dataset, and assess the Group:Time interaction of the predicted ratings. I sample a predicted distribution like this:

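# posterior_predict() needs the fitted model (Model1), not the formula object.
# The result is a draws x observations matrix of predicted ratings.
# The %>% pipe requires magrittr (or dplyr) to be loaded.
PredictedRating <- posterior_predict(Model1, newdata = NewData, re_formula = NULL, ndraws = 4000) %>% as.data.frame()
NewData$PredictedRatings <- t(PredictedRating)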

After this step, I am kind of stuck. I would like to assess the Group:Time interaction of the predicted Likert ratings, after filtering for the Group and Time of interest, but I am puzzled how to do this.

Greetings,
Adriaan

So when you say a known Group:Time interaction, do you mean simply that Group and Time are observed in NewData? In that case you could think of this as a missing data problem, where NewData lacks observations of the ordinal response and the purpose of the first-stage model is to infer them by posterior prediction, given the values of the metric predictors in NewData.
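
Under that missing-data reading, each posterior draw acts as one imputed set of ratings, so a summary can be computed within every draw and then summarised across draws. A rough base-R sketch of the mechanics follows. The matrix `pred` is simulated noise standing in for the posterior_predict() output, and the labels A/B and pre/post are hypothetical placeholders for the real Group and Time levels.

```r
set.seed(1)

# Stand-in for NewData: 2 groups x 2 time points, 10 rows per cell (hypothetical)
new_data <- expand.grid(rep = 1:10, Group = c("A", "B"), Time = c("pre", "post"))
n_obs <- nrow(new_data)
n_draws <- 200

# Stand-in for posterior_predict() output: a draws x observations matrix of
# ratings 1..7. Here it is random noise; in practice it would come from Model1.
pred <- matrix(sample(1:7, n_draws * n_obs, replace = TRUE),
               nrow = n_draws, ncol = n_obs)

# Cell label for every observation: A.pre, B.pre, A.post, B.post
cell <- interaction(new_data$Group, new_data$Time)

# Mean predicted rating per Group:Time cell, separately for every draw
cell_means <- vapply(
  levels(cell),
  function(l) rowMeans(pred[, cell == l, drop = FALSE]),
  numeric(n_draws)
)

# Interaction contrast per draw: (B post - B pre) - (A post - A pre)
contrast <- (cell_means[, "B.post"] - cell_means[, "B.pre"]) -
            (cell_means[, "A.post"] - cell_means[, "A.pre"])

# Posterior median and 95% interval of the contrast
quantile(contrast, probs = c(0.025, 0.5, 0.975))
```

The interval describes the uncertainty in the contrast of predicted ratings. It is not a significance test, and it can only reflect differences that are carried by the metrics.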

What I don’t understand is how this is intended to show you anything about the Group and Time effects. Because the posterior predictions are generated from Model1 without reference to Group and Time (Model1 contains no information about them), I believe those predictions wouldn’t contain any information about direct Group and Time effects. The data might contain information about indirect effects mediated by the metric predictors from Model1, but you could infer that just by fitting two models, one for LikertRating ~ metrics and one for metrics ~ Group*Time. If what you’re after is inference regarding something like LikertRating ~ metrics + Group*Time, then I’m not sure how you could get there with this structure.

Have you made some causal diagrams to describe this system?