New to BRMS, really need help modelling ordinal data


I am currently trying to apply an ordinal model to my outcome variable: sentiment score (positive, negative, neutral).

I want to look whether there is a relationship between the way a question is asked (positive, negative, neutral wording) and the sentiment of the response. I have 2638 people asked a question about symptoms. 1/3 of the people were asked it with a negative wording, 1/3 with a neutral one, 1/3 with a positive one. From this, I did sentiment analysis (using Trincker’s package) to see whether their responses were more positive or negative, depending on the wording of the question.

Sentiment analysis breaks down responses into sentences, so I have 2638 people, but 7924 sentences, so I would assume to fit ID as a random effect.

The big question is: does the way the question is asked (primetype) affect the sentiment of the response?

Here is a subset of my data:

 structure(list(agequartiles = structure(c(1L, 3L, 2L, 1L, 2L, 
4L, 3L, 1L, 3L, 4L, 1L, 2L, 2L, 2L, 4L, 1L, 3L, 3L, 4L, 4L, 4L, 
3L, 4L, 1L, 4L, 3L, 1L, 4L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 3L, 2L, 
2L, 3L, 4L, 4L, 3L, 2L, 3L, NA, 1L, 1L, 1L, 2L, 2L), .Label = c("[18,23]", 
"(23,27]", "(27,32]", "(32,54]"), class = "factor"), sentiment = c(1, 
1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 
1, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 3, 2, 1, 
1, 2, 1, 1, 3, 1, 3), group = structure(c(2L, 3L, 3L, 2L, 2L, 
1L, 2L, 1L, 2L, 2L, 2L, 3L, 3L, 1L, 3L, 1L, 3L, 2L, 2L, 1L, 3L, 
1L, 3L, 2L, 1L, 2L, 2L, 2L, 3L, 1L, 1L, 2L, 1L, 3L, 1L, 2L, 3L, 
3L, 3L, 3L, 2L, 3L, 3L, 1L, 3L, 3L, 3L, 3L, 3L, 2L), .Label = c("prime1", 
"prime2", "prime3"), class = "factor"), continent = c("UK", "Australia and New Zealand", 
"Northern America", "UK", "Northern America", "Australia and New Zealand", 
"Asia and the Pacific", "UK", "Southern and Central America", 
"Australia and New Zealand", "UK", "Northern America", "Northern America", 
"UK", "Northern America", "UK", "UK", "Northern America", "UK", 
"Northern America", "Northern America", "Southern and Central America", 
"Northern America", "UK", "Europe", "Northern America", "UK", 
"Northern America", NA, "UK", "UK", "Australia and New Zealand", 
"Australia and New Zealand", "UK", "UK", "UK", "Australia and New Zealand", 
"Northern America", "UK", "Northern America", "UK", "Asia and the Pacific", 
"Northern America", "Northern America", NA, NA, "UK", "Europe", 
"UK", "Northern America"), ID = 1:50, medication = c("FALSE", 
"FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "TRUE")), row.names = c(NA, 
50L), class = "data.frame")

this is my workflow:

  1. I chose a link function using this tutorial The link chosen was cloglog.
  2. Chose a model using this tutorial tutorial and chose Acat with category specific effects.
  3. Imputed data
  4. Ran models with different predictors
  5. chose model

I imputed missing data using the missRanger package (mice wouldn’t work)

data <- lapply(3456:3460, function(x)
     . #predict all columns 
    ~ . #Make predictions using all columns except:
    - ID,
    maxiter = 10,# How many iterations until it stops? 
    pmm.k = 3, #Predictive Mean Matching leading to more natural imputations and improved distributional properties of the resulting values
    verbose = 1,#how much info is printed to screen, 
    seed = x,#Integer seed to initialize the random generator.
    num.trees = 200,
    returnOOB = TRUE,
    case.weights = NULL

Models I ran:

models_group <- brm_multiple(formula = sentiment  ~ 1 + cs(group),  data = data, family = acat("cloglog"), combine=TRUE, chains=4)

models_meds <- brm_multiple(formula = sentiment  ~ 1 + cs(group)+ medication,  data = data, family = acat("cloglog"), combine=TRUE, chains=4)

models_age <- brm_multiple(formula = sentiment  ~ 1 + cs(group)+age,  data = data, family = acat("cloglog"), combine=TRUE, chains=4)

models_continent <- brm_multiple(formula = sentiment  ~ 1 + cs(group)+continent,  data = data, family = acat("cloglog"), combine=TRUE, chains=4)

models_all<-models_age <- brm_multiple(formula = sentiment  ~ 1 + cs(group) +age +medication+continent,  data = data, family = acat("cloglog"), combine=TRUE, chains=4)

then I tried to use LOO to see which model worked best

my questions:

  1. is that a reasonable work flow?
  2. I still don’t understand priors, I have read tutorials, and posts on here but I still don’t understand, or how to work out what to set mine as?
  3. In my data, the models that include age don’t work, they converge and Rhat is more than 1.1, should I do more chains?

thank you for reading

Quick question related to your 3rd question: when you’re looking at the Rhats, are you basing that on just the output of the generic summary() or are you checking the individual Rhats (e.g., models_age$rhats)? The generic warning that Rhat is too large is common in brm_multiple, but it’s just as often a false positive (see here).

If your Rhats are still problematic, then I’d guess that better priors could help. I’m not sure exactly what kinds of questions you have about priors. If you’re concerned that your selection of priors may bias your results, then perform sensitivity analyses afterward where you specify several different priors and see how much the final results differ. If you don’t have strong prior beliefs, then I generally find that normal and student t distribution with large (on the scale of the variables) are easiest to defend and conceptualize.

I’d encourage you to include prior predictive checks into your workflow, particularly if you’re concerned about priors in the first place. I usually do something like this to get a rough idea of what my prior specifications are doing:

priors_fit <- brm_multiple(... sample_prior = "only")
pp_check(priors_fit, type = "bars")

You may play around with other pp_check(... type = xyz) options to get a sense of what your priors are saying about your beliefs before looking at the data.

Thank you for this, its really helpful. My problem with priors is that I don’t actually understand what they do. I know its important to define them, but I don’t understand them in simple terms-whether I define them for all variables, or some, and what it actually does. I’ve read a lot of papers and tutorials, but the language is all as if the reader already understands them, I’m struggling to find something at the introductory level that can explain simply.

For example, I tried it with a single model (the multiples take so long to run)

  formula = sentiment  ~ 1 + cs(group),  data = iddf, family = acat("cloglog"),
  autocor = NULL,
  data2 = NULL,
  knots = NULL,
  sparse = NULL,


This gives me:

prior     class        coef group resp dpar nlpar bound       source
   (flat)         b                                                                       default
   (flat)         b groupprime2                                                 (vectorized)
   (flat)         b groupprime3                                                 (vectorized)

 student_t(3, 0, 2.5) Intercept                                               default
 student_t(3, 0, 2.5) Intercept           1                             (vectorized)
 student_t(3, 0, 2.5) Intercept           2                             (vectorized)

but I can’t work out what it means, and what to do with this information.

So, looking at the output of get_prior() is a good place to start, but it can definitely be a little confusing at first. My general recommendation would be to start with some simpler models and your standard regressions. I personally find that the best way of building an intuition for what a prior will do is by having experience specifying them in models that you really understand. I would also warn that a lot of times, the priors aren’t too important. They can certainly impact the efficiency of your model, but with large samples, the priors have little influence on the final results.

The general logic of prior is that you’re specifying what you think the parameter could be before you look at your data. So, for the regression coefficient priors that you found above (i.e., class = b), the default specification is a flat prior. This essentially what standard, non-Bayesian linear regression is assuming: any value for the regression coefficient from negative to positive infinity is equally probable. That is very rarely a reasonable prior assumption, so we can typically improve that some. Assuming that all your predictors are on roughly similar scales, you might be able to think of certain ranges of values that you’d be very surprised to see in the final result. For example, if we specify a prior like this prior(normal(0, 10), class = b), then we’re saying that our belief about what the credible parameter values for our regression coefficients can be described as a normal distribution centered at 0 and with a standard deviation of 10. This prior can be considered skeptical (i.e., we are putting the greatest probability on the parameter being 0), and depending on the scale of our predictors, this could be considered a weakly informative prior (i.e., we give larger probability to smaller effect sizes, but we don’t rule out that some effects may be particularly large).

You don’t have to specify priors for every parameter (e.g., a unique prior for each predictor), but you might need to if your variables are all on very different scales. For example, specifying a normal prior with mean of 0 and standard deviation of 10 may be perfectly fine for some variables, but it would be really bad for variables like IQ (which are defined in the population as having a mean of 100 and standard deviation of 10). Arguably, though, you might not need to change the priors’ scale too much since the prior is on the regression coefficient, not the distribution of the predictor. Where I usually find that I need to most tweak the priors is in the intercepts. You might look at what rstanarm does for its default priors since that package does more custom tweaking.

So, to return to your print out from get_prior(), the left hand column is telling you what distribution your prior is with the “class” column saying what parameter type that prior is for. In your case, you have also defined certain groups since you used cs(), so you have priors for the regression coefficients specific to each category. Then, you have 3 intercept priors (class = Intercept), one is a general intercept for the linear components and then the other two correspond to your thresholds for the ordinal component of your outcome variable.

Again, my recommendation for wrapping your head around what these things means is to take a toy example of simple linear regression and try lots of different prior specifications. I strongly encourage using the prior predictive checks to visually see what you’re specifying with each of the priors. Then, I encourage you to fit the model each time and see how much influence on the final results each prior specification has


thank you so much for this comprehensive answer, its been incredibly helpful!!!

1 Like