Ordinal regression model for multiple Likert items using brms

Hi everyone

I want to fit an ordinal regression model (cumulative) to my Likert-scale dataset (multiple items, Big5) using brms. I read a tutorial by Paul Bürkner (published via SAGE Journals), which says: “For this purpose, the data need to be in long format, such that each row gives a single rating and the columns show the values of ratings and identifiers for the participants and items”.

My current dataset looks like this:

Each question is related to a construct, and each construct has 10 items. So I have to re-format this, but I’m not sure what the final file should look like.

Should it be like this?:

Person  Item         Value
101     Withdrawal1  2
101     Withdrawal2  4
101     Withdrawal3  5
...     ...          ...

If this is true, then I’d have 100 rows.

risk ~ 1 + Withdrawal + (1|person) + (1|item)

(risk is DV)

Then, brms reads from the item column to check for the item and from the person column to check for the person?


I want to make super sure that I understand what your final use case is going to be because that will affect your long-data format. As you’ve currently listed the example long data, there’s no variable that would be called “Withdrawal,” so that’s where my confusion comes from in terms of what the end goal is (for that matter, it also doesn’t include your risk DV). You also mention that you would have just 100 rows, but that’s maybe a typo: you’d have 100 rows per participant, so the final number of rows is 100 times your current number of rows. For what it’s worth, here are some materials for converting wide data to long in R: 15.19 Converting Data from Wide to Long | R Graphics Cookbook, 2nd edition; Converting data between wide and long format; and r - Reshaping data.frame from wide to long format - Stack Overflow.
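To make the conversion concrete, here’s a minimal sketch using tidyr’s pivot_longer. The column names (person, risk, Withdrawal1, …) are hypothetical stand-ins for whatever your wide file actually contains:

```r
library(tidyverse)

# Hypothetical wide data: one row per person, one column per Likert item,
# plus a person-level DV (called "risk" here purely for illustration)
wide <- tibble(
  person      = c(101, 102),
  risk        = c(0.4, 0.7),
  Withdrawal1 = c(2, 3),
  Withdrawal2 = c(4, 1)
)

# Pivot every item column into (item, value) pairs; person-level
# variables such as person and risk are repeated on each long row
long <- wide %>%
  pivot_longer(
    cols      = -c(person, risk),
    names_to  = "item",
    values_to = "value"
  )
```

With 100 item columns, this is exactly where the “100 rows per participant” comes from: each participant’s single wide row becomes 100 long rows.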

I think what would be a more typical way to analyze the data would be to compute a sum score of the 10 items composing each construct. This method then reduces your 100 predictors to just 10, which is more manageable. You could then look at those construct scores as both fixed and random effects: e.g., risk ~ 1 + Withdrawal + Compassion + (1 + Withdrawal + Compassion + … | person). Your current approach keeps more information (individual item responses versus sum/average of responses), but it may be hard to interpret the effects of certain constructs when the items are all collapsed together (though see the next paragraph on an item indicator variable). Ultimately, I’d just say to be sure that you think about your research question and what kind of approach fits it best – just wanted to point out an alternative.
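As a sketch of the sum-score alternative, assuming the wide data has item columns named by construct prefix (e.g., Withdrawal1–Withdrawal10, Compassion1–Compassion10; all names hypothetical):

```r
library(tidyverse)
library(brms)

# Sum the 10 items of each construct into a single score per person;
# across() + starts_with() picks up all items sharing the prefix
scored <- wide %>%
  mutate(
    Withdrawal = rowSums(across(starts_with("Withdrawal"))),
    Compassion = rowSums(across(starts_with("Compassion")))
  )

# One score per construct as predictors; random slopes per person would
# only be identifiable with repeated measures (multiple rows per person)
fit <- brm(risk ~ 1 + Withdrawal + Compassion, data = scored)
```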

To put this data in long format, I believe that you’ve done the correct thing, but as I noted, there’s no column in the resulting data frame for Withdrawal (or risk). Looking at the data image you included, Withdrawal looks like one of the 10 constructs on the scale. If that’s the case, then it seems like you need to include some kind of item indicator variable. You might just make a factor variable called “Construct” that has levels 1:10 corresponding to each of your constructs. When entered into the equation, R will automatically (unless you’ve changed the defaults or requested something different) create dummy variables for this factor such that you get the estimate of Withdrawal as a coefficient relative to some reference level (or some other meaningful contrast if you so desired). If you are just interested in the Withdrawal construct, then you can code the dummy variable yourself (i.e., 1 for every item that is from the Withdrawal construct and 0 otherwise).
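A small sketch of that indicator variable, assuming the long data’s item names end in a digit (e.g., “Withdrawal3”), which may not match your actual naming:

```r
library(tidyverse)

# Derive a Construct factor by stripping the trailing item number
# ("Withdrawal3" -> "Withdrawal"), then a 0/1 dummy for one construct
long <- long %>%
  mutate(
    Construct     = factor(str_remove(item, "\\d+$")),
    is_withdrawal = as.integer(Construct == "Withdrawal")
  )
```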


I’ve uploaded the file here to make things easier: https://filebin.net/veapclsg3ijqdh7t/1.csv
I have actually summed the scores (you can see that in the file) and run linear regressions in jamovi, but I want to try this new method: Likert data are ordinal, while the analyses I’ve run so far treated them as metric, so this approach should be more accurate.
Re: the research question. I have ten personality traits (one of which is Withdrawal), and I want to see whether they predict my dependent variables (and also test for interactions), which are obtained from computational modeling of a behavioral task (namely rho, phi, tau, eta, and lambda; by “risk” I meant rho). I have done all of this in jamovi with sum scores, but as I said, I want to implement this more accurate approach.
Btw, thanks for the links.

Just to be clear, what kind of variable is your DV? Is it bounded in any way? Can it be positive and negative? It sounds like maybe this is a multivariate case (i.e., you’re looking at rho, phi, tau, eta, and lambda) and that you’re making it univariate (i.e., looking at just rho). Possibly, though, you mean that rho is obtained from some combination of phi, tau, eta, and lambda.

As far as I can tell, the fact that your predictor variables are ordinal does not mean that you need to use ordinal regression. If you were trying to predict the rating of certain items (i.e., your DV is a Likert item), then you’d certainly need to be considering an ordinal regression. By the same logic, if you were wanting to use sex to predict risk, then you wouldn’t use logistic regression just because sex is a binary variable.
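To illustrate the distinction: a cumulative family only enters the picture when the outcome itself is an ordered rating. A sketch, with hypothetical column names:

```r
library(brms)

# Only when the *outcome* is an ordered rating does the cumulative
# family apply; ordinal predictors on the right-hand side do not
# force this choice of likelihood
fit_ord <- brm(
  rating ~ 1 + (1 | person) + (1 | item),
  data   = long,
  family = cumulative("probit")
)
```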


It is continuous, and one of them can be both positive and negative. I think in a multivariate model you have more than one DV simultaneously, right? If that’s the case, then no, mine isn’t multivariate, because I’m looking at rho, phi, tau, etc. individually.
“the fact that your predictor variables are ordinal does not mean that you need to use ordinal regression”
Really? Oh, then I was completely mistaken. Do you have anything I could read to learn more about this?
Then, if this is the case, I could just use the random effects on summed scores.

The referenced ordinal models manuscript by Paul Bürkner and Matti Vuorre describes the models as all having ordinal DVs. For example, in the second paragraph of the introduction of the cumulative models, they write: “Our cumulative model assumes that the observed ordinal variable Y, the opinion rating, originates from the categorization of a latent (not observable) continuous variable \tilde{Y}.” As I noted in my other response, if the predictor variables defined what kind of model you chose and you wanted to use sex as a predictor, then that would mean that you’d need to run a binary logistic regression, which intuitively you know would not give you good estimates of your outcome variable. Additionally, if the predictors solely determined the model you had to run, then it’s not clear what you would do in the case of predictors of various types (e.g., using age, sex, and a Likert-scale rating all as predictors in a single model).

For a really good, easily accessible overview of regression modeling, I’d highly recommend Richard McElreath’s Statistical Rethinking. There are other, more comprehensive books on generalized linear models and hierarchical regression modeling (Bayesian or not), but I find that Dr. McElreath’s book in particular strikes a good balance between being accessible to people without a huge math-stats background and still being highly theoretically valuable. Plus, there’s a bookdown project by Solomon Kurz that recodes Statistical Rethinking into brms (Statistical Rethinking with brms, ggplot2, and the tidyverse).

In short, there are a few ways that you can prove to yourself whether or not an ordinal regression is needed. Perhaps the most generally useful is the posterior predictive check. In model-based statistics, the problem of data analysis is usually best framed as trying to build a model that captures the simplified data generation process. To use your data as an example, you have observed a risk variable of interest and have some other variables that you think are informative/predictive of this risk variable. The model thus uses information about these other variables to predict what the outcome variable’s value is. In Bayesian methods, our models are both inferential and generative because we can use our posteriors to estimate a probabilistic distribution of possible/yet-to-be-observed predictions. The posterior predictive check (run in brms with the pp_check(...) function) plots what your model’s posterior predicts the outcome variable to look like against what your observed outcome variable values are. In a well-functioning model, we want the model’s predictions to look like the raw data.
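In practice, a posterior predictive check is a one-liner after fitting. A minimal sketch with hypothetical variable names:

```r
library(brms)

# Fit a simple model (names hypothetical), then compare the model's
# posterior predictions against the observed outcome distribution
fit <- brm(risk ~ 1 + Withdrawal, data = scored)

# Overlays densities from posterior draws on the observed data;
# systematic mismatch suggests the likelihood family is a poor choice
pp_check(fit, ndraws = 100)
```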

Obviously, you want to ensure that you have the right predictor variables that help to summarize the data generation process (i.e., the phenomena or proxies for phenomena that happened in the real world to give rise to the dependent variable(s) that you have observed); however, this is only part of the model’s knowledge of the data generation process. We can include our own knowledge of the data to specify a generalization of the linear model. For example, if the observed variable has only two possible values, then it doesn’t make sense to specify a model where the likelihood of the data is defined by a normal distribution; instead, we can use a Bernoulli distribution (or a binomial distribution with 1 trial) to predict the dichotomous outcome. Similarly, if we have an outcome variable that is the number of successful attempts out of a total number of trials (e.g., number of balls hit in a game, number of stimuli correctly identified, etc.), then we probably don’t want to build a model where we assume this data is normally distributed. Telling the model that a normal distribution describes the data would mean that we’re accepting the possibility that an observation of the dependent variable could take any real number despite the fact that we know that our data will always be positive (we can’t have a negative success), will have some upper bound (we don’t get infinite attempts), and may be highly skewed (the task may be very easy and lead to many successes or vice versa). In that case, a binomial distribution is likely the best option for our data. To learn more about what kinds of likelihoods you can specify in brms, you can look up the family argument (by default, family = gaussian()).
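The two cases above can be sketched like this, with hypothetical data and column names:

```r
library(brms)

# A binary outcome: Bernoulli likelihood instead of the default gaussian()
fit_bin <- brm(correct ~ 1 + condition, data = d, family = bernoulli())

# Successes out of a known number of trials: binomial likelihood,
# with the trial count supplied via the trials(...) addition term
fit_binom <- brm(
  hits | trials(n_attempts) ~ 1 + condition,
  data   = d,
  family = binomial()
)
```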

You can run your models under a few different distributions to see how well each performs in predicting the observed data via the pp_check(...) function. For variables that are strictly positive, you may consider a gamma, log-normal, truncated normal/Student, or truncated skew-normal distribution (see more about truncating a distribution in brms here: addition-terms: Additional Response Information in brms: Bayesian Regression Models using 'Stan'). If the outcome variable is always an integer and strictly positive, then you might consider a binomial, Poisson, or negative binomial distribution. When first learning, I found it very helpful to try a lot of different models, examine their posterior predictive checks, and see how they worked, what distributional parameters they have, and what kinds of estimation properties they produce. In this case, you can try an ordinal model against a normal model and see how their posterior predictions compare to one another.

To extend the idea of building a model around what our data generation process is, I suggested that perhaps this data is actually multivariate. Ultimately, that’s a choice for you to make regarding the design of your study; however, if your outcome variables all arise from the same process, then there is an advantage to fitting a multivariate model, even if you have separate hypotheses about each variable. The brms package offers a couple different options for multivariate regression. You can assume that all the outcomes share the same predictors and likelihood and fit them using mvbind(...) ~ .... Or, you can assume that each outcome variable has its own set of predictors (and possibly its own unique likelihood family). In this latter case, you can write each outcome variable with its own formula, f1 <- bf(y1 ~ ...), and then fit a single model as fit <- brm(f1 + f2 + ...). The advantage of the multivariate model is that it’s more accurate when the same processes give rise to multiple dependent variables (or when multiple dependent variables are all correlated/dependent on one another). Here’s a vignette on multivariate models in brms: Estimating Multivariate Models with brms. Notably, when you estimate the models all together in a multivariate model, you can still do all the inferences that you would normally do with univariate models. In brms, this just requires that you specify the resp = ... argument in a lot of the post-hoc convenience functions (e.g., conditional_effects(..., resp = "rho")).
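Both multivariate options can be sketched briefly, again with hypothetical variable names:

```r
library(brms)

# Option 1: all outcomes share the same predictors and likelihood
fit_mv1 <- brm(mvbind(rho, phi, tau) ~ 1 + Withdrawal, data = scored)

# Option 2: separate formulas (and potentially separate families) per
# outcome, combined into one joint model with +
f_rho <- bf(rho ~ 1 + Withdrawal)
f_phi <- bf(phi ~ 1 + Withdrawal + Compassion, family = gaussian())
fit_mv2 <- brm(f_rho + f_phi, data = scored)

# Post-processing functions then take resp = "..." to pick an outcome
conditional_effects(fit_mv2, resp = "rho")
```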

I suspect that ultimately the reason that you wanted to consider an ordinal model for your Likert predictors is that that kind of data is clearly qualitatively different in nature from other predictors, so it just feels like there’s something extra we should do with it. This is very true, and there are several ways to consider this problem. In R, you can specify the Likert items as ordered factors (e.g., ordered(..., levels = c(...))), and then you can use a variety of different effect/contrast coding options to analyze the data. Alternatively, you could try to treat them as continuous and just enter them as numeric variables into the equation. The brms package offers a special function to use around ordinal predictors in the formula to account for their ordinal nature (see the vignette here: Estimating Monotonic Effects with brms).
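A short sketch of the monotonic-effects route, with hypothetical names and a 5-point scale assumed:

```r
library(brms)

# Treat the Likert predictor as an ordered factor and wrap it in mo(...)
# so brms estimates the spacing between adjacent categories instead of
# assuming equal intervals between ratings
long$value_ord <- ordered(long$value, levels = 1:5)
fit_mono <- brm(risk ~ 1 + mo(value_ord), data = long)
```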

In short, it doesn’t sound like you need to use the ordinal regression model since none of your outcome variables are ordinal, and it sounds like perhaps there is no need to worry about the ordinality of your predictors anyway since you can sum them together. Still, I encourage you to think about the data generation process and what you can build into your model through the likelihood statements, predictors, and brms formula(s) to help reflect your knowledge of the variables and processes that got you the data you’ve observed.


Thanks for the long post and your patience in explaining! I read it a few times.
What about IRT? I’m new to IRT, and I like how it takes into account the fact that every respondent perceives the distance between options such as “agree” and “strongly agree” differently, offering a more accurate estimation. This analysis is for a paper I’m writing, but given that grad school is right around the corner, I think I’m gonna go with the simpler analysis to get it done and show the pre-print to potential supervisors. But I do plan to fit IRT models to my data or go down the generative modeling route.

I have actually read the book and love it, but honestly, I didn’t read it deeply and skipped the exercises, which, in hindsight, was a mistake. So I should read it again. Stats looks easy on the surface, but in practice it gets tricky!

That was a simple explanation of generative modeling. I have always had difficulty understanding the intuition behind it. And thanks for the links!

Let me ask an unrelated question. Some of my variables are not normal (skewed to the right), so I log-transformed them, but the issue persists. What do you think is the problem?

IRT is a great option for converting your Likert scale items to a well-behaved continuous latent variable. I do a lot of IRT modeling in Stan, and there are a couple useful papers on how to do this in brms: https://arxiv.org/pdf/1905.09501.pdf and Analysing Standard Progressive Matrices (SPM-LS) with Bayesian Item Response Models - PubMed. The downside of IRT is that you usually need larger sample sizes. Sample size recommendations vary by the type of model (e.g., Rasch, 2PL, 3PL, etc), data type (e.g., binary, ordinal, nominal), test length, and dimensionality (i.e., unidimensional versus multidimensional models). Generally speaking, simulation studies typically see that things level out around 1000-1500 participants, but depending on other considerations, the sample can be smaller. If I had to guess, you can get away with some smaller sample sizes in IRT with Bayesian methods, if you’re OK with some more informative priors. Additionally, I find that IRT models estimated in the generalized linear mixed model (like those linked earlier) also tend to behave better with smaller sample sizes.

Notably, a Rasch model (or a general one-parameter model where only item “difficulties” vary) will produce latent trait/ability measures for your respondents that are correlated very strongly (often nearly perfectly) with your sum scores. Even if you go with a Rasch model, it still makes sense to estimate it rather than a sum score because you get the full posterior uncertainty of the latent trait rather than a simple sum of ratings.

For what it’s worth, IRT models are a generative model type. You’re thinking about how to create your observed item scores by estimating certain item properties (e.g., difficulty, discrimination, etc). Model selection is then based on what item properties would produce your observed scores (e.g., a guessing parameter is probably not needed in free-response items but may be really useful in multiple choice questions). In fact, even in frequentist IRT methods, you can simulate item response data from the fitted IRT model. These basic IRT models can then be extended with item or person covariates, multiple dimensions, and more to account for other processes that underlie the item response, but that requires thinking about what additional things you can tell the model to more accurately generate meaningful predictions.
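In the GLMM framing used by those linked papers, the basic models can be sketched like this (column names hypothetical):

```r
library(brms)

# A 1PL / Rasch-style model: person abilities and item easiness as
# crossed random effects on a binary item response
fit_1pl <- brm(
  response ~ 1 + (1 | person) + (1 | item),
  data   = long,
  family = bernoulli("logit")
)

# For ordinal Likert responses, the same random-effect structure with a
# cumulative family gives a graded-response-style ordinal IRT model
fit_ord_irt <- brm(
  value ~ 1 + (1 | person) + (1 | item),
  data   = long,
  family = cumulative("logit")
)
```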


I see, thanks, you’ve been quite helpful and know how to explain things in a simple manner. I’ll bookmark this page to come back to later!