Cumulative models for multiple likert items

Thanks to a recent tutorial from @paul.buerkner and Matti Vuorre, I’m realizing that it should be possible to fit ordinal models in brms for an analysis of multiple questions per informant that I’m assisting with (similar to the discussion on page 19/20 of the tutorial).

In essence, participants are asked to respond to a series of questions about funding priorities for 10 budget items. So for question q, person i provides a response on a 5-point likert scale. We hypothesize that these responses vary as a function of an i-level attribute denoted x.

Furthermore, the funding priorities can be broken down into two categories: military and educational. So the dataset looks roughly like this (showing responses for 2 of the n informants).

q y x i type
1 4 1 A Military
2 5 1 A Education
3 2 1 A Military
4 3 1 A Education
5 1 1 A Military
6 5 1 A Education
7 3 1 A Military
8 5 1 A Education
9 3 1 A Military
10 5 1 A Education
1 2 0 B Military
2 5 0 B Education
3 5 0 B Military
4 2 0 B Education
5 3 0 B Military
6 1 0 B Education
7 4 0 B Military
8 5 0 B Education
9 3 0 B Military
10 2 0 B Education

Now imagine that the researcher has two questions:

  1. The attribute, x, predicts the emphasis that the individual gives to the different types of funding.
  2. Relatedly, within individuals, people who rate educational items more highly will rate military items lower even after accounting for variation explained by x.

The first question seems like it can be profitably investigated with a brm model:
brm ( y ~ x * type + (1|i) + (1|q), family = cumulative("probit")

To examine the second question, my intuition is to consider a random coefficients model:
brm ( y ~ x * type + (1 + type|i) + (1|q), family = cumulative("probit")
And then look at the covariance of the i-level intercept and coefficients for insight into the correlation across those two types of questions.

Are there other modeling strategies that I should be considering, though?

2 Likes

Based on your explanations, the models you are considering look reasonable to me.

1 Like

So multiple items works great using the multi-level method! But, is it possible to go multivariate too? 1. To estimate correlations between scales; and 2. for different sized scales?

This is approaching CFA/SEM (which I understand is in the brms pipeline) - but I’ve had a crack and can estimate two equal-sized scales and get the same/similar parameter estimates as the individual models - e.g :

ID ITEM B1 B2 FACTOR1 FACTOR2
1 1 0 1 7 2
1 2 0 1 3 5
1 3 0 1 6 7
1 4 0 1 3 4
2 1 1 0 6 2
2 2 1 0 2 5
2 3 1 0 4 4
2 4 1 0 3 2
modMV <- brm(mvbind(FACTOR1, FACTOR2) ~ 1 + B1*B2 + (1 | ID) + (1 | ITEM),
 data = dat, family = cumulative("probit")

The complication will be including correlations (rescor defaults to FALSE and breaks when set to TRUE), and imputing missing data if the scales are different sizes (mi() doesn’t support cumulative family). Is this theoretically possible by tweaking the Stan code?

Thanks!
Oli

I am sorry, I don’t fully understand your question. Could you perhaps try to explain it once more?

Apologies Paul, and thanks for asking for clarification.

I was wondering whether it is (or will be possible) to run multivariate ordinal analyses.

  1. Say for instance I have three factors, each with 5 likert items. Is it possible to estimate each factor, and the correlations between the factors?

This is possible for gaussian models in brms using MVBind and rescor = TRUE, but not for cumulative probit models. I was wondering whether this can be done by altering the stan code.

  1. If it is, then is it possible when I have three factors with 5,7, and 9 items? Again, I think this is possible using mi() for gaussian but not cumulative probit models.

I hope that makes sense!

Oli

  1. Estimating residual correlations for ordinal models requires the CDF of the multivariate normal (of multivariate logistic) distribution, which is currently not available in Stan. As soon as we have that, I will try to make multivariate ordinal models with residual correlations available in brms. You may still model varying effects as correlated across univariate models.

  2. Not sure what mi() has to do with different number of categories, but in general, modeling ordinal responses with different number of categories (is that what you mean by “items”?) within the same multivariate model poses no problem as each response just gets its own set of thresholds.

Excellent! Thanks for letting me know!

As for #2 - by items I mean the observations that make up the factor.

One factor might be made up for 8 individual test items (each with 7 likert categories), whereas another factor might have 5 items (with the same number of likert categories).

When this is structured in long format (one column for each factor), there are 3 NA’s in F2 for each participant. These could be treated as missing data under metric models, but not ordinal ones.

Best wishes,

Oli

Yeah, I see the problem. For that purpose you could use the subset addition argument of brms.
See ?resp_subset.

Awesome! Thanks! I’ll give that a shot. It means I can now just run one model for my measure, rather than four!

@Oli_J, would you mind sharing your resultant model syntax for these multivariate ordinal models (with different item numbers for each response measure)? I have a similar modeling challenge and am unsure how to utilise the resp_subset argument. Cheers!

Hello! I would if I could, but I couldn’t get it working :-( I’m going to be revisiting my analysis though, and if I am successful I will of course post it one here!

Thanks for the update, and good luck. I’ll post here as well if I get it working.

Hi,
Can anyone illustrate how a BRMS model would be fit if the goal is simply to measure the correlation between two Likert variables? Assume they are both 1-7, for example. I tried using mo() on the dependent variable, but that resulted in an error. Only the independent variable could be passed this way.

Thank you.

Perhaps this blog post may help you? The model formula used in the discussed examples is: mvbind(x, y) ~ 1, where the residual correlation between x and y is the parameter of interest. That said, the example uses continuous data, and the brms multivariate tutorial states estimating residual correlations is only possible with a “multivariate normal (or student-t) model”.

2 Likes

This does unfortunatley not work with cumulative models as we don’t have a CDF function of the multivariate normal distribution in Stan yet, which would be relevant to use as the latent variable distribution. In other words, this is not yet possible in brms but may be possible in the future.

2 Likes

Thank you ian.macdonald and paul.buerkner.

Actually, the next best thing would be to fit:

brm(Likert1 ~ mo(Likert2) + (1|Var), data=f, family=cumulative(link = “logit”, threshold=“flexible”))

Can’t I take the independent variable’s log-odds coefficient to be an approximation of how the two likert variables are correlated?

I dont know about the properties of such an estimate so I would be careful recommending it.

I understand. Likert1 consists of satisfaction ratings (1 = very unsatisfied, 7 = very satisfied) and Likert2 consists of recommendation ratings (1 = very unlikely to recommend product to others, 7 = very likely to recommend product to others). Var corresponds to cost, in dollars.

brm(Likert1 ~ mo(Likert2) + (1|Var), data=f, family=cumulative(link = “logit”, threshold=“flexible”))

I just want to understand this kind of model better, I guess. Any suggestions and help welcome.

Thanks but that’s not what I mean.

I just dont know what is implied by using the mo prexictor with the cumulative family and what properties the related coefficient has in relation to what you want. So I would not use this for the purpose you have in mind.

2 Likes