Cumulative models for likert scale (compare groups)

Dear all,

we have data in which participants were asked to rate on a 7-point likert scale the intensity of a question. The question was repeated under two different experimental conditions (Cond), and two different groups participated to the experiment (Group). We are interested in assessing whether the rating is modulated between the two groups, conditions, and if there is an interaction between the two.

Based on the (excellent) tutorial from @paul.buerkner and Matti Vuorre, we realized that we should analyze those data with brms, using a cumulative model. We use the following model:

brm(Rating~Group*Cond+(1|participant), data = data,family=cumulative())

summary() and hypothesis() provide us with several information, but we would need some advice on how to recover the Estimate and Evid.Ratio for the “main effect” of Cond and the interaction for the cumulative model.

Thank you very much in advance!
Fosco

  • Operating System: Ubuntu 18.04
  • brms Version: 0.7

Hi welcome to the Stan community!

What do you mean by “recover” in this context? And you should probably update brms to the latest CRAN version (2.8.0).

Hi,

by recover I meant how can I have access to those information (I am using brms 2.7.0, will update to the new version).

Thanks

If you mean the posterior samples of the parameters, you can extract them via posterior_samples.

@paul.buerkner I’ve seen these questions pop up every now and then here on discourse. I think the main issue here is that many people come from a freq background and then suddenly they have a posterior to play with (I was a bit confused when I started to say the least). The summary() output from brms, rethinking, rstanarm et al. is quite alright, but it is a summary.

Very many people don’t see the benefits of having a posterior all of a sudden. I think a tutorial would be good, where one can see many examples of the type of questions one can answer when we have a posterior (finding means/medians/HPDIs etc, comparing 2 or >2 things, provide credible intervals to various estimates). I’ll try to write up a tutorial later if I can’t find one already written, which will focus only on how to ask questions to the posterior.

@FoscoB, at many times @bgoodri wraps up his rstanarm tutorials by asking questions to the posterior and @jonah has some examples when doing graphical posterior predictive checks in particular.

3 Likes

Thanks for your help and suggestions!

Hi @FoscoB,

The detailed answers to your questions depend on how you have coded your predictors. For example, you may have coded condition and group with dummy variables (0 = group a, 1 = group b), or you may have used effect coding (e.g. -1 = group a, 1 = group b).

Let’s assume that you have coded group and condition with dummy variables (this is what R modeling functions will do unless you specify otherwise.) Here is some example data in variable dat:

> str(dat)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	557 obs. of  3 variables:
 $ Rating: num  4 4 4 4 4 4 4 4 4 4 ...
 $ Group : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ Cond  : Factor w/ 2 levels "0","1": 1 1 2 2 1 1 2 1 2 2 ...

I then estimated the model as specified in your example code (varying intercepts for participants are irrelevant for us now, so I don’t include them):

> fit <- brm(
+   Rating ~ Group * Cond, 
+   data = dat,
+   family = cumulative()
+ )
> summary(fit)
Population-Level Effects: 
             Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept[1]    -1.52      0.18    -1.88    -1.18       3156 1.00
Intercept[2]    -0.44      0.15    -0.73    -0.14       3554 1.00
Intercept[3]     1.46      0.17     1.13     1.79       2950 1.00
Group1           0.95      0.22     0.52     1.41       2581 1.00
Cond1            0.35      0.22    -0.10     0.79       2667 1.00
Group1:Cond1    -0.06      0.32    -0.69     0.56       2493 1.00

The estimated effects are interpreted as in any regression model where you have dummy coded group and condition. Group1 is the effect on the mean of the latent variable (from which the ordinal ratings are assumed to originate) when Cond = 0. Cond1 is the effect of condition when Group = 0.

The interaction Group1:Cond1 tells you how much the effect of group changes between conditions, or equivalently how much the effect of condition changes between groups. To get the effect of condition for both groups separately, you can use the hypothesis method.

> Hypotheses <- c(
+   "Cond-group0" = "Cond1 + Group1:Cond1*0 = 0",
+   "Cond-group1" = "Cond1 + Group1:Cond1*1 = 0"
+ )
> hypothesis(fit, Hypotheses)
Hypothesis Tests for class b:
   Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 Cond-group0     0.35      0.22    -0.10     0.79         NA        NA     
2 Cond-group1     0.29      0.22    -0.15     0.71         NA        NA     
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.

(Notice how the first row corresponds to the basic model output.) You can also specify the hypotheses with > 0 or < 0 to get evidence ratios.

Then, to specifically answer your questions: The estimate and evidence ratio (that effect is positive) for the interaction is given by:

> hypothesis(fit, "Group1:Cond1 > 0")
Hypothesis Tests for class b:
          Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (Group1:Cond1) > 0    -0.06      0.32    -0.59     0.45       0.77      0.44 

(I would not over-interpret the evidence ratio or posterior probability numbers.)

The main effect of Cond is typically taken to mean the effect of Condition “for the average group”. To get that effect, you need to estimate the model using effect coded Group variable, and then the summary of Cond1 would be the “main effect” (see ?contr.sum). You can also get a main effect from the model with dummy coded variables by asking for the effect of Cond when Group1:Cond1 = 0.5

> hypothesis(fit, "Cond1 + Group1:Cond1*0.5 = 0")
Hypothesis Tests for class b:
                Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (Cond1+Group1:Con... = 0     0.32      0.15     0.02     0.62         NA        NA    *

More generally, as others pointed out, you can think of any contrast or transformation of the parameters, and using the posterior samples, you can get the posterior of that contrast / transformation. With the hypothesis() method, all that’s left for you to do is to write out the equation for the contrast.

Hope that helps.

4 Likes

Hi @matti,

thanks a lot for your very clear explanation! This solves a lot questions…

thanks,
Fosco

Would someone kindly link to the tutorial?

I assume this is the one being referred to.

1 Like

Another option is to calculate sum-scores (rating * probability of rating) for each observation:iteration and then compute contrasts.

This approach also gives you effect sizes on the response scale.

Reviewers have been amenable to this approach in my experience

1 Like

Hi JLC, could you please post an example of this approach / reference article?

You can find an example here, particularly Section 11.1.4.

2 Likes

Here’s an example of how you could approach sum-score contrasts

# Required packages
Pkgs <- c("tidyverse"
          , "tidybayes"
          , "brms"
          , "parallel" )

## Load packages
lapply(Pkgs, require, c = T)

## Set computing options
options(mc.cores = detectCores())
ncores = detectCores()

# Make data

## Ordinal function
get_probs <- function(eta,cuts){
        p = rep(NA,(length(cuts)+1))
        p[1] = 1-plogis(eta-cuts[1])
        for(i in 2:length(cuts)){
                p[i] = plogis(eta-cuts[i-1]) - plogis(eta-cuts[i])
        }
        p[length(p)] = plogis(eta-cuts[i])
        plot = ggplot(
                data = data.frame(x=1:length(p),y=p)
                , mapping = aes(x=x,y=y)
        ) +
                geom_bar(stat='identity',position='identity')+
                scale_y_continuous(limits=c(0,1),expand=c(0,0))
        
        print(plot)
        return(p)
}

## Create data for two groups
p1 <- get_probs(-1, seq(-3, 3, length.out = 6))
p2 <- get_probs(1, seq(-3, 3, length.out = 6))

## Make a data frame
set.seed(1)
data <- data.frame(
        sex = rep(c('male','female'), each = 30)
        , response = c(
                sample(1:7,30, prob = p1, replace = TRUE)
                , sample(1:7,30, prob = p2, replace = TRUE)
        )
) %>%
        mutate(sex = factor(sex)) 



## Look at the simulated data
data %>%
        ggplot(aes(x = response)) +
        geom_histogram(aes(fill = sex)
                       , bins = 7
                       , position = position_dodge(width = 1)) +
        scale_x_continuous(breaks = 1:7) +
        theme_classic()

# Model

## Formula
fmla <- bf(response ~ sex)

## Priors
priors <- c(set_prior("normal(0,5)"
                      , class = "Intercept"
                      )
        
        , set_prior("normal(0,1)"
                  , class = "b"
                  )
)

## Fit the model
fit <- brm(fmla 
           , data
           , family = cumulative()
           , prior = priors
           , iter = 2000
           , warmup = 1000
           , chains = 4
           , cores = ncores/2
           , backend = "cmdstanr"
           , threads = threading(2)
           , normalize = FALSE
           , save_pars = save_pars(all = TRUE)
           , stan_model_args=list(stanc_options = list("O1"))
           , control = list(adapt_delta = 0.99
                            , max_treedepth = 14)
)


# Analysis

## Posterior predictive check
fit$data %>% 
        add_epred_draws(fit
                        , ndraws = 100) %>% 
        group_by(sex, .category, .draw) %>% 
        summarise(count = sum(.epred)) %>% 
        group_by(sex, .category) %>% 
        mean_qi(count) %>%  
        rename(response = .category) %>% 
        
        ggplot(aes(x = as.numeric(response))) +
        geom_bar(data = data,
                 fill = "grey70") +
        facet_wrap(~sex) +
        geom_pointrange(aes(y = count, ymin = .lower, ymax = .upper),
                        size = 1/6) 

## Get sum-score intervals
intervals <- data %>% 
        add_epred_draws(fit, re_formula = NULL, ndraws = 100) %>% 
        ungroup() %>%
        mutate(product = as.double(.category) * .epred
        ) %>% 
        group_by(sex, .row, .draw) %>% 
        summarise(sum_score = sum(product)) %>% 
        ungroup() %>%
        group_by(sex) %>% 
        point_interval(
                 sum_score
                , .width = 0.89
                , .point = mean
                , .interval = hdci
                , .simple_names = TRUE
                , na.rm = TRUE
                , .exclude = c(".width", ".point", ".interval")
        ) %>%
        ungroup() 

## Plot sum-score intervals
intervals %>%
        select(-.width, -.point, -.interval) %>%
        ggplot(aes(sex, sum_score)) +
        geom_point(size = 3) +
        geom_errorbar(aes(ymin = .lower, ymax = .upper),
                      width = 0.1) +
        scale_x_discrete("Sex") +
        scale_y_continuous("Rating (Sum-Score)", 
                           limits = c(1,7), 
                           breaks = c(1:7)) +
        theme_classic()

## Calculate sum-score difference
contrast <- data %>% 
        add_epred_draws(fit, re_formula = NULL, ndraws = 100) %>% 
        ungroup() %>%
        mutate(product = as.double(.category) * .epred
        ) %>% 
        group_by(sex, .row, .draw) %>% 
        summarise(sum_score = sum(product)) %>% 
        ungroup() %>%
        select(sex, sum_score, .draw) %>%
        dplyr::group_by(sex) %>%
        mutate(row = row_number()) %>%
        group_by(row, .draw) %>%
        compare_levels(sum_score
                       , by = "sex") %>%
        ungroup() %>%
        group_by(sex) %>%
        point_interval(
                sum_score
                , .width = 0.89
                , .point = mean
                , .interval = hdci
                , .simple_names = TRUE
                , na.rm = TRUE
                , .exclude = c(".width", ".point", ".interval")
        ) %>%
        ungroup() 

## Plot sum-score difference
contrast %>%
        select(-.width, -.point, -.interval) %>%
        ggplot(aes(sex, sum_score)) +
        geom_point(size = 3) +
        geom_errorbar(aes(ymin = .lower, ymax = .upper),
                      width = 0.1) +
        scale_x_discrete("Male - Female") +
        scale_y_continuous("Rating (Sum-Score) Difference", 
                           limits = c(-3,3), 
                           breaks = c(-3:3)) +
        geom_hline(yintercept = 0, linetype = 2) +
        theme_classic()


1 Like

Thanks!