Modeling and visualizing ordinal data with different response scales

I have a dataset in which participants responded to questions using two different response scales: one scale ran 1-5 (see attached data: variables connection_artist_general and connection_audience_general) and the other ran 1-7 (variables connection_artist_circles and connection_audience_circles). I fit a model with brm() and visualized it with conditional_effects().

I would like to know whether there is a way to separate the two types of items in the model so that I can make two different plots: one showing the conditional effects of the items on the 1-5 scale, and the other showing the conditional effects of the items on the 1-7 scale.

If that is not an option, do you think it is reasonable to normalize the 1-5 scale onto a 1-7 scale?

Also, I am under the impression that the different response scales do not matter for the brms modelling, but do they? That is, should I normalize and re-run my models?

# First, read data attached in this post, then: 
data$value <- factor(data$value, ordered = TRUE)
data$Participant <- factor(data$Participant)
data$allowed_vote <- factor(data$allowed_vote, levels = c("0", "1"), labels = c("No", "Yes"))

# Selected an adjacent-category model with category-specific effects
# because LOOIC was lowest and because it makes sense theoretically
fit_G2 <- brm(
  formula = value ~ 1 + allowed_vote + (1|Participant) + (cs(1)|variable),
  data = data,
  family = acat("probit")
)

conditional_effects(fit_G2, "allowed_vote", categorical = TRUE)
  • Operating System: Windows 10
  • brms Version: 2.13.5

Thank you so much for any and all help!
SampleData.csv (5.6 KB)


I found a possible solution to my plotting issue at this post: Split conditional_effects plot in facets according to the value of the categorical DV

However I am still curious about the effect these different scale ranges may have on the models. Should I be building separate models for the separate response scales?

I also realized that the code I provided results in many divergent transitions. Here is the new code that works for me, using a cumulative model instead:

prior <- get_prior(
  formula = value ~ 1 + allowed_vote + (1|Participant) + (1|variable),
  data = data,
  family = cumulative("probit")
)

fit_G1 <- brm(
  formula = value ~ 1 + allowed_vote + (1|Participant) + (1|variable),
  data = data,
  family = cumulative("probit"),
  prior = prior,
  save_all_pars = TRUE,
  control = list(adapt_delta = 0.95)
)

fit_G3 <- brm(
  formula = bf(value ~ 1 + allowed_vote + (1|Participant) + (1|variable))+
    lf(disc ~ 0 + allowed_vote, cmc = FALSE),
  data = long_Agency_SC,
  family = cumulative("probit"),
  save_all_pars = TRUE,
  prior = prior_unequalVar,
  control = list(adapt_delta = 0.85)
)
  • Disclaimer: please feel free to correct any incorrect assumptions I am making as I am definitely not an expert in Bayesian statistics.

Thank you so much!


Hey Dana! Welcome to the Stan forum! :)

I think you should treat the different response scales differently. brms doesn’t know that for some variables 5 is the maximum and for others it’s 7. It would thus underestimate effects for the 1-5 (“general”) items.

In Stan (proper, not brms) you could model the latent “utilities” for all data and apply different numbers of cut points depending on the response variable. I think this is not so hard to code, but going into Stan modeling might be a hurdle.
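To give a rough idea, here is a minimal sketch of that approach (all data and variable names here are illustrative, and the predictor is reduced to a single dummy; this is not a drop-in replacement for your brms model):

```stan
data {
  int<lower=1> N;
  int<lower=1, upper=2> scale_id[N];  // 1 = 1-5 item, 2 = 1-7 item
  int<lower=1, upper=7> y[N];         // response on the item's own scale
  vector[N] x;                        // e.g. allowed_vote dummy
}
parameters {
  real b;
  ordered[4] c5;  // 4 cut points -> 5 categories
  ordered[6] c7;  // 6 cut points -> 7 categories
}
model {
  b ~ normal(0, 1);
  for (n in 1:N) {
    if (scale_id[n] == 1)
      y[n] ~ ordered_probit(b * x[n], c5);
    else
      y[n] ~ ordered_probit(b * x[n], c7);
  }
}
```

The key point is that both scales share the same latent linear predictor, while each gets its own ordered cut-point vector of the appropriate length.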

Looking at your data, you could probably also model the data as multivariate (5-scale and 7-scale variables as separate outcomes) in brms. A simple model might be:

f1 <- bf(general ~ 1 + allowed_vote + (1|Participant) + (1|p|artist_audience))
f2 <- bf(circles ~ 1 + allowed_vote + (1|Participant) + (1|p|artist_audience))

m <- brm(mvbf(f1, f2), data = df2, family = cumulative("probit"), control = list(adapt_delta = 0.99))
m

where the random effect for artist/audience is correlated between outcomes. You can make the model more elaborate, but then you’d need more informative priors.

Also, to make this work the data obviously has to be reshaped so that the 5-point and 7-point responses end up in separate columns:

library(tidyverse)

data_url <- "https://discourse.mc-stan.org/uploads/short-url/p9UeLdH7wbh8l3Lxt5cHNp1yuf5.csv"
df <- read_csv(data_url)

df2 <- df %>% 
  separate(variable, into = c("con", "artist_audience", "general_circles")) %>% 
  select(-con, -X1) %>% 
  spread(key=general_circles, value=value)

skimr::skim(df2)

library(brms)

f1 <- bf(general ~ 1 + allowed_vote + (1|Participant) + (1|p|artist_audience))
f2 <- bf(circles ~ 1 + allowed_vote + (1|Participant) + (1|p|artist_audience))

m <- brm(mvbf(f1, f2), data = df2, family = cumulative("probit"), control = list(adapt_delta = 0.99))
m

Note that without specifying priors this model still results in divergences though…

I hope this helps with your model. In any case, I really think you should avoid lumping those different scales together. Re-scaling would not really work either, because you’d still end up with different numbers of categories per group. You could think about collapsing categories of the 7-point outcome, for example {[1,2], 3, 4, 5, [6,7]} or {1, 2, [3,4,5], 6, 7}, and recoding them to 1-5. Then you could probably proceed with your long-format model.
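For the first of those recodings, a minimal base-R sketch (the function name is my own; apply it to whatever column holds the 7-point responses):

```r
# Sketch: collapse the endpoints of a 7-point item to 5 categories,
# i.e. {[1,2], 3, 4, 5, [6,7]} -> 1..5
collapse_7_to_5 <- function(x) {
  recoded <- ifelse(x <= 2, 1,
             ifelse(x >= 6, 5, x - 1))
  factor(recoded, levels = 1:5, ordered = TRUE)
}

collapse_7_to_5(1:7)  # maps to 1 1 2 3 4 5 5
```

Whether collapsing endpoints or the middle is more defensible depends on where the response distribution is concentrated, so check the observed frequencies first.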

Cheers,
Max


I just read the last message, so apologies if my idea does not make sense, but you can assign different threshold vectors (possibly of different lengths) to different subsets of the data using the thres() addition term. See ?resp_thres for details. Perhaps that is helpful.
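In brms formula syntax that could look something like this (a sketch only; scale_group is an assumed column in the long data marking whether an observation comes from a 1-5 or a 1-7 item, and the gr argument of thres() gives each group its own set of thresholds):

```r
# Sketch, not run: per-group thresholds via the thres() addition term
fit_thres <- brm(
  value | thres(gr = scale_group) ~ 1 + allowed_vote + (1|Participant) + (1|variable),
  data = data,
  family = cumulative("probit")
)
```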


Thank you so much @Max_Mantei! Wow you are a lifesaver and this response was so thorough! Sending you so much gratitude!

I found this tutorial: Bayesian ordinal regression with random effects using brms (kevinstadler.github.io). The author wrote a function that assesses the best link function for your data using the ordinal package. It suggested “probit” for the “general” measure and “logit” for the “circles” measure. Once I changed the link function to “logit”, the model no longer had divergent transitions. I will keep trying to understand how to set informative priors so I can build better and more elaborate models.

@paul.buerkner Thank you so much for this information! I think as Max pointed out, the way I was organizing my data before would not have allowed for resp_thres to work, but with this new multivariate approach I could use resp_thres! Thank you so much!
