Thanks so much for starting this conversation! As suggested, for simplicity, I took your code and tried contrasting between two groups for the time being.
# Load the needed packages
library(brms)
library(tidyverse)
# Make the rating probabilities
d_weight <- data.frame(ordinal_rating = 1:5) |>
  mutate(weight = c(1:3, 2, 1)) |>
  mutate(probability = weight / sum(weight))
# Make the rating probabilities for the second group
d_weight_2 <- data.frame(ordinal_rating = 1:5) |>
  mutate(weight = c(1, 3, 3, 2, 1)) |>
  mutate(probability = weight / sum(weight))
# Define your desired sample size
n <- 100
# Sample ordinal ratings
g1 <- sample(x = 1:5, size = n, replace = TRUE, prob = pull(d_weight, probability))
g2 <- sample(x = 1:5, size = n, replace = TRUE, prob = pull(d_weight_2, probability))
## Now combine this into a single dataframe
combined_df_direct <- tibble(
  rating = c(g1, g2),
  group = rep(c("Group 1", "Group 2"), each = n)
)
## Let's run the model
m1 <- brm(rating ~ 1 + group,
          data = combined_df_direct,
          family = cumulative("logit"),
          prior = c(prior(normal(0, 1.5), class = "Intercept"),
                    prior(normal(0, 1), class = "b")),
          warmup = 1000,
          iter = 5000)
I think I got the hang of this, especially when I make the model spit out predictions using conditional_effects(m1, categorical = TRUE). It's easy to see there how well the model recovers the probabilities of each rating. Currently it doesn't do such a great job with 100 observations per group, as rating 2 in group 2 should have a higher probability.
However, look at the summary model output.
Family: cumulative
Links: mu = logit; disc = identity
Formula: rating ~ 1 + group
Data: combined_df_direct (Number of observations: 200)
Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
total post-warmup draws = 16000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] -2.24 0.27 -2.78 -1.72 1.00 9951 9688
Intercept[2] -0.51 0.19 -0.89 -0.14 1.00 16380 14500
Intercept[3] 0.96 0.20 0.58 1.36 1.00 15945 13463
Intercept[4] 2.55 0.28 2.02 3.11 1.00 15616 13255
groupGroup2 0.25 0.25 -0.23 0.74 1.00 14558 11385
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA
As we are simulating by manually changing the probability of each rating between the two groups via the weight dataframes, how does one know the effect size on the log-odds scale a priori, before running the model (i.e. the estimate groupGroup2)? After all, it is this contrast that we are interested in estimating, and I don't have straightforward intuitions about how to glean "one effect size" directly from the weights themselves.
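For what it's worth, here is how I have tried to reason about it so far (a sketch, not a definitive answer): converting each group's rating probabilities into cumulative log-odds makes the per-threshold shifts explicit. Under the brms parameterisation P(Y <= k) = plogis(tau_k - eta), the threshold-wise differences below are what the single groupGroup2 coefficient would have to summarise.

```r
# Sketch: cumulative log-odds implied by the two weight vectors
p1 <- c(1, 2, 3, 2, 1) / 9   # group 1 weights, normalised
p2 <- c(1, 3, 3, 2, 1) / 10  # group 2 weights, normalised
lo1 <- qlogis(cumsum(p1)[1:4])  # group-1 thresholds on the logit scale
lo2 <- qlogis(cumsum(p2)[1:4])
# per-threshold values the group coefficient would need to take
round(lo1 - lo2, 2)  # 0.12 -0.29 -0.15 -0.12
```

Because these four shifts are not equal, the weight-based simulation does not correspond to a single latent shift (proportional odds does not hold exactly), and the fitted groupGroup2 seems to end up as something like a compromise across thresholds.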
Some additional questions:
- To then extend this workflow to multiple groups (i.e. I am interested in estimating contrasts of the first group with three other groups, hence the indicator variable approach works extremely well), is it okay to just run the simulation for the contrast between two groups and then simply extrapolate that the other groups need similar sample sizes to estimate those contrasts? Or does one need to simulate observations for all the groups?
- How would one update this workflow for a multilevel model? For instance, in my data context, while participants belong to certain groups, they are also clustered within certain sequences. That is, the final model of interest is rating ~ 1 + group + (1 | sequence)
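On the multilevel point, my current guess is that the simulation itself has to generate the clustering, e.g. by adding a sequence-level intercept on the latent logit scale before discretising. A minimal sketch, where n_seq, sd_seq, and b_group are values I made up purely for illustration:

```r
# Sketch: simulate ratings with a sequence-level random intercept,
# matching rating ~ 1 + group + (1 | sequence)
set.seed(1)
n_seq   <- 20    # number of sequences (assumed)
sd_seq  <- 0.5   # sd of sequence intercepts (assumed)
b_group <- 0.3   # assumed latent shift for group 2
tau <- qlogis(cumsum(c(1, 2, 3, 2, 1) / 9)[1:4])  # cutpoints from group-1 weights
u <- rnorm(n_seq, 0, sd_seq)                      # sequence effects
sim <- data.frame(
  group    = rep(0:1, each = 100),
  sequence = sample(n_seq, 200, replace = TRUE)
)
sim$eta <- b_group * sim$group + u[sim$sequence]
# draw each rating from the implied cumulative-logit probabilities
sim$rating <- sapply(sim$eta, function(e) {
  p <- diff(c(0, plogis(tau - e), 1))
  sample(5, 1, prob = p)
})
```

The same brm() call as above, with (1 | sequence) added to the formula, could then be fit to sim to check whether sd_seq and b_group are recovered.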
BTW, while doing some deep dives on this topic, I found an R package that claims to simulate Likert-scale data. See the workflow below and this vignette: https://latent2likert.lalovic.io/
## model summary
summary(m1)
conditional_effects(m1, categorical = TRUE)
## Alternative Workflow using latent2likert workflow
## load
library(latent2likert)
set.seed(54543)
## Simulate observations using this package
# Note that I have introduced the skew parameter as I know from previous pilot data that
# people respond in a skewed manner to this item
g1_v2 <- rlikert(size = 100, n_levels = 5, n_items = 1, mean = 0, skew = 0.8)
g2_v2 <- rlikert(size = 100, n_levels = 5, n_items = 1, mean = 0.3, skew = 0.8)
## Now combine this into a single dataframe
combined_df_direct_v2 <- tibble(
  rating = c(g1_v2, g2_v2),
  group = rep(c("Group 1", "Group 2"), each = n)
)
## Let's run the model again
m1_newdata <- update(
  m1,
  newdata = combined_df_direct_v2
)
## model summary
summary(m1_newdata)
conditional_effects(m1_newdata, categorical = TRUE)
conditional_effects() spits out the predicted category probabilities (plot not shown here).
I thought that with this workflow I now know the latent effect size (mean = 0.3), but I am still struggling to connect it to the log-odds model output (i.e. the groupGroup2 coefficient) below:
> summary(m1_newdata)
Family: cumulative
Links: mu = logit; disc = identity
Formula: rating ~ 1 + group
Data: combined_df_direct_v2 (Number of observations: 200)
Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
total post-warmup draws = 16000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] -1.86 0.25 -2.37 -1.39 1.00 8975 10576
Intercept[2] -0.15 0.19 -0.52 0.23 1.00 17333 13124
Intercept[3] 0.94 0.20 0.55 1.34 1.00 17303 14307
Intercept[4] 2.50 0.29 1.96 3.08 1.00 16376 13415
groupGroup2 0.15 0.25 -0.32 0.64 1.00 16995 10672
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc 1.00 0.00 1.00 1.00 NA NA NA
What should I transform this estimate to? Certainly not an odds ratio, as that isn't the scale on which I originally defined the effect size.
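One possible resolution I have been mulling over (my own assumption, not something I found in the package docs): since rlikert() discretises a latent variable that is close to standard normal (here skew-normal, given skew = 0.8, so this is only approximate), the mean = 0.3 shift lives on the probit scale. A rough back-of-envelope conversion to the logit scale multiplies by the sd of the standard logistic distribution, pi / sqrt(3) ≈ 1.81:

```r
# Rough conversion of a latent-normal (probit-scale) shift to the
# logit scale; approximate because the latent variable is skew-normal
latent_shift <- 0.3
logistic_sd  <- pi / sqrt(3)                 # sd of standard logistic, ~1.81
expected_logit_coef <- latent_shift * logistic_sd
round(expected_logit_coef, 2)                # ~0.54
```

Alternatively, refitting with family = cumulative("probit") should put the groupGroup2 coefficient on the latent-normal scale, so it could be read directly against the simulated 0.3.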