Hello,
I have some hierarchical data where 66 participants in 3 groups each saw 64 trials in four conditions. For each trial, the participants behaviour can be described in one of three categories that are unordered. I would like to model this data using the categorical family in brms. Since I have never worked with categorical models before, I started with modeling some simpler simulated data (no group-level effects, only two conditions and two groups) and am now trying to understand the estimates.
This is the model I ran:
library(tidyverse)
library(brms)
# simulate some data
x = 250
cnd = rep(rep(c("cnd3", "cnd1"), each = 10*x), times = 2) # two conditions
grp = rep(c("A", "B"), each = 10*x*2) # two groups
OUT = c(
# group A
rep("1", 3*x), rep("2", 4.5*x), rep("3", 2.5*x), # cnd3
rep("1", 2*x), rep("2", 5*x), rep("3", 3*x), # cnd1
# group B
rep("1", 4*x), rep("2", 3.5*x), rep("3", 2.5*x), # cnd3
rep("1", 3*x), rep("2", 4*x), rep("3", 3*x) # cnd1
)
df.full = data.frame(OUT, grp, cnd) %>%
mutate(
cnd = as.factor(cnd),
grp = as.factor(grp)
)
# set sum coding for contrasts
contrasts(df.full$cnd) = contr.sum(2)
contrasts(df.full$cnd)
contrasts(df.full$grp) = contr.sum(2)
contrasts(df.full$grp)
# set some priors
priors = c(
prior(normal(0,1), class = "Intercept", dpar = "mu2"),
prior(normal(0,1), class = "Intercept", dpar = "mu3"),
prior(normal(0,1), dpar = "mu2"),
prior(normal(0,1), dpar = "mu3")
)
# compute the model
m = brm(OUT ~ cnd * grp,
data = df.full,
family = "categorical",
backend = "cmdstanr",
prior = priors,
chains = 4)
# check out the estimates for the population-level effects
fixef(m)
# this is my output:
# Estimate Est.Error Q2.5 Q97.5
# mu2_Intercept 0.36919383 0.02450578 0.32250145 0.41762155
# mu3_Intercept -0.06147035 0.02623245 -0.11180210 -0.01045539
# mu2_cnd1 0.23298486 0.02430985 0.18667733 0.27981978
# mu2_grp1 0.29130106 0.02451487 0.24367370 0.33819220
# mu2_cnd1:grp1 0.02258405 0.02403750 -0.02390591 0.07105372
# mu3_cnd1 0.26430609 0.02685252 0.21329722 0.31757810
# mu3_grp1 0.17269689 0.02687473 0.11987718 0.22492792
# mu3_cnd1:grp1 0.02980390 0.02711534 -0.02320310 0.08362795
If I understand correctly, the intercepts describe the log odds ratio between OUT2 / OUT3 and OUT1 which is the reference. So, mu2_Intercept
is similar to log(nrow(df.full %>% filter(OUT == "2"))/nrow(df.full %>% filter(OUT == "1"))) = 0.3483067
.
Then, the condition and group effects refer to the log odds ratio between a certain condition or group for OUT2 / OUT3 and the same in OUT1 minus the respective intercept, so mu2_cnd1
would be similar to log(nrow(df.full %>% filter(OUT == "2" & cnd == "cnd1"))/nrow(df.full %>% filter(OUT == "1" & cnd == "cnd1"))) - log(nrow(df.full %>% filter(OUT == "2"))/nrow(df.full %>% filter(OUT == "1"))) = 0.23948
. [EDIT: Or would I have to subtract both Intercept and interaction?]
Is my understanding correct so far?
What I haven’t been able to figure out is what ratio the interaction effects are estimating, so I would appreciate any information on that as well.
Cheers,
Irene Sophia
- Operating System: Ubuntu 22.04
- brms Version: 2.19.0
- cmdstanr Version: 0.5.3
- cmdstan version: 2.33.1