Hi,
I am working with some cumulative family ordinal data, and I have some questions on interpreting some values for thresholds and group-level sd.
My data contains a scaled value (n= 96, mean = 0, sd = 1), an ordinal category (n = 4), and a grouping variable (species in this case, n = 24). See end of the post for data simulation code.
testdf
# A tibble: 96 x 3
val species cat
<dbl> <chr> <int>
1 -4.02 a 1
2 -2.93 a 1
3 -2.23 a 1
4 -2.43 a 1
5 0.225 b 1
6 -0.0597 b 1
7 -0.0194 b 1
8 0.0591 b 1
9 -0.103 c 1
10 -0.545 c 1
# … with 86 more rows
Estimating a simple ordinal regression works fine. It’s not a great fit to the data, and there isn’t a lot of separation between some categories, but all the values are reasonable and make sense given that the predictor is scaled to N(0,1).
tm.1 <- brm(
cat ~ val, data = testdf,
family = cumulative(link = "probit"),
chains = 2, cores = 2,
control = list(adapt_delta = 0.99, max_treedepth = 15)
)
summary(tm.1)
# summary cropped for space
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] -1.04 0.16 -1.37 -0.73 1.00 2255 2505
Intercept[2] -0.46 0.14 -0.72 -0.18 1.00 3618 2899
Intercept[3] -0.00 0.14 -0.27 0.27 1.00 4876 2415
val 0.20 0.10 0.00 0.39 1.00 3127 2489
When I add in the species grouping variable as a group level effect, everything changes.
tm.2 <- brm(
cat ~ val + (1 | species), data = testdf,
family = cumulative(link = "probit"),
chains = 2, cores = 2,
control = list(adapt_delta = 0.99, max_treedepth = 15)
)
summary(tm.2)
# summary cropped for space
Group-Level Effects:
~species (Number of levels: 24)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 16.61 7.47 7.53 36.54 1.00 1142 1343
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1] -12.95 5.87 -27.55 -4.93 1.00 968 1331
Intercept[2] -3.31 2.84 -9.57 1.62 1.00 849 1661
Intercept[3] 3.31 2.91 -1.40 10.26 1.00 710 1499
val 0.16 0.77 -1.32 1.72 1.00 1977 1694
This clearly isn’t a very informative model, and the estimates have tremendous uncertainty. However I am confused by a few things.
-
I thought thresholds (Intercepts) were estimated threshold points between categories in the predictor variable. In model 1 these values make sense (-1.04, etc). But the predictor variable is scaled, so how can an estimate be so far from zero (like between -22.7 and -4.9 for Intercept[1])? I would instead expect values between -3 and 3, even though they will clearly overlap a lot.
-
The group-level variance also seems very high (between 7 and 35). Again, I understand the large uncertainty, but I do not understand the absolute size of the values given that the predictor is scaled to 0.
Any guidance on why these values are so high? Am I misinterpreting the values of the intercepts/thresholds and the sd(group_level)
? It’s easy to find example datasets where everything works, but it’s harder to find examples where things go wrong ;)
Thanks!
Data simulation
I wanted to make these values as close to my data as possible
seed = 11
vm <- c(-2.107, 0.140, -0.702, 0.371, -0.346, -0.539, 1.041, -0.548, 0.080, 0.316, 1.051, 0.836, -0.191, 0.735, -0.846, 0.176, 0.396, 0.500, -0.272, -0.482, -1.329, 0.227, 0.318, 0.213)
vsd <- c(1.065, 0.237, 0.693, 0.570, 0.502, 1.058, 0.916, 0.885, 0.619, 0.632, 0.787, 0.511, 0.954, 0.968, 0.285, 0.975, 0.998, 0.390, 0.500, 0.161, 1.030, 0.830, 0.400, 0.308)
d1 <- rnorm(96, mean=rep(vm,each=4), sd=rep(vsd, each = 4))
d2 <- rep(letters[1:24], each = 4)
vc <- c(16,16,16,48)
d3 <- rep(1:4, vc)
testdf <- dplyr::bind_cols(val = d1, species = d2, cat = d3)
brms 2.14.4
mac BigSur