Multilevel ordinal model - large values in threshold and group-level sd estimates?

Hi,

I am working with some ordinal data that I am modelling with the cumulative family, and I have some questions about interpreting the estimates for the thresholds and the group-level sd.

My data contains a scaled predictor val (n = 96 observations, mean = 0, sd = 1), an ordinal response cat (4 categories), and a grouping variable (species in this case, 24 levels). See the end of the post for the data simulation code.

testdf
# A tibble: 96 x 3
       val species   cat
     <dbl> <chr>   <int>
 1 -4.02   a           1
 2 -2.93   a           1
 3 -2.23   a           1
 4 -2.43   a           1
 5  0.225  b           1
 6 -0.0597 b           1
 7 -0.0194 b           1
 8  0.0591 b           1
 9 -0.103  c           1
10 -0.545  c           1
# … with 86 more rows

Estimating a simple ordinal regression works fine. It’s not a great fit to the data, and there isn’t a lot of separation between some categories, but all the values are reasonable and make sense given that the predictor is scaled to N(0,1).

tm.1 <- brm(
  cat ~ val, data = testdf,
  family = cumulative(link = "probit"),
  chains = 2, cores = 2, 
  control = list(adapt_delta = 0.99, max_treedepth = 15)
)
summary(tm.1)
# summary cropped for space
Population-Level Effects: 
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]    -1.04      0.16    -1.37    -0.73 1.00     2255     2505
Intercept[2]    -0.46      0.14    -0.72    -0.18 1.00     3618     2899
Intercept[3]    -0.00      0.14    -0.27     0.27 1.00     4876     2415
val              0.20      0.10     0.00     0.39 1.00     3127     2489

When I add in the species grouping variable as a group-level effect, everything changes.

tm.2 <- brm(
  cat ~ val + (1 | species), data = testdf,
  family = cumulative(link = "probit"),
  chains = 2, cores = 2, 
  control = list(adapt_delta = 0.99, max_treedepth = 15)
)
summary(tm.2)
# summary cropped for space
Group-Level Effects: 
~species (Number of levels: 24) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)    16.61      7.47     7.53    36.54 1.00     1142     1343

Population-Level Effects: 
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]   -12.95      5.87   -27.55    -4.93 1.00      968     1331
Intercept[2]    -3.31      2.84    -9.57     1.62 1.00      849     1661
Intercept[3]     3.31      2.91    -1.40    10.26 1.00      710     1499
val              0.16      0.77    -1.32     1.72 1.00     1977     1694

This clearly isn’t a very informative model, and the estimates have tremendous uncertainty. However, I am confused by a few things.

  1. I thought the thresholds (Intercepts) were the estimated cut points between adjacent categories on the latent scale, which here should be comparable to the scaled predictor. In model 1 these values make sense (-1.04, etc.; see the quick check after these questions). But the predictor variable is scaled, so how can an estimate be so far from zero (like between -27.6 and -4.9 for Intercept[1])? I would instead expect values between -3 and 3, even though the categories clearly overlap a lot.

  2. The group-level standard deviation also seems very high (roughly 7.5 to 36.5). Again, I understand the large uncertainty, but I do not understand the absolute size of the values given that the predictor is scaled to mean 0 and sd 1.
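For reference, this is how I am reading the model-1 thresholds (a rough check using the rounded posterior means from the summary above). If I understand the probit link correctly, pnorm(Intercept[k]) should give P(cat <= k) at val = 0, and that matches the observed proportions:

# implied cumulative probabilities P(cat <= k | val = 0)
pnorm(c(-1.04, -0.46, 0.00))
# [1] 0.149 0.323 0.500  (rounded)

# observed cumulative proportions of categories 1-3
cumsum(table(testdf$cat))[1:3] / nrow(testdf)
#     1     2     3
# 0.167 0.333 0.500  (rounded)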

Any guidance on why these values are so high? Am I misinterpreting the values of the intercepts/thresholds and the sd(group_level)? It’s easy to find example datasets where everything works, but it’s harder to find examples where things go wrong ;)

Thanks!

Data simulation
I wanted to make the simulated values as close to my real data as possible:

set.seed(11)
# per-species means and sds for the scaled predictor
vm <- c(-2.107, 0.140, -0.702, 0.371, -0.346, -0.539, 1.041, -0.548, 0.080, 0.316, 1.051, 0.836, -0.191, 0.735, -0.846, 0.176, 0.396, 0.500, -0.272, -0.482, -1.329, 0.227, 0.318, 0.213)
vsd <- c(1.065, 0.237, 0.693, 0.570, 0.502, 1.058, 0.916, 0.885, 0.619, 0.632, 0.787, 0.511, 0.954, 0.968, 0.285, 0.975, 0.998, 0.390, 0.500, 0.161, 1.030, 0.830, 0.400, 0.308)
d1 <- rnorm(96, mean = rep(vm, each = 4), sd = rep(vsd, each = 4))  # 4 observations per species
d2 <- rep(letters[1:24], each = 4)  # species labels a-x
vc <- c(16, 16, 16, 48)             # counts per ordinal category
d3 <- rep(1:4, vc)                  # ordinal category
testdf <- dplyr::bind_cols(val = d1, species = d2, cat = d3)
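A quick sanity check that the simulated predictor comes out roughly standardized:

round(c(mean = mean(testdf$val), sd = sd(testdf$val)), 2)
# should be close to 0 and 1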

brms 2.14.4
macOS Big Sur


Hi, sorry for not getting to you earlier. Your question is relevant and well written.

I think the problem is that the model parameters cannot be informed by the data, because the species don’t really have any variability in their responses. When I look at

library(dplyr)
testdf %>% group_by(species) %>% summarise(min(cat), max(cat))

I get:

   species `min(cat)` `max(cat)`
 * <chr>        <int>      <int>
 1 a                1          1
 2 b                1          1
 3 c                1          1
 4 d                1          1
 5 e                2          2
 6 f                2          2
 7 g                2          2
 8 h                2          2
 9 i                3          3
10 j                3          3
# ... with 14 more rows

So this means that I can get essentially the same likelihood by e.g. lowering the threshold between responses 1 and 2 and simultaneously lowering the varying intercepts for species a-d by the same amount. When this happens, the joint distribution of the parameters is constrained by the data, but the marginal distributions are only constrained by your priors. Since you use the default student_t(3, 0, 2.5) prior for both the intercepts and the SDs, the constraints are not very tight:

# central 95% interval of the default student_t(3, 0, 2.5) prior
brms::qstudent_t(c(0.025, 0.975), df = 3) * 2.5
# [1] -7.956116  7.956116
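To make the shift argument concrete, here is a toy calculation with made-up numbers (plain R, nothing extracted from your actual fit). For a species whose responses are all in category 1, the likelihood only involves Phi(threshold - eta), so a constant shift applied to both the threshold and that species’ intercept cancels out exactly:

tau1 <- -1; b <- 0.2; val <- 0.5; r_a <- 0.3    # made-up parameter values
shift <- -10                                     # lower threshold and intercept together
pnorm(tau1 - (b * val + r_a))                    # P(cat = 1) for one observation
# [1] 0.08075666
pnorm((tau1 + shift) - (b * val + (r_a + shift)))
# [1] 0.08075666  -- identical, so the data cannot pin down the absolute levels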

The posterior variability in your case is a bit higher than that, and I admit I don’t completely understand why; maybe just because of the interaction with val? Anyway, the pairs plot IMHO shows the pathologies quite neatly:

bayesplot::mcmc_pairs(
  tm.2,
  pars = c("b_val", "b_Intercept[1]", "b_Intercept[2]",
           paste0("r_species[", c("a", "f"), ",Intercept]"))
)

[pairs plot of the parameters above, showing strong posterior correlations between the thresholds and the species intercepts]

This is also why you needed such high adapt_delta and max_treedepth values to fit the model without issues.

Best of luck with your model!


Thank you @martinmodrak for the well-thought-out response. This makes a lot of sense. I am going to continue this conversation at the forum post “Categorical family: group-levels not present in each response category - is this ok?”, because the problems presented in this post and in that one stem from the same issue: “complete separation” in group-level predictors in an ordinal/categorical model.
