Estimate icc for grouping-level in mixed model with partial and crossed nesting

I want to estimate the proportion of variance in an outcome y attributable to a level of grouping (coach) in a mixed model. This is similar to analyses of psychotherapy where patients are nested in therapists and analysts estimate the therapist effect.

I have secondary data from a trial with complex randomization and partial/crossed nesting.

  1. Villages were randomly assigned to treatment or control. This randomization was blocked by village type (host).
  2. Households in the treatment villages were randomly assigned to groups of ~25 people, and these groups were randomly assigned to 1 of 4 arms: arms 1-3 were variants of the core program, while arm 4 was a spillover control arm, meaning these were people who lived in treated villages but were not themselves in the program. Therefore, trt==1 if arm is 1, 2, or 3; trt==0 for people in control villages or arm 4 (spillover control)
  3. Coaches were randomly assigned to arms 1 to 3. Coaches crossed villages (meaning they could have groups in multiple villages).
  4. Everyone was assigned to a group, but groups were only meaningful for people in arms 1-3. There were “groups” of people assigned to arm 4 (spillover control), but there were no group activities because these folks did not receive any programming.

Since group and coach were not meaningful for control individuals (people from control villages and folks in arm 4), I assigned them singleton clusters for coach and group following Candlish et al. (2018).

This is the model I tried to fit using some simulated data:

library(tidyverse)
library(brms)
library(performance)

df <- read_csv("https://raw.githubusercontent.com/ericpgreen/g2r_coaching/main/reports/sim.csv")

fit <- brms::brm(y ~
                   trt +                  # 1 if arm 1, 2, or 3
                                          # 0 if pure control or arm 4
                                          # (arm 4 is control in a trt village)
                   coaching_individual +  # 1 if arm 1, individual trt
                   coaching_group +       # 1 if arm 2, group trt
                   no_asset +             # 1 if arm 3, ind trt, no $$
                   spillover_control +    # 1 if arm 4, no program 
                   host +                 # 1 if host community
                   (1 | village) +        # people nested in villages
                   (0 + trt | coach) +    # people nested in coaches 
                                          # (but only when trt==1)
                   (0 + trt | group),     # people nested in groups
                                          # (but groups only meaningful trt==1)
                 data = df)

I thought I could use performance::icc() to estimate ICC for coach, but it complained about the random slopes and only produces an estimate for village.

performance::icc(fit, by_group = TRUE)

I tried to use performance::variance_decomposition(), but it only returns an overall result.

I think the answer is that the ICC is not possible to calculate when there are random slopes. Is this true, or is there a different approach I might take?

If it’s not possible, I’m wondering about the logic of subsetting the data to just trt==1 (thus removing the partial nesting and random slopes) and focusing on the coach effect among program recipients who received different variants of the program.

df2 <- df %>% 
  filter(trt==1)

fit2 <- brms::brm(y ~
                   coaching_group +       # 1 if arm 2, group trt
                   no_asset +             # 1 if arm 3, ind trt, no $$
                   host +                 # 1 if host community
                   (1 | village) +        # people nested in villages
                   (1 | coach) +          # people nested in coaches 
                                          # (but only when trt==1)
                   (1 | group),           # people nested in groups
                                          # (but groups only meaningful trt==1)
                 data = df2)

You’re right that you can’t easily represent the ICC for random-slopes models, because it would be a function of the slope variable. To my understanding, as expressed in the link, you’d have to represent it conditionally which would rather defeat the point of an apparently simple single-value summary statistic like ICC.

Personally I would retain the model form that is justified by the design, and describe the variation in the relevant parameters or outcomes more directly. I just don’t find the ICC to be very informative for anything other than very simple models.

1 Like

Thanks @AWoodward. I appreciate your feedback.

This goes beyond the original code question, so no pressure here, but I wonder if you (or other readers) would suggest a different way to estimate the contribution of coaches in individual outcomes.

What about visualizing the expected responses and their variation, conditional on different combinations of coach/group? Or, predicted responses (including the residual error) holding specific values of the grouping parameters?

If the individual coaches or groups are of specific interest, then you do have direct estimates of their parameters from coef(). You also have the variance parameters which could be summarized directly via VarCorr().

Can you calculate the variances from the posterior samples and then calculate the ICCs from those?

I believe that performance::variance_decomposition() does this but just overall.

I’m thinking about visualizing variations by individual coaches as a secondary interest. Estimating the overall coach effect is my main interest.

I’m going to give more thought to your first suggestion. If you (or anyone) know of examples I might lean on I’d appreciate the info. Thanks again.