Calculating Intra-Cluster Correlation (ICC) for the Group-Level Effect in a Multinomial GLMM

blokeman · July 13, 2018, 10:52pm

Hosmer et al. (2013: 327) advise that in a logistic regression model with a single random (group-level) effect, the so-called intra-cluster correlation is an estimate of the proportion of the overall variability accounted for by the random (group-level) effect. The formula that they provide for computing this statistic is \frac{\sigma^2}{{\frac{1}{3}\pi^2}+\sigma^2}, where \sigma is the estimated standard deviation of the random (group-level) effect. How do I apply this formula to a multinomial model with C = 4 categories and one random effect per non-baseline category? The raw frequencies of the categories are c_1= 688 (baseline category), c_2 = 747, c_3 = 667, and c_4 = 437 respectively.

If left to my own devices I’d probably calculate the average of the three ICCs, weighted by the overall counts of the three non-baseline categories. But I’d rather not rely on my own vague and error-prone mathematical intuitions.

Hosmer, D. W., Lemeshow, S. & Sturdivant, R. X. (2013). Applied logistic regression (3rd ed.). Hoboken, N.J.: Wiley.

Bob_Carpenter · July 15, 2018, 12:48am

I’m not sure what you mean by “per regresison equation”. Is it just a discrete (as opposed to truly multinomial) regression?

If you want the posterior standard deviation of a parameter, you need to do that outside of Stan using the posterior draws.

blokeman · July 15, 2018, 6:10am

Sorry, that must have been an erroneous wording (I’ve edited it now). It’s multinomial. With 4 outcome categories, the model is composed of 3 binary logistic regressions, each comparing one non-reference outcome with the reference outcome. So for each parameter specified in the model formula, there are three parameters (one for each component logistic regression). What I want to do is calculate a “summary statistic” of the ICC for the entire multinomial model, using the three individual ones.

Bob_Carpenter · July 15, 2018, 7:27pm

Why not use multi-logit? It’s the usual approach for multiple categories. There’s a discussion in the manual regression chapter.

That doesn’t answer this question, but presumably what you’re trying to work out is reduction in variance from the predictors.

strengejacke · July 16, 2018, 7:34am

There is an icc()-function in the sjstats-package, which can be used for stanreg or brms objects.
However, for non-Gaussian models, it is recommended to calculate the ICC based on the posterior predictions (I think this was stated by @Bob_Carpenter or @bgoodri). You can use the ppd-argument to calculate the ICC based on the posterior predictions.

strengejacke · July 16, 2018, 8:32am

Here’s the link to the discussion.

blokeman · July 16, 2018, 9:38pm

Thanks for the input, strengejacke. I tried the package you’re referring to. And indeed, its author defines ICC identically to what I’ve understood it to mean: “the proportion of the variance explained by the grouping structure in the population”. But the output of the icc() function is cryptic to me:

icc(mod, ppd = TRUE)
# Random Effect Variances and ICC

Family: categorical (logit)
Conditioned on: all random effects

## Variance Ratio (comparable to ICC)
Ratio: 0.01  HDI 89%: [-0.10 0.13]

## Variances of Posterior Predicted Distribution
Conditioned on fixed effects: 1.10  HDI 89%: [0.97 1.22]
Conditioned on rand. effects: 1.11  HDI 89%: [1.07 1.14]

## Difference in Variances
Difference: 0.01  HDI 89%: [-0.11 0.14]

A ratio comparable to ICC is 0.01? That would mean next to no ICC, entailing that the random effect has little explanatory power. But this is not true – even just comparing the Deviances of the two models yields a difference of over 600 points – on three degrees of freedom! Also, the random-effects model predicts 62% of the quaternary responses correctly, compared to 57% for the fixed-effects model. It doesn’t seem an inconsequential group-level effect to me.

Also, running icc(mod, ppd = FALSE) yields:

# Random Effect Variances and ICC

Family: categorical (logit)

## trigger
          ICC: 0.50  HDI 89%: [0.36 0.65]
Between-group: 1.07  HDI 89%: [0.46 1.66]

## Residuals
Within-group: 1.00  HDI 89%: [1.00 1.00]

This says ICC is 0.5, i.e. the random effect accounts for HALF the explained variance. This, on the other hand, sounds very extreme – none of the three component logistic regressions constituting this multinomial model seems to have an ICC that high – calculating them individually using Hosmer et al’s formula (see first post) yields .24, .25, and .47, respectively. Their average (weighted by sample size) yields 0.3.

Which of the two values seen above is the estimated ICC? The 0.01 or the 0.50 – or neither?

Hmm. If I had to guess, I’d guess it’s the second one, calculated with ppd = FALSE. That’s where the output correctly names the random effect (‘trigger’) whereas the output of ppd = TRUE just uses the generic phrase “all random effects”. But still – 50 percent?

blokeman · July 17, 2018, 12:07pm

I’ll take a shot at calculating it myself. Maybe people can tell me something I’m doing wrong:

Reminder: it’s a categorical logit model with 4 outcome categories and one random effect.

ll.re <- log_lik(mod, re_formula = NULL)#Posterior predictive LLs using random effect
ll.fe <- log_lik(mod, re_formula = NA)  #Posterior predictive LLs ignoring random effect
LL.re <- mean(apply(ll.re, 1, sum))     #Total posterior predictive LL using random effect
LL.fe <- mean(apply(ll.fe, 1, sum))     #Total posterior predictives LL ignoring random effect
dev.re <- -2*LL.re                      #Convert to deviance scale (equivalent to variance in LMEs (?))
dev.fe <- -2*LL.fe                      #Same as above, ignoring random effect
tau.00 <- dev.fe-dev.re                 #Between-group deviance?
sigma_2 <- dev.re                       #Within-group (residual) deviance?
ICC <- tau.00 / (tau.00 + sigma_2)      #Intracluster correlation?
ICC
[1] 0.1798372

Does this look right?

strengejacke · July 17, 2018, 12:10pm

The ICC also depends on whether you calculate the mean or median of posterior draws. See argument typical in icc(). And the “ICC” based on the posterior predictive distribution is not necessarily close to a “classical” ICC.

blokeman · July 17, 2018, 12:15pm

Thank you. I find that using the median instead of the mean yields the same result (to two decimal places).

Might I infer that since that was the only thing that raised a comment, you see no error in the rest of the procedure?

strengejacke · July 17, 2018, 8:21pm

For the classical ICC, you get tau.00 from VarCorr(). I’m not sure whether using the log-likelihood is the appropriate approach.

Topic		Replies	Views
Intraclass correlation coefficient of non-linear parameter brms ecology , brms	10	1563	June 4, 2021
Interpreting random effect on intercept using ICC brms interpret-results	4	2555	May 20, 2019
Intraclass correlation for ordered logit models in brms Modeling techniques , interpret-results , brms	2	1169	January 26, 2022
Extending the interclass correlation coefficient (ICC, aka Repeatability aka Heritability) for Bayesian models General techniques , brms	3	1541	June 22, 2022
Interpretation of random effect and Intraclass correlation coefficient (ICC) General techniques , interpret-results	1	1731	May 25, 2020

Calculating Intra-Cluster Correlation (ICC) for the Group-Level Effect in a Multinomial GLMM

Related topics