Brms: How to get the ICC or VPC value for a multi-level negative binomial model?

Jacob_stanlearner · August 20, 2021, 2:28am

I run a brms model like the following code. It was a multi-level negative binomial model. I was wondering where to find or how to calculate the value of ICC/VPC of this model.

model<- brm(value ~ hour * period + offset(log(n)) + (1 | id), 
                          family=negbinomial(),
                          data = data, 
                          cores=2,chains = 2,
                          backend="cmdstanr",
                          threads=threading((4)),
                          seed=22323)

I had tried the “variance_decomposition()” of “performance” package. But it gave me some negative values. See the results:

variance_decomposition(model)
# Random Effect Variances and ICC

Conditioned on: all random effects

## Variance Ratio (comparable to ICC)
Ratio: -0.03  CI 95%: [-0.24 0.12]

## Variances of Posterior Predicted Distribution
Conditioned on fixed effects: 1.24  CI 95%: [1.10 1.42]
Conditioned on rand. effects: 1.20  CI 95%: [1.05 1.37]

## Difference in Variances
Difference: -0.04  CI 95%: [-0.27 0.16]

Here are the results of model got from the “summary()”:

 Family: negbinomial 
  Links: mu = log; shape = identity 
Formula: value ~ hour * period + offset(log(n)) + (1 | id) 
   Data: data (Number of observations: 78960) 
Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 2000

Group-Level Effects: 
~id (Number of levels: 50) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.03      0.02     0.00     0.06 1.00      911      996

Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept         -3.01      0.08    -3.16    -2.85 1.01      288      664
hour1             -0.88      0.13    -1.13    -0.63 1.00      597      924
hour2             -1.81      0.17    -2.14    -1.49 1.00     1015     1323
hour3             -3.23      0.29    -3.82    -2.72 1.00     1631     1189
hour4             -3.10      0.27    -3.68    -2.59 1.00     1995     1415
hour5             -2.44      0.20    -2.84    -2.04 1.00     1470     1282
hour6             -1.09      0.14    -1.37    -0.83 1.00      779     1043
hour7             -0.45      0.12    -0.68    -0.21 1.00      478      812
hour8             -0.17      0.11    -0.40     0.05 1.01      585     1100
hour9              0.05      0.11    -0.17     0.26 1.00      491      789
hour10            -0.30      0.12    -0.53    -0.07 1.00      662     1075
hour11            -0.07      0.11    -0.28     0.14 1.00      481      986
hour12             0.11      0.11    -0.10     0.32 1.01      477      872
hour13            -0.07      0.11    -0.30     0.16 1.00      606      897
hour14            -0.33      0.12    -0.56    -0.11 1.00      498     1090
hour15            -0.17      0.12    -0.41     0.05 1.00      514     1094
hour16            -0.06      0.11    -0.28     0.14 1.00      440      939
hour17             0.34      0.10     0.14     0.54 1.01      444     1037
hour18             0.31      0.10     0.11     0.51 1.00      404      969
hour19             0.10      0.11    -0.12     0.30 1.00      487      886
hour20             0.12      0.11    -0.09     0.34 1.00      459      966
hour21             0.27      0.11     0.06     0.48 1.00      459     1005
hour22             0.39      0.10     0.18     0.58 1.00      456      772
hour23             0.41      0.11     0.20     0.60 1.01      550      739
period2            0.10      0.11    -0.12     0.31 1.01      278      559
hour1:period2      0.13      0.17    -0.20     0.46 1.01      501     1105
hour2:period2      0.58      0.21     0.19     1.01 1.00      935     1336
hour3:period2      1.46      0.32     0.86     2.09 1.00     1315     1366
hour4:period2      0.59      0.32    -0.01     1.21 1.00     1700     1335
hour5:period2      0.40      0.26    -0.09     0.91 1.00     1181     1295
hour6:period2     -0.24      0.19    -0.60     0.14 1.00      784     1169
hour7:period2     -0.36      0.16    -0.67    -0.01 1.00      546     1051
hour8:period2     -0.46      0.16    -0.76    -0.16 1.00      568      936
hour9:period2     -0.30      0.15    -0.59    -0.00 1.00      491      807
hour10:period2     0.03      0.16    -0.29     0.33 1.00      562      961
hour11:period2    -0.15      0.15    -0.44     0.14 1.00      516      831
hour12:period2    -0.30      0.15    -0.61    -0.00 1.00      524     1073
hour13:period2    -0.03      0.15    -0.33     0.27 1.00      506      842
hour14:period2     0.27      0.15    -0.04     0.58 1.00      476      997
hour15:period2     0.09      0.15    -0.21     0.40 1.00      491      739
hour16:period2     0.03      0.15    -0.27     0.32 1.01      448      558
hour17:period2    -0.37      0.14    -0.65    -0.09 1.00      438     1030
hour18:period2    -0.29      0.14    -0.55    -0.02 1.01      403      993
hour19:period2    -0.05      0.15    -0.33     0.23 1.00      471      835
hour20:period2    -0.09      0.15    -0.37     0.21 1.00      442      792
hour21:period2    -0.09      0.14    -0.37     0.19 1.01      435      813
hour22:period2    -0.08      0.14    -0.35     0.21 1.00      428      692
hour23:period2    -0.16      0.14    -0.44     0.12 1.01      417     1058

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.33      0.01     0.32     0.34 1.00     4793     1480

Samples were drawn using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Looking forward to some advice! Many thanks!
Best,
Jacob

Operating System: win10
brms Version: 2.15.9

martinmodrak · September 6, 2021, 7:29pm

Hi,
I’ve never used the performance package, so maybe the question is better answered by some of the package devs. I noticed @DominiqueMakowski , @mattansb, @IndrajeetPatil , @bwiernik have accounts here, so maybe somebody has time to answer?

Best of luck with your model!

bwiernik · September 6, 2021, 8:02pm

You can see the computations used in this function by View(performance::variance_decomposition).

Copied here for convenience:


performance::variance_decomposition
function (model, re_formula = NULL, robust = TRUE, ci = 0.95, 
    ...) 
{
    if (!inherits(model, "brmsfit")) {
        stop("Only models from package 'brms' are supported.")
    }
    mi <- insight::model_info(model)
    if (insight::is_multivariate(model)) {
        resp <- insight::find_response(model)
        is.mixed <- unlist(lapply(resp, function(i) mi[[i]]$is_mixed))
        if (!any(is.mixed)) {
            warning("'model' has no random effects.", call. = FALSE)
            return(NULL)
        }
    }
    else if (!insight::is_mixed_model(model)) {
        warning("'model' has no random effects.", call. = FALSE)
        return(NULL)
    }
    insight::check_if_installed("brms")
    PPD <- brms::posterior_predict(model, re_formula = re_formula, 
        summary = FALSE, ...)
    var_total <- apply(PPD, MARGIN = 1, FUN = stats::var)
    PPD_0 <- brms::posterior_predict(model, re_formula = NA, 
        summary = FALSE, ...)
    var_rand_intercept <- apply(PPD_0, MARGIN = 1, FUN = stats::var)
    if (robust) {
        fun <- get("median", asNamespace("stats"))
    }
    else {
        fun <- get("mean", asNamespace("base"))
    }
    var_icc <- var_rand_intercept/var_total
    var_residual <- var_total - var_rand_intercept
    ci_icc <- rev(1 - stats::quantile(var_rand_intercept/var_total, 
        probs = c((1 - ci)/2, (1 + ci)/2)))
    result <- structure(class = "icc_decomposed", list(ICC_decomposed = 1 - 
        fun(var_icc), ICC_CI = ci_icc))
    attr(result, "var_rand_intercept") <- fun(var_rand_intercept)
    attr(result, "var_residual") <- fun(var_residual)
    attr(result, "var_total") <- fun(var_total)
    attr(result, "ci.var_rand_intercept") <- bayestestR::ci(var_rand_intercept, 
        ci = ci)
    attr(result, "ci.var_residual") <- bayestestR::ci(var_residual, 
        ci = ci)
    attr(result, "ci.var_total") <- bayestestR::ci(var_total, 
        ci = ci)
    attr(result, "ci") <- ci
    attr(result, "re.form") <- re_formula
    attr(result, "ranef") <- model$ranef$group[1]
    attr(attr(result, "ci.var_rand_intercept"), "data") <- NULL
    attr(attr(result, "ci.var_residual"), "data") <- NULL
    attr(attr(result, "ci.var_total"), "data") <- NULL
    result
}


> ?performance::variance_decomposition
> performance:::print.icc_decomposed
function (x, digits = 2, ...) 
{
    cat("# Random Effect Variances and ICC\n\n")
    reform <- attr(x, "re.form", exact = TRUE)
    if (is.null(reform)) {
        reform <- "all random effects"
    }
    else {
        reform <- .safe_deparse(reform)
    }
    cat(sprintf("Conditioned on: %s\n\n", reform))
    prob <- attr(x, "ci", exact = TRUE)
    cat(insight::print_color("## Variance Ratio (comparable to ICC)\n", 
        "blue"))
    icc.val <- sprintf("%.*f", digits, x$ICC_decomposed)
    ci.icc.lo <- sprintf("%.*f", digits, x$ICC_CI[1])
    ci.icc.hi <- sprintf("%.*f", digits, x$ICC_CI[2])
    cat(sprintf("Ratio: %s  CI %i%%: [%s %s]\n", icc.val, 
        as.integer(round(prob * 100)), ci.icc.lo, ci.icc.hi))
    cat(insight::print_color("\n## Variances of Posterior Predicted Distribution\n", 
        "blue"))
    null.model <- sprintf("%.*f", digits, attr(x, "var_rand_intercept", 
        exact = TRUE))
    ci.null <- attr(x, "ci.var_rand_intercept", exact = TRUE)
    ci.null.lo <- sprintf("%.*f", digits, ci.null$CI_low)
    ci.null.hi <- sprintf("%.*f", digits, ci.null$CI_high)
    full.model <- sprintf("%.*f", digits, attr(x, "var_total", 
        exact = TRUE))
    ci.full <- attr(x, "ci.var_total", exact = TRUE)
    ci.full.lo <- sprintf("%.*f", digits, ci.full$CI_low)
    ci.full.hi <- sprintf("%.*f", digits, ci.full$CI_high)
    ml <- max(nchar(null.model), nchar(full.model))
    ml.ci <- max(nchar(ci.full.lo), nchar(ci.null.lo))
    mh.ci <- max(nchar(ci.full.hi), nchar(ci.null.hi))
    cat(sprintf("Conditioned on fixed effects: %*s  CI %i%%: [%*s %*s]\n", 
        ml, null.model, as.integer(round(prob * 100)), ml.ci, 
        ci.null.lo, mh.ci, ci.null.hi))
    cat(sprintf("Conditioned on rand. effects: %*s  CI %i%%: [%*s %*s]\n", 
        ml, full.model, as.integer(round(prob * 100)), ml.ci, 
        ci.full.lo, mh.ci, ci.full.hi))
    cat(insight::print_color("\n## Difference in Variances\n", 
        "red"))
    res <- sprintf("%.*f", digits, attr(x, "var_residual", 
        exact = TRUE))
    ci.res <- attr(x, "ci.var_residual", exact = TRUE)
    ci.res.lo <- sprintf("%.*f", digits, ci.res$CI_low)
    ci.res.hi <- sprintf("%.*f", digits, ci.res$CI_high)
    cat(sprintf("Difference: %s  CI %i%%: [%s %s]\n", res, 
        as.integer(round(prob * 100)), ci.res.lo, ci.res.hi))
    invisible(x)
}

The line of the output you see labeled “Conditioned on rand. effects: 1.20 CI 95%: [1.05 1.37]” is var_all_re (var_total), the estimated variance of the posterior predictive distribution reported by brms with the random effects specified by the user in the function (defauls to NULL meaning to include all random effects in the PPD).

The line of the output you see labeled “Conditioned on fixed effects: 1.24 CI 95%: [1.10 1.42]” is var_no_re, the estimated variance of the posterior predictive distribution reported by brms with the random effects set to NA, meaning all random effects are fixed to zero. See ?brms::posterior_predict (variance in the PPD only predicted by the fixed effects).

The Variance Ratio (comparable to ICC) is computed as 1 - var_no_re / var_total, reflecting the estimated ratio of the random effects variance to the total variance.

Generally, var_total should be larger than var_no_re because the random effects variance is included. However, in your case, it appears that there is very little random effects variance in your fitted model, and much of the posterior for this effect is zero (or perhaps negative? I am unsure what priors brms places on sd(Intercept) by default). So, the median var_no_re ends up larger than the median var_total, producing the negative value for the Variance ratio and Difference in Variances you see here.

cf. that a model with a larger estimated intercept variance does show the expected pattern (though with a wide posterior given the small sample size and presumably weak priors):

fit <- brm(incidence ~ period + offset(log(size)) + (1 | herd), data = lme4::cbpp, family = negbinomial())
performance::variance_decomposition(fit)
# # Random Effect Variances and ICC
# 
# Conditioned on: all random effects
# 
# ## Variance Ratio (comparable to ICC)
# Ratio: 0.18  CI 95%: [-2.24 0.81]
# 
# ## Variances of Posterior Predicted Distribution
# Conditioned on fixed effects: 6.00  CI 95%: [2.07 25.21]
# Conditioned on rand. effects: 7.40  CI 95%: [2.53 36.53]
# 
# ## Difference in Variances
# Difference: 1.21  CI 95%: [-12.67 24.88]

Jacob_stanlearner · September 15, 2021, 12:37pm

Thanks for your reply! Really helpful and clear!

Topic		Replies	Views
What is the most appropriate approach to compute an ICC for a brms MLM? Interfaces brms	0	355	February 4, 2024
How to calculate VPC in brms brms fitting-issues , specification	1	997	December 14, 2020
ICC for a beta_binomial model Modeling techniques , specification , brms	0	169	May 22, 2024
Multiple membership multilevel model - clarification questions Modeling brms	5	78	October 11, 2024
Manually generate posterior predictions from negative binomial model fit with brms brms posterior-predictive , brms	5	1284	December 14, 2022

Brms: How to get the ICC or VPC value for a multi-level negative binomial model?

Related topics