Inferences marginal of random effects in multi-level models

This is a general question about making inferences in multi-level models by communicating the uncertainty due to between-cluster variation in the parameters. In a number of specific applications, I’ve found it affects the inferences quite a lot, and wanted to get other people’s opinions. I’m also interested in how certain packages marginalize over the random effect distributions in practice - whether they simulate new values based off the cluster-specific variation, or whether the use the estimated cluster-specific values.

This topic comes up in a number of settings, I believe, often under the terms “conditional versus unconditional effects”, “conditional versus marginal effects”, “confidence intervals versus prediction intervals”. I also think that these different inferences are possible to make in packages like rstanarm and brms, although I’ve found the code underlying the functions used to generate the predictions a bit impenetrable.

A general example:
Consider the multi-level model for dependent variable y, independent variable x with coefficient \beta, and cluster-specific intercept and slope parameters \boldsymbol{\nu} = \{\nu_{j_{1}}, \nu_{j_{2}}\}, j = 1,...,J:

y_{ij} \sim N(\mu_{ij}, \sigma)
\mu_{ij} = \alpha + \nu_{j_{1}} + (\beta + \nu_{j_{2}}) x
\boldsymbol{\nu} \sim MVN(\boldsymbol{0}, \Sigma)

where \sigma is the within-cluster residual variance, and \Sigma is the 2 \times 2 variance-covariance ‘random effect’ matrix with between-individual intercept and slope variances \sigma_{j_{1}}^2 and \sigma_{j_{2}}^2 on the diagonals, and covariance \sigma_{j_{1,2}} on the off-diagonals.

We make inferences in this model either by 1) investigating the ‘fixed’ population-level parameters holding the cluster-specific terms at their means (i.e. 0), or 2) incorporating the variation due to clusters by marginalizing over the cluster-specific terms.

For instance, the effect of x on y is given by the distribution of \beta in the first instance but by the distribution of \beta + \nu_{j_{2}} in the second. The latter’s uncertainty is much larger than the first’s, and can change the inferences massively (i.e. an effect that was quite certain can become massively uncertain). I’m interested in which is the best and most appropriate inference.

A specific example
As a specific example, I fit the above model to the sleepstudy data using the brms package:

require(brms)
require(lme4) # for the data

data("sleepstudy")
d <- sleepstudy

fit <- brm(Reaction ~ 1 + Days + (1+ Days | Subject), 
           data = d,
           cores = 4
           )

ps <- as.matrix(fit)

To obtain predictions marginal of the random effects, we can either simulate new parameter values representative of the variation estimated by the model, or use the estimated cluster-specific values, i.e.:

##### simulate from the random effect distribution (new clusters)
n_draws <- nrow(ps)
vcov_mats <- array(NA, dim=c(n_draws, 2,2)) # hold the variance-covariance matrices

# get each matrix from each MCMC iteration
for(i in 1:nrow(ps)){
  vcov_mats[i, , ] <- matrix(c(
    # variance of intercept
    ps[i,"sd_Subject__Intercept"]^2, 
    # intercept-slope covariance
    ps[i,"sd_Subject__Intercept"] * ps[i,"sd_Subject__Days"] * ps[i,"cor_Subject__Intercept__Days"],
    # variance of slope
    ps[i,"sd_Subject__Days"]^2, 
    # covariance
    ps[i,"sd_Subject__Intercept"] * ps[i,"sd_Subject__Days"] * ps[i,"cor_Subject__Intercept__Days"]
  ), 
  nrow = 2, ncol = 2, byrow=T)
}

# use mvrnorm to simulate the random effect distribution
u_sim <- t(sapply(1:nrow(ps), function(x) {
  MASS::mvrnorm(n = 1, mu = c(0,0), Sigma = vcov_mats[x,,], tol = 1e3)
}))

OR

# extract the fitted random effects
nu_fit_int <- ps[,grep("\\br_Subject\\b",colnames(ps))][,1:length(unique(d$Subject))]
nu_fit_slope <- ps[,grep("\\br_Subject\\b",colnames(ps))][,(length(unique(d$Subject))+1):(length(unique(d$Subject))*2)]

The first provides us with N representative values for the cluster-specific deviations if we observed new clusters, and the second uses the estimated in-model clusters’ values.

Using these different predictions, we can look at the effect of Days for both the conditional (the same in both cases) and unconditional predictions. In either case, the effect of Days includes 0 as a plausible value in the unconditional estimates. Thus, while for an average subject, Days increases Reaction in the model, the variation across subjects means that this effect is highly uncertain and could be negative.

We can also look at the implied trajectories of Reaction times across Days using both types of predictions:

I’d be interested to know what people tend to use when making inferences from their models (e.g. what is the appropriate interpretation of the effect of Days on Reaction?), and whether there are any better ways of incorporating the random effects as I have done above.

Code: posterior-predictions-sleepstudy.R (4.3 KB)

1 Like

I’m wondering if @paul.buerkner has any ideas about this? Or if information about how fixed effects estimates are estimated marginal of the random effects in brms could be provided?