Inferences marginal of random effects in multi-level models

This is a general question about making inferences in multi-level models by communicating the uncertainty due to between-cluster variation in the parameters. In a number of specific applications, I’ve found it affects the inferences quite a lot, and wanted to get other people’s opinions. I’m also interested in how certain packages marginalize over the random effect distributions in practice — whether they simulate new values based on the estimated between-cluster variation, or whether they use the estimated cluster-specific values.

This topic comes up in a number of settings, I believe, often under the terms “conditional versus unconditional effects”, “conditional versus marginal effects”, or “confidence intervals versus prediction intervals”. I also believe these different inferences can be made with packages like rstanarm and brms, although I’ve found the code underlying the prediction functions a bit impenetrable.

A general example:
Consider the multi-level model for dependent variable y, independent variable x with coefficient \beta, and cluster-specific intercept and slope deviations \boldsymbol{\nu}_j = (\nu_{j_{1}}, \nu_{j_{2}}), j = 1,...,J:

y_{ij} \sim N(\mu_{ij}, \sigma)
\mu_{ij} = \alpha + \nu_{j_{1}} + (\beta + \nu_{j_{2}}) x_{ij}
\boldsymbol{\nu}_j \sim MVN(\boldsymbol{0}, \Sigma)

where \sigma is the within-cluster residual standard deviation, and \Sigma is the 2 \times 2 ‘random effects’ variance-covariance matrix, with between-cluster intercept and slope variances \sigma_{j_{1}}^2 and \sigma_{j_{2}}^2 on the diagonal and covariance \sigma_{j_{1,2}} on the off-diagonal.

We make inferences in this model either by 1) investigating the ‘fixed’ population-level parameters holding the cluster-specific terms at their means (i.e. 0), or 2) incorporating the variation due to clusters by marginalizing over the cluster-specific terms.

For instance, the effect of x on y is given by the distribution of \beta in the first case, but by the distribution of \beta + \nu_{j_{2}} in the second. The latter’s uncertainty is much larger than the former’s, and this can change the inferences substantially: an effect that looked quite certain can become highly uncertain. I’m interested in which is the most appropriate inference.
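To make the contrast concrete, here is a minimal sketch with made-up numbers (not from any fitted model): draws of \beta stand for the conditional effect, and adding simulated slope deviations \nu_{j_{2}} gives the marginal (new-cluster) effect.

```r
set.seed(1)
n <- 4000

# hypothetical posterior draws of the fixed slope (conditional effect)
beta <- rnorm(n, mean = 10, sd = 1.5)

# simulated new-cluster slope deviations, assuming sigma_j2 = 6
nu_j2 <- rnorm(n, mean = 0, sd = 6)

# marginal (new-cluster) effect
beta_marg <- beta + nu_j2

quantile(beta,      c(0.025, 0.975)) # narrow interval, clearly above 0
quantile(beta_marg, c(0.025, 0.975)) # much wider, includes 0
```

The point estimate barely moves (the deviations have mean 0), but the interval widens by roughly the between-cluster slope SD.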

A specific example
As a specific example, I fit the above model to the sleepstudy data using the brms package:

require(lme4) # for the sleepstudy data
require(brms)

d <- sleepstudy

fit <- brm(Reaction ~ 1 + Days + (1 + Days | Subject),
           data = d,
           cores = 4)

ps <- as.matrix(fit)

To obtain predictions marginal of the random effects, we can either simulate new parameter values representative of the variation estimated by the model, or use the estimated cluster-specific values, i.e.:

##### simulate from the random effect distribution (new clusters)
n_draws <- nrow(ps)
vcov_mats <- array(NA, dim = c(n_draws, 2, 2)) # hold the variance-covariance matrices

# build the matrix from each MCMC iteration
for (i in 1:n_draws) {
  vcov_mats[i, , ] <- matrix(c(
    # variance of intercept
    ps[i, "sd_Subject__Intercept"]^2,
    # intercept-slope covariance
    ps[i, "sd_Subject__Intercept"] * ps[i, "sd_Subject__Days"] * ps[i, "cor_Subject__Intercept__Days"],
    # slope-intercept covariance
    ps[i, "sd_Subject__Intercept"] * ps[i, "sd_Subject__Days"] * ps[i, "cor_Subject__Intercept__Days"],
    # variance of slope
    ps[i, "sd_Subject__Days"]^2
  ), nrow = 2, ncol = 2, byrow = TRUE)
}

# use mvrnorm to simulate from the random effect distribution (one new cluster per draw)
u_sim <- t(sapply(1:n_draws, function(x) {
  MASS::mvrnorm(n = 1, mu = c(0, 0), Sigma = vcov_mats[x, , ])
}))

##### extract the fitted random effects
J <- length(unique(d$Subject))
r_cols <- grep("\\br_Subject\\b", colnames(ps))
nu_fit_int   <- ps[, r_cols][, 1:J]
nu_fit_slope <- ps[, r_cols][, (J + 1):(2 * J)]

The first provides us with n_draws representative values for the cluster-specific deviations we would see if we observed new clusters; the second uses the estimated values for the clusters already in the model.

Using these different predictions, we can compare the conditional effect of Days (the distribution of \beta, the same in both cases) with the unconditional effect (\beta plus the cluster-specific deviations). With either the simulated or the fitted deviations, the unconditional effect of Days includes 0 as a plausible value. Thus, while for an average subject Days increases Reaction in the model, the variation across subjects means that the effect for any particular subject is highly uncertain and could even be negative.

We can also look at the implied trajectories of Reaction times across Days using both types of predictions:
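A sketch of how such trajectory bands can be computed. For brevity this uses stand-in draws (made-up numbers) in place of the ps and u_sim objects from the real fit — with the actual fit you would use ps[, "b_Intercept"], ps[, "b_Days"] and u_sim — but the computation is the same:

```r
set.seed(2)
n_draws <- 4000
days <- 0:9

# stand-ins for the posterior draws of the population-level parameters
b_int  <- rnorm(n_draws, 250, 7)
b_days <- rnorm(n_draws, 10, 1.5)
# stand-ins for simulated new-cluster deviations (intercept and slope)
u_int  <- rnorm(n_draws, 0, 25)
u_days <- rnorm(n_draws, 0, 6)

# draws x days matrices of the expected Reaction
mu_cond <- outer(b_int, rep(1, length(days))) + outer(b_days, days)
mu_marg <- outer(b_int + u_int, rep(1, length(days))) + outer(b_days + u_days, days)

# 95% bands at each day: the marginal (new-cluster) band is much wider
band_cond <- apply(mu_cond, 2, quantile, probs = c(0.025, 0.975))
band_marg <- apply(mu_marg, 2, quantile, probs = c(0.025, 0.975))
```

The conditional band reflects only uncertainty about the average subject's line; the marginal band also fans out with the between-subject intercept and slope variation.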

I’d be interested to know what people tend to use when making inferences from their models (e.g. what is the appropriate interpretation of the effect of Days on Reaction?), and whether there are better ways of incorporating the random effects than what I have done above.

Code: posterior-predictions-sleepstudy.R (4.3 KB)


I’m wondering if @paul.buerkner has any ideas about this? Or could someone explain how brms computes fixed-effect estimates marginal of the random effects?

The answer to your question is IMHO a big “it depends”, and it depends on the real-world problem you are trying to address. Some examples:

  • I want to analyze whether the average subject in each condition changed reaction over time. Then just looking at the fixed \beta is IMHO a good idea.
  • I want to analyze whether a randomly selected subject in the study would have their reaction change over time. Then including the varying intercept and effect is IMHO necessary, but I would use the fitted clusters.
  • I want to analyze whether a random person in a hypothetical new realization of the experiment would change their reaction over time. Then I would use the varying intercept and effect and simulate a new cluster.

Does that make sense?

So the big questions are: do I care about the average effect or do I care about individuals? And do I want to analyze what happened or do I want to generalize?

Note also that there are other options, like simulating a hypothetical replication with a limited number of new subjects (with simulated new cluster values) and taking the mean. That would get you somewhere in between, and might be useful if you want to understand how likely a replication is to see a similar result, or when predicting what the next few customers will do.
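That in-between option can be sketched as follows, again with made-up numbers standing in for the posterior draws: for each draw, simulate K new subjects’ slope deviations and average the resulting effects.

```r
set.seed(3)
n_draws <- 4000
K <- 10        # number of subjects in the hypothetical replication
sd_days <- 6   # assumed between-subject slope SD

# stand-in draws of the fixed slope
b_days <- rnorm(n_draws, 10, 1.5)

# per draw: the mean effect across K simulated new subjects
effect_repl <- b_days + rowMeans(matrix(rnorm(n_draws * K, 0, sd_days),
                                        nrow = n_draws, ncol = K))

sd(b_days)      # conditional effect (the K -> infinity limit)
sd(effect_repl) # replication mean: in between
sqrt(sd(b_days)^2 + sd_days^2) # roughly the single-new-subject spread (K = 1)
```

As K grows, the averaged deviations shrink toward 0 and the interval collapses back to the conditional one; with K = 1 you recover the single-new-cluster inference.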

I’ll also note that in brms, the posterior_predict and posterior_linpred functions can do most of what you were computing manually, via a combination of the re_formula, allow_new_levels and sample_new_levels arguments (more in the docs).
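For reference, a sketch of those built-in options applied to the fit above (this assumes the fitted model object `fit` and so won’t run standalone; argument names are as documented in brms):

```r
# conditional: population-level effects only (group-level effects set to 0)
posterior_linpred(fit, re_formula = NA)

# unconditional, fitted clusters: include the estimated Subject effects
posterior_linpred(fit, re_formula = NULL)

# unconditional, new clusters: simulate Subject effects for an unseen level
nd <- data.frame(Days = 0:9, Subject = "new")
posterior_linpred(fit, newdata = nd,
                  allow_new_levels = TRUE,
                  sample_new_levels = "gaussian")
```

posterior_predict takes the same arguments and additionally adds the residual (observation-level) noise.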

Best of luck with your modelling!
