Hierarchical mediation with level-2 IV and mediator (brms)

Hi All,

I would like to estimate a hierarchical mediation model where both the IV of interest (X) and the mediator (M) are level 2 variables and the DV (Yij) is a repeated measures variable with i measurements for each j subject.

So, breaking this into separate models, I’d have something like:

Y ~ X + (1 | id)
Y ~ X + M + (1 | id)
M ~ X

In the past, I’ve used brms’s capacity for handling multivariate outcomes to estimate the second and third models simultaneously; however, in this case, I am not sure that this is possible due to the structure of the data required for these models.

Y ~ X + M + (1 | id) requires long format data where each subject is represented across j rows and the entries for X and M are identical across j rows for each subject. However, if this data is naively used to estimate M ~ X, the sample size is inflated by a factor of j. That is, for M ~ X, we need each measurement of M and X to be uniquely represented in the data (rather than repeated).

So, I am wondering if there is a trick for dealing with this in brms; for instance, can I specify different data for each formula? Or, perhaps is there is another way to get around this issue?

The other option, of course, is to just estimate separate models. But, I thought that (1) this might be a common problem and (2) there might be an established way to do this.

I did search around quite a bit (on discourse and brms’s support material), but I haven’t found anything helpful.


Since M ~ X is measured on a different level, we can’t really related it to the other formulas in a way that the multivariate model is any different than the set of univariate models. So you can savely go for the latter option I think

That makes sense. Thank you!

I was wondering what would you think about the idea that I am presenting below. It requires some additional data preparation, but otherwise it gives results similar to those obtained in some commercial software. However, the CIs are a little bit larger.

It is basically based on joining separate data.frames for level 1 and level 2 variables, and using 0-1 indicators as weights in level 1 and level 2 formulas. I think it should be even easier with the new subset parameter.

Here is the code based on simulated data:

# focal predictor - between level
x <- rnorm(30)
# focal mediator - between level
m <- 1 + 2*x + rnorm(30, 0, 2)
# extend to a full data.frame (30 groups with 30 individuals each) 
xr <- rep(x, each = 30)
mr <- rep(m, each = 30)
# group id
id <- rep(1:30, each=30)
# random intercepts
ints <- rep(rnorm(30, 5, 2), each=30)
# simulate within level outcome variable
yr <- ints + 2*xr + 3*mr + rnorm(900, 0, 30)

# include all variables in one data frame
# within part comes first, then between part
# fill the missing parts with 0
# include indicator weights (denoting within and between parts)
df <- data.frame(yr = c(yr, rep(0, 30)),
                 mr = c(mr, rep(0, 30)),
                 xr = c(xr, rep(0, 30)),
                 id = c(id, 1:30),
                 x = c(rep(0, 900), x),
                 m = c(rep(0, 900), m),
                 with = c(rep(1, 900), rep(0, 30)),
                 betw = c(rep(0, 900), rep(1, 30)))
# use indicator variables with weights command
# between part of the model
bb <- bf(m | weights(betw) ~ x)
# within part of the model
bw <- bf(yr | weights(with) ~ xr + mr + (1 | id))

# fit the model
fit <- brm(bb + bw + set_rescor(rescor = F),
           data= df)
# print the resutlts
# test indirect effect
hypothesis(fit, c(indirect = 'm_x * yr_mr = 0'))

This should literally result in the same results as the univariate models unless you (incorrectly) estimate the residual correlation (which does not make sense here). As a side note, you can already use the subset argument in the github version of brms.

Yes, but the problem with the univariate model I see is that you cannot simultaneously obtain draws for paths X->M and M->Y. Hence, you cannot compute CIs for the indirect effect, which is sometimes required by the reviewers.
Anyway, thanks for the response.

Ahh I see. You can still do it based on the univariate models, by extracting the posterior samples from both models and then combine them in one data frame (as the coefficients will be independent from each other). However, I agree with you that is it more convenient to have them all stored within the same model. Hope you find the subset argument usefull in that regard.