brms: Mediation analysis or not?

Operating System: Mac OS 10.14
brms Version: 2.10.5

Dear brms and Bayesian modeller community,
I have a partly general partly brms modelling-specific question. Please excuse this as an expression of my state of confusion.

My data come from a cross-sectional, (quasi-)observational study without intervention. I would like to analyse the association between three variables of interest (physiological and behavioural) in the presence of a fourth design variable:
IV, DV, and M, my potential moderator, are all continuous and best fit by something like an exgaussian or skew_normal distribution.
The fourth variable is a binary grouping variable, which reflects a design variable; I recruited old and young participants. The grouping variable, i.e., age, is obviously influencing all other parameters.

So my general question would be: Does it at all make sense to include the two groups into one model?

Having read up on the benefits and pitfalls of mediation analysis, and setting aside any attempt to claim causality, I still think mediation might be a legitimate way to gain more insight into my data, although I am happy to be convinced otherwise.

So, let’s say mediation is the way to go, and I should combine both groups into one model, would it be best to add factor age group (or maybe age as continuous?) into the model just as an additional factor (or covariate, respectively), like

f1 <- bf(M ~ IV + group, family = "exgaussian") # mediator model
f2 <- bf(DV ~ M + group + IV, family = "exgaussian") # outcome model
m_ccL <- brm(f1 + f2 + set_rescor(FALSE), data = ebg.df, cores = 4)

or would it make more sense to include an interaction with the grouping variable:
f1 <- bf(M ~ IV * group, family = "exgaussian")
f2 <- bf(DV ~ M * group + IV * group, family = "exgaussian")

And this is where I am obviously ending in complete confusion because I am wondering whether I need to have this interaction with both the mediator and the DV? To make things worse (or really interesting), running the models for the groups separately shows a diverging effect of the moderator.

Any hint towards a solution in this matter would be very much appreciated.



First, your question is a little confusing. Towards the beginning, you introduced your first three variables as “IV, DV, and M, my potential moderator.” Then later when presenting your bf() statements, you seemed to treat M as a mediator. Are you conceiving of M as a mediator or a moderator?

If you consider M a mediator, I personally recommend against fitting your proposed mediation model with cross-sectional data. Take that for what you will.

Your question about your fourth variable, Age, seems to be largely conceptual. If you suspect it interacts with the effects of any of your other variables on your DV, the best way is to include it in the model as a moderator. IMO, fitting separate models for the older and the younger will just muddy the waters.

Dear Solomon,
thank you for replying to my question.

Indeed, it should have said
M, which is my potential mediator.

So, am I understanding your comment correctly: you would not fit a mediation model at all in my case?

But if I do, your advice is to include age (continuous) as a moderator instead of splitting the data into the two groups.
So this would then look like:

f1 <- bf(M ~ IV * age, family = "exgaussian")
f2 <- bf(DV ~ M * age + IV * age, family = "exgaussian")


Thank you for clarifying your terms.

I think it’s fine to fit cross-sectional mediation models when you’re learning. That’s how I learned. In my domain, at least (social science), cross-sectional mediation models are not appropriate for scientific inference. You need longitudinal data for that.

Whatever model you fit, I’d recommend treating your continuous moderator as continuous. I grant you there are instances when you can reasonably split up a continuous variable if it’s severely bimodal (e.g., young folks have ages $\sim \operatorname{Normal}(20, 0.5)$ and old folks have ages $\sim \operatorname{Normal}(40, 0.5)$). But to the extent your moderator Age is collected along a continuum, it’d be great to treat it as such.

Your proposed syntax looks fine.
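
To make the severely bimodal case above concrete, here is a minimal base R sketch. It assumes ages drawn exactly as in the example (Normal(20, 0.5) and Normal(40, 0.5)) with 100 people per group; the sample size and all variable names are made up for illustration and are not taken from the poster's data.

```r
# Simulated ages for the severely bimodal example (hypothetical data)
set.seed(1)
n_per_group <- 100
age <- c(rnorm(n_per_group, mean = 20, sd = 0.5),
         rnorm(n_per_group, mean = 40, sd = 0.5))
old <- rep(c(0, 1), each = n_per_group)  # 0 = young, 1 = old

# Correlation between continuous age and the binary group indicator
r_age_group <- cor(age, old)
r_age_group

# Share of the age variance that lies between the two groups
r_age_group^2
```

When that correlation is close to 1, continuous age and the grouping factor carry nearly the same information, so little is lost by using the factor. With a wider age spread within each group, the continuous version would retain more information.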

Thanks again for your input and let me ask one more question:

I actually examine two extreme age groups, neglecting middle age, in this study, which led me to split the data set at this analysis step. My expectation from other data sets is that age explains most of the variance and leads to severe multicollinearity among the other parameters.
Multicollinearity is also a problem with this mediation model when age (group) is included as an additional parameter (I have not tried the moderator version yet).

I guess my question is partly asking for “absolution” to split the group and partly asking for an alternative. I know this approach is very data-driven anyways…
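
One way to gauge how much the age group overlaps with the other predictors is to look at their correlations and variance inflation factors (VIFs). The following base R sketch uses simulated data in which the group strongly drives both IV and M; the effect sizes and variable names are assumptions chosen for illustration, not estimates from the study.

```r
# Simulated predictors where the age group drives IV and M (hypothetical)
set.seed(2)
n <- 200
old <- rep(c(0, 1), each = n / 2)      # 0 = young, 1 = old
IV  <- 3 * old + rnorm(n)
M   <- 3 * old + 0.3 * IV + rnorm(n)
X   <- data.frame(IV, M, old)

# VIF for each predictor: regress it on the remaining predictors
# and compute 1 / (1 - R^2)
vif <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})

round(cor(X), 2)
round(vif, 2)
```

Large VIFs for the group indicator would suggest that the group effect and the IV and M effects are hard to separate in a joint model, which is the problem described above.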

Multicollinearity is a beast to deal with, for sure. If Age eats up most of your variance, I’d lean into that somehow. Seems interesting.

A problem with splitting up the analysis is you lose your ability to formally compare parameters between the two groups. Comparisons between the old and young take on a more descriptive/qualitative flair if they’re analyzed in separate models.
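
As a rough sketch of what a formal comparison in a single model could look like, the code below fits the interaction version of the mediation model and compares the product-of-coefficients indirect effect between groups, draw by draw. The data are simulated stand-ins, the default gaussian family replaces exgaussian only to keep the example fast, and the coefficient names assume a factor `group` with levels `young` and `old`. `as_draws_df()` needs a newer brms than 2.10.5 (older versions used `posterior_samples()`), so it is worth checking the column names of `draws` before relying on them.

```r
library(brms)

# Simulated stand-in data (hypothetical; not the poster's ebg.df)
set.seed(3)
n <- 200
group <- factor(rep(c("young", "old"), each = n / 2),
                levels = c("young", "old"))
old <- as.numeric(group == "old")
IV <- rnorm(n) + old
M  <- (0.6 - 0.4 * old) * IV + old + rnorm(n)
DV <- (0.5 + 0.3 * old) * M + 0.2 * IV + old + rnorm(n)
sim.df <- data.frame(IV, M, DV, group)

# Same structure as the interaction model in the thread;
# (M + IV) * group is equivalent to M * group + IV * group
f1 <- bf(M ~ IV * group)
f2 <- bf(DV ~ (M + IV) * group)
fit <- brm(f1 + f2 + set_rescor(FALSE), data = sim.df,
           chains = 2, cores = 2, seed = 3)

draws <- as_draws_df(fit)

# Group-specific a paths (IV -> M) and b paths (M -> DV)
a_young <- draws[["b_M_IV"]]
a_old   <- a_young + draws[["b_M_IV:groupold"]]
b_young <- draws[["b_DV_M"]]
b_old   <- b_young + draws[["b_DV_M:groupold"]]

# Indirect effects per group and their difference
ind_young <- a_young * b_young
ind_old   <- a_old * b_old
ind_diff  <- ind_old - ind_young

sapply(list(young = ind_young, old = ind_old, difference = ind_diff),
       quantile, probs = c(0.025, 0.5, 0.975))
```

The interval for `difference` is the between-group comparison that separate models cannot provide. The product of coefficients is only a simple summary under linear models with an identity link, and with cross-sectional data it remains descriptive rather than causal.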

Thanks again for your thoughts.
I'll keep chewing on this, then…
