Fitting a Bayesian multilevel mediation model with brms

Hi All,

I’m new to brms and trying to fit a multilevel mediation model for within-person experimental data. I’m sure this is an obvious question, but I can’t seem to find an answer or example online (some practical details remain unclear). Since it’s for my thesis, any help would be greatly appreciated.

I came across this excellent blog post by Matti Vuorre, which outlines exactly the structure I’m aiming for. However, I’m struggling with the choice of priors/distributions for the mediator.
In my posterior predictive check (pp_check), the distribution does not fit well at the tails. As far as I understand, this is a big problem and a sign of an unreliable model, so my questions are:

  1. If this is a sign of misspecification, how do I go about addressing it? I read the posts in this forum and tried some other distributions, but none seems to fit better. I am aware that adapting the priors afterwards, based on the posterior distribution, would be “overfitting”/cheating, so I assume fitting a mixed distribution based on the posteriors is also not allowed.
  2. Do I just accept the poorly fitting tails? I have read tutorials where they switch to a different distribution and their problems are gone, but my mediator unfortunately is not symmetrical. Do I just change my mediator’s distribution (this feels wrong too)?

This is what I have done so far (see code at the end):
My data is within-person experimental data. I have isolated within-person deviations of my mediator and effect-contrast-coded all binary predictors. My control variable is z-standardized.
Within-person design

  • Predictors: x

  • Mediator: m (continuous, raw data ranges ~0–100, never exactly 0 or 100)

  • Outcome: y (binary)

  • Control variable: z-standardized

  • Outcome: Posterior predictive checks look good

  • Mediator: I’ve tried several families: student, gaussian, skew_normal, beta (after rescaling), hurdle_lognormal, etc.
    → Best fit so far: Student distribution; the mean looks good, but pp_check shows poor fit at the tails.

I have fitted a Student distribution for the mediator and a Bernoulli for the outcome. The outcome model works well. The Student model predicts the mean statistics well, and none of the other distributions fitted better (I tried skew_normal(), gaussian, beta after dividing my variable by 100, hurdle_lognormal, etc.), based on the posterior checks and mean statistics.
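For reference, this is roughly how I have been checking the mediator fit and comparing families (`fit` is the joint model from the code at the end; the family-comparison fits are just placeholders for the variants I mentioned):

```r
library(brms)

# Density overlay for the mediator submodel (resp = "m" picks it out
# of the multivariate fit)
pp_check(fit, resp = "m", type = "dens_overlay", ndraws = 100)

# Tail-specific checks: observed vs. replicated 5th/95th percentiles,
# which show tail misfit more clearly than the density overlay
pp_check(fit, resp = "m", type = "stat",
         stat = function(x) quantile(x, 0.05))
pp_check(fit, resp = "m", type = "stat",
         stat = function(x) quantile(x, 0.95))

# Quantitative comparison between candidate families, e.g.:
# loo_compare(loo(fit_student), loo(fit_skew_normal))
```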

I’ve already consulted a range of resources (e.g., brms vignettes, Andrew Heiss, Solomon Kurz, various YouTube tutorials), but examples rarely show full model-fitting code or troubleshooting for distributions that don’t fit well. Any pointers would be greatly appreciated: the right literature, something I missed in the forum, or practical advice would truly make my month. I assume I might just not recognize the tutorials that are doing what I am looking for.

Thank you for reading this far!

fixefPrior <- c(
  set_prior("normal(0, 1)", class = "b", resp = "y")
)

m_model <- brms::bf(
  m ~ x + controlvariable + (x | p | ID) + (1 | item),
  family = student()
)

y_model <- brms::bf(
  y ~ x + m + controlvariable + (x + m | p | ID) + (1 | item),
  family = bernoulli(link = "logit")
)
# Define full file path for the saved model
model_file <- file.path(mainpath, "mediation.rds")

# Fit both models jointly (only refit if no saved model exists)
if (!file.exists(model_file)) {
  fit <- brm(m_model + y_model + set_rescor(FALSE),
             data = data,
             prior = fixefPrior,
             chains = nChains,
             cores = nCores,
             iter = nIter,
             warmup = nWarmup,
             save_pars = save_pars(all = TRUE))
  saveRDS(fit, file = model_file)
} else {
  fit <- readRDS(file = model_file)
}
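In case it matters for the answers: afterwards I was planning to compute the indirect effect from the posterior draws roughly like this (parameter names follow brms’s b_<response>_<coefficient> convention; I would double-check the exact names with variables(fit)):

```r
library(posterior)

draws <- as_draws_df(fit)

# a-path (x -> m) and b-path (m -> y, on the logit scale)
a <- draws$b_m_x
b <- draws$b_y_m
indirect <- a * b

quantile(indirect, c(0.025, 0.5, 0.975))

# With the correlated person-level slopes (the |p| syntax), the average
# indirect effect also involves the covariance of the person-level a and
# b paths, as in Vuorre's post -- something like
#   a * b + cor_ID__m_x__y_m * sd_ID__m_x * sd_ID__y_m
# (again, check the exact draw names with variables(fit))
```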

I wonder whether the reason your model does not capture the skewness of the data has to do with your predictors. Would an interaction between x and control be plausible based on your knowledge?

Otherwise, the things you’ve tried look sensible to me. If the skewness is not conditional on your predictors, Student would not help because it is symmetric. How bad is Beta?

Is m also z-transformed in the pp_check you are showing?

(it might help others offer more feedback if you said a bit more about what you are modeling and how the observations are structured)

1 Like

Hi Amynang,
thank you for responding! I appreciate this a lot.

An interaction with the control would make sense, and I also have another predictor that I could add to the model, but I thought I should not overcomplicate the model to start with (as it is my first time fitting a Bayesian model). I will give both models a try.

With regards to the data (I am happy to share more info):

  • I am trying to model decision-making behavior.
  • The outcome is a decision classified in a categorical fashion (either 0 or 1, accept/decline).
  • Predictor 1 is the trial type - I vary the level of technology support available for the decision.
  • Predictor 2 (could be included in the model) I varied (per block) the moment at which participants were informed about level of technology support available for the next set of trials.
  • The control is trust in technology.
  • The mediator is a rating variable ranging from 0-100%: they rated the extent to which technology influenced their decision. I assume that this will be close to 0 for about half the trials; for the other half mostly between 0-80, with likely a peak around 30-50% (but that is more a hunch than based on any data). I don’t really expect participants to indicate 100%. It is mean-centered before entering it into the model.
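Concretely, the zero-inflated beta variant I tried looks roughly like this (the raw, uncentered rating rescaled to [0, 1); since values are never exactly 100, only the zero part needs inflating; the zi formula is just my guess at letting the probability of “exactly 0” vary):

```r
library(brms)

# Rescale the raw (uncentered) rating; zero_one_inflated_beta() would be
# needed if both endpoints occurred
data$m01 <- data$m / 100

m_model_zib <- brms::bf(
  m01 ~ x + controlvariable + (x | p | ID) + (1 | item),
  zi ~ x + (1 | ID),  # probability of a "0" response may depend on trial type
  family = zero_inflated_beta()
)
```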

With regards to the pp_check: I plotted the posteriors of the mediator from the fitted model. I assume that means it’s mean-centered too.

With regards to the zero_inflated_beta, the results looked like this. I have not figured out how to set the priors for beta, though. I am a bit afraid of making overly restrictive assumptions, “creating” my results.
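So far I have only gotten as far as listing the defaults with get_prior(), which at least shows every class of prior that could be set for that family (m01 = m/100, formula as in my code above):

```r
library(brms)

# List the default priors for the zero-inflated beta mediator model
# before fitting anything; I have not changed any of them yet
get_prior(
  bf(m01 ~ x + controlvariable + (x | p | ID) + (1 | item),
     family = zero_inflated_beta()),
  data = data
)
```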

Thank you!

You said your mediator m was a “rating variable ranging from 0-100%,” and that you’ve mean centered that variable. I’ve had a little experience with rating scales like that, and I’ve observed others talking about them too. It seems like their distributions are often pretty ugly, and it’s very unlikely they’re going to follow a simple parametric form. IMO, something like a Student-t is probably fine if you’re modeling the mean-centered version of m. Your initial pp_check() wasn’t spectacular, but I’ve seen many many worse.

It’s great you’re taking the model seriously, especially for your thesis. But keep the big picture in mind: You set your priors based on reasonable assumptions, and you model the data with a likelihood that’s designed to capture the major features of your data. But if you or your collaborators are inexperienced with any given phenomenon (and since you’re in grad school, your experience is limited by definition), your reasonable assumptions will surely miss a thing or two. Note the limitations, but don’t let them bog you down too much. Rather, let them fuel your next project.

2 Likes

Also, if you’re “new to brms” and this is the kind of model you’re grappling with, you’re doing great. Keep it up.

1 Like

Thank you Solomon! That actually helps a lot to understand “how” normal this is. I just lack the experience. Also, I really appreciate your kind words and encouragement.

1 Like

Hi Alice,

We have used brms for a similar case (although a bit more complicated because we were fitting a non-linear model). You can check the paper here: The rat frontal orienting field dynamically encodes value for economic decisions under risk | Nature Neuroscience

Code is here:


## brms model fitting
f_drug_phi <- bf(
  prop_lott | trials(total) ~ inv_logit(omega1) *
    (Phi_approx(
      (lottery_prob * (lottery_norm) ^ exp(logrho) -
         sb_norm ^ exp(logrho)) /
        (sqrt(2)*exp(noise)))) +
    (1-inv_logit(omega1)) * inv_logit(omega2),
  logrho ~ dose + (1 | subjid),
  noise ~ dose + (1 | subjid),
  omega1 ~ dose + (1 | subjid),
  omega2 ~ dose + (1 | subjid),
  nl = TRUE)

priors <- prior(normal(0, 0.5), nlpar = 'logrho') +
  prior(normal(3, 1), nlpar = 'omega1') +
  prior(normal(0, 1), nlpar = 'omega2') +
  prior(normal(-3, 0.3), nlpar = 'noise') +
  prior(normal(0,0.35), lb=0, class = sd, group = 'subjid', nlpar = 'omega1') +
  prior(normal(0,0.35), lb=0, class = sd, group = 'subjid', nlpar = 'omega2') +
  prior(normal(0,0.35), lb=0, class = sd, group = 'subjid', nlpar = 'noise') +
  prior(normal(0,0.35), lb=0, class = sd, group = 'subjid', nlpar = 'logrho') +
  prior(normal(0,0.5), class = b, nlpar = 'logrho', coef = 'dose') +
  prior(normal(0,0.5), class = b, nlpar = 'noise', coef = 'dose') +
  prior(normal(0,1), class = b, nlpar = 'omega1', coef = 'dose') +
  prior(normal(0,1), class = b, nlpar = 'omega2', coef = 'dose')

With hierarchical models, I would also recommend starting with synthetic data and then playing with the priors so that you can recover the generative parameters in your synthetic data with no divergences.

We have found that it is necessary to specify the priors for all levels, including priors on the standard deviations of the subject-level parameters.
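As a minimal illustration of that simulate-then-recover workflow (all generative values here are hypothetical; build up to the full model structure gradually):

```r
library(brms)

# Simulate a simple hierarchical dataset with known parameters
set.seed(1)
n_subj <- 30; n_trial <- 40
d <- expand.grid(ID = factor(1:n_subj), trial = 1:n_trial)
d$x <- rep(c(-0.5, 0.5), length.out = nrow(d))         # effect-coded predictor
u <- rnorm(n_subj, 0, 0.5)                             # subject intercepts
d$m <- 2 + 0.8 * d$x + u[d$ID] + rnorm(nrow(d), 0, 1)  # true b_x = 0.8

fit_sim <- brm(m ~ x + (1 | ID), data = d,
               prior = prior(normal(0, 1), class = b),
               chains = 4, cores = 4)

# Check: does the posterior for b_x cover the generating value 0.8,
# and does the run finish with zero divergent transitions?
summary(fit_sim)
```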

1 Like

Amazing, thank you jerlich! Also for investing the time to look up and post the example code and paper! It’s really helpful to see other fitted models.