Mediation when using Gaussian and Cumulative probit models

Let’s say you have a basic mediation model, but your mediator M you wish to model with a Gaussian family and your outcome Y with a cumulative probit family. Something like this in brms:

bf_med <- bf(M ~ X) + gaussian()
bf_out <- bf(Y ~ M + X) + cumulative(probit)
m1 <- brm(bf_med + bf_out + set_rescor(FALSE), data=d)

If both models were Gaussian, to find your indirect effect, you would simply multiply the posterior samples for the coefficient of X in the mediator model by the posterior samples for the coefficient of M in the outcome model.

If M is centered and scaled by 1 standard deviation to make it a standard normal, can I still multiply as described above even though different family and link functions are used for the two models?? I would think that I could because they should be on the same scale, as the cumulative probit model assumes that the outcome comes from a latent standard normal distribution, and both coefficients would have the same interpretation (although the effect of M on Y is for the latent distribution). Am I correct in thinking I can simply multiply the coefficients?

I could be mistaken, but I’m pretty sure this issue only arises if the mediator is not Gaussian. So I don’t think you even need to rescale the mediator to be standard normal.

To put it another way, the mediated pathway from X to the latent scale of Y is still linearly from X to M and then from M to Y. So multiplying the two coefficients should still get you the indirect effect to the latent scale.

Thanks! Yeah, I think I see what you are saying. That makes sense for this case.

Follow-up: I realized I have both scenarios in my data. So, I also have a mediator that is ordinal Likert scale item and an outcome Y that can be modeled with a Gaussian distribution. So something like this in brms:

bf_med <- bf(M ~ X) + cumulative(probit)
bf_out <- bf(Y ~ M + X) + gaussian()
m1 <- brm(bf_med + bf_out + set_rescor(FALSE), data=d)

So in this scenario, the interpretation of ‘a’, the coefficient for the effect of X on M, is for a one unit increase in X on the latent standard normal distribution for M (let’s call that latent standard normal distribution, M_hat). But the interpretation of ‘b’, the coefficient for the effect of M on Y, is for a one unit increase in the Likert scale item M on Y. So it would seem the coefficients are on different scales now, because what I really want for the bf_out model is not M as a predictor, but rather M_hat, as that is the scale for the coefficient for X in the bf_med model.
To put them on the same scale, should I use a standardized M variable in the bf_out model instead of the Likert scale M variable?? So it would look like:

bf_med <- bf(M ~ X) + cumulative(probit)
bf_out <- bf(Y ~ M_centered_scaled + X) + gaussian()
m1 <- brm(bf_med + bf_out + set_rescor(FALSE), data=d)

If so, do I need to set the thresholds to 'equidistant` in the cumulative probit model?

edit - tagging anyone who might work more with ordinal and mediation models than I do @Solomon @matti @strengejacke

1 Like

This is a good question. Sadly, I haven’t seriously contended with mediation models with multiple likelihoods. I’ll put the horn out on twitter.

1 Like

Hi, can you make and share a simulation code that would generate data that resembles your data? I may have a nice solution for this, but as I don’t have evidence that it works also in this case, I would like to test it before recommending using it. If you can share a simulation, I can get back to you next week. In any case, your post had an excellent timing as we’ve been thinking related mediation models recently.


A similar question came up previously and I put together an answer for it here, hope it helps!


Would you still suggest this approach for cumulative probit models? In this case the outcome of the first path X -> M isn’t a probability but in units of standard deviation of the latent distribution for M, M_hat. Are you suggesting the same approach, but in this case the standard deviation of M when standardizing the X -> M pathway would just be 1? This would simplify to b_xm * SD_x. And the standard deviation of M for the M - > Y pathway would be the standard deviation of the Likert variable?

I don’t have any at the moment. I may be able to come up with something. thanks

That’s right, you would use the same approach but simply calculate the SD for the x -> m path differently

Because the variance for a probit link is fixed to 1, the formula is:

SD_m = \sqrt{b_{xm}^2 + SD_x^2 + 1}

Interesting! I hadn’t seen this line of work. Would you say that the standardized approach estimates sample average indirect effects? My naïve assumption was always that you had to marginalize over the change in M conditional on some specific change in X, as in the attached Stan model and visualized in the plot below.

mediation_model.stan (2.2 KB)
mediation_op.R (1.9 KB)

I used @simonbrauer excellent simulation data ( @avehtari ), and I applied @andrjohns approach using brms. Here is the code:


###Data simulation
n_obs <- 100
x_alpha <- 0
x_sigma <- 1

mx_beta <- 0.5
m_tau <- c(-1, 0, 1)

y_alpha <- 0
yx_beta <- 0.5
ym_beta <- 0.25
y_sigma <- 1

# Ordered probit random draws
rop <- function(x, tau){
  op <- vector(mode = 'numeric', length = length(x))
  n_levels <- length(tau) + 1
  probs <- rep(NA, times = n_levels + 1)
  probs[1] <- 0
  probs[n_levels + 1] <- 1

  for(n in 1:length(x)){
    probs[2:n_levels] <- pnorm(tau - x[n])

    op[n] <- sample(1:n_levels, size = 1, prob = probs[2:(n_levels + 1)] - probs[1:n_levels])


sim_df <- data.frame(row = 1:n_obs) %>%
  mutate(x = rnorm(n(), x_alpha, x_sigma),
         m = rop(mx_beta * x, m_tau),
         y = rnorm(n(), y_alpha + yx_beta * x + ym_beta * m, y_sigma))

###brms model
bf_med <- bf(m ~ x) + cumulative(probit)
bf_out <- bf(y ~ m + x) + gaussian()
m1 <- brm(bf_med + bf_out + set_rescor(FALSE), data=sim_df, cores=4)

###posterior samples and compute the indirect effect (mediation effect) a*b, as per Andrew Johnson approach
s1 <- as_draws_df(m1) 

sd_x <- sd(sim_df$x)
sd_y <- sd(sim_df$y)
sd_m <- sd(sim_df$m)

#calculate standardized a, the standardized x -> m pathway
a <- s1$b_m_x * sd_x / ( sqrt( s1$b_m_x^2 + sd_x^2 + 1 ) ) 

#calculate standardized b, the standardized m -> y pathway
b <- s1$b_y_m * sd_m / sd_y

#calculate a*b, the indirect effect (mediation effect)
me <- a*b
quantile(me, c(0.025, 0.975))

Mediation effect a*b was estimated to be 0.09 (0.03 - 0.16) for @simonbrauer simulation data (assuming nothing wrong in my code above).

It seems like it would be better to simulate the effect of m_hat on y, rather than the effect of m on y, because without that, I don’t know the programmed mediation effect of the simulation, right? It seems like one would want to simulate y from something like y_alpha + yx_beta*x + ymhat_beta*m_hat

1 Like