Calculating percentage mediation for multilevel spline mediator


#1

So a while back I was interested in replicating a mediation analysis in this paper. They have written the function specified here which uses penalised splines to estimate non-linear effects of a mediator.

In short their model looks as follows;
image

I have implemented this in brms using the following model thanks to some help from on these forums;

m ~ s(t) + s(t, by = x) + (1 | id)
y ~ s(t) + s(t, by = m) + s(t, by = x) + (1 | id) 

To calculate the percentage mediation at t you take the product of a(t)B(t). In a simple mediation analysis this is easy, multiply the coefficients. If its a linear effect then its easy, multiple t by your coefficients, e.g. (a * t) * (B * t). The problem is I have no idea how to achieve the same result with splines.

I was trying to cheat by using the marginal_effects(). I was trying to enter zeros in the conditions alongside a vector of t increments but I realise this isnt the same thing…

I would appreciate any assistance and will keep reading up on spline because I fell my lack of understanding there is whats part of the problem…

EDIT: Here is some sample output that might help;

 Family: MV(gaussian, gaussian) 
  Links: mu = identity; sigma = identity
         mu = identity; sigma = identity 
Formula: y ~ s(t) + s(t, by = m) + s(t, by = x) + (1 | id) 
         m ~ s(t) + s(t, by = x) + (1 | id) 
   Data: dat (Number of observations: 375) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Smooth Terms: 
             Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sds(y_st_1)      1.20      0.92     0.22     3.48        498 1.01
sds(y_stm_1)     1.41      0.47     0.76     2.57        362 1.01
sds(y_stx_1)     1.24      0.55     0.50     2.64        388 1.01
sds(m_st_1)      3.11      1.36     1.27     6.46        436 1.01
sds(m_stx_1)     1.29      0.42     0.75     2.34        372 1.01

Group-Level Effects: 
~id (Number of levels: 25) 
                Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(y_Intercept)     0.88      0.15     0.65     1.20        264 1.00
sd(m_Intercept)     1.17      0.20     0.84     1.62        185 1.02

Population-Level Effects: 
            Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
y_Intercept     0.16      0.21    -0.23     0.60        215 1.03
m_Intercept     0.13      0.25    -0.37     0.62         87 1.04
y_st_1          1.36      0.48     0.26     2.25        346 1.01
y_st:m_1        0.55      0.17     0.21     0.92        191 1.04
y_st:m_2        0.48      0.25    -0.05     0.92        200 1.02
y_st:x_1        0.74      0.43    -0.15     1.52        214 1.02
y_st:x_2        0.47      0.54    -0.61     1.63        177 1.05
m_st_1          0.05      0.59    -1.13     1.22        665 1.00
m_st:x_1        0.46      0.05     0.36     0.56        808 1.00
m_st:x_2        0.20      0.07     0.08     0.33        614 1.01

Family Specific Parameters: 
        Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma_y     1.02      0.04     0.95     1.10       1212 1.00
sigma_m     0.96      0.04     0.89     1.04       1361 1.00

TL:DR - How do I calculate a(t)B(t) when I am using splines in the above equation ?

  • Operating System: W10
  • brms Version: 2.6

#2

What you need are S x T matrices for both a(t) and b(t), where S is the number of posterior samples and T is the number of time point t you want to evaluate, say, 1, 2, 3 , …, 100, if your minimal time in your sample is 1 and the maximal is 100 (but this depends on your data of course).

If you have those matrices, let’s call them A and B, you can compute:

C <- A * B
C_summary <- posterior_summary(C)

and then work with the results, for instance by plotting them.

So the only complexity is to get your A and B. Here is a suggestions that might be working.

Here is an example that, when translated to your model, should allow you to extract A and B

library(tidyverse)
library(brms)

# simulate some data
set.seed(0) 
dat <- mgcv::gamSim(1, n = 200, scale = 2)
fit <- brm(y ~ s(x2, by = x1), data = dat)

# fit a spline model
ms <- marginal_smooths(
  fit, smooths = "s(x2,by=x1)", 
  int_conditions = list(x1 = 1),
  spaghetti = TRUE
)

# extract the 'A' matrix belonging to spline 's(x2,by=x1)'
A <- attr(ms[[1]], "spaghetti") %>%
  select(-x1) %>%
  spread(key = "sample__", value = "estimate__") %>%
  select(-x2) %>%
  select(-effect1__) %>% # remove this line for brms <= 2.6.0
  as.matrix() %>%
  t()

#3

This is absolutely awesome, here is a brief look at an example I ran on simulated data:

  • direct effect is T(t),
  • indirect is a(t)*B(t)
  • total is direct + indirect
  • mediation_percent is indirect / total

Hats off to Paul, I feel like I have to co-author you if I use this… haha.


#4

I wouldn’t object that ;-)

One more thing: One of the main advantages of a fully Bayesian framework
is that you can propagate uncertainty to whatever quantity your are interested
in, that is also to products of splines. The output of posterior_summary
has some columns Q<x> giving you some lower and upper quantiles to use as uncertainty measures. I would strongly suggest adding those to your plots as well.


#5

Absolutely, in the first run I was trying to get anything that looked right… A terrible excuse considering its so easy to get a credible interval:

image