Calculating average marginal effects in models with random slopes

I think this is a conceptual issue on my end, but it also could be an issue related to fit().
I am working on a large, individual participant data meta-analysis and want to estimate constrained longitudinal analyses with random slopes by trial.

Some outcomes are continuous, so the default posterior summaries of coefficients estimated from brms are great. Other outcomes are count, so I wanted to calculate average marginal effects (AMEs) for change from baseline to post intervention in control and treatment conditions, and their difference.

For simplicity, suppose the goal was the AME for the change from baseline to post in the control group. My naive approach was to take the model frame, and then generate predictions on the response scale under some counterfactual conditions:

  • set all observations as in the control group at baseline
  • set all observations as in the control group at post
  • calculate the difference between those two sets of predictions, average across participants, and then summarize

Here is a synthetic example using the small mtcars dataset built into R.

dummy <- mtcars
dummy$Time <- mtcars$am
dummy$Tx <- mtcars$vs

m1 <- brm(mpg ~ Time + Time:Tx + (1 | cyl), data = dummy, cores = 4)

mf <- model.frame(m1)
mf <- mf
mf$Tx <- 0
mf$Time <- 0
yhat00 <- fitted(m1, newdata = mf, scale = "response", 
                 re_formula = NULL, summary = FALSE)
mf$Time <- 1
yhat01 <- fitted(m1, newdata = mf, scale = "response", 
                 re_formula = NULL, summary = FALSE)

> bayestestR::describe_posterior(m1)
Summary of Posterior Distribution

Parameter   | Median |         95% CI |     pd |          ROPE | % in ROPE |  Rhat |     ESS
--------------------------------------------------------------------------------------------
(Intercept) |  18.70 | [13.63, 23.93] |   100% | [-0.60, 0.60] |        0% | 1.000 | 1278.00
Time        |   1.37 | [-1.48,  4.27] | 83.15% | [-0.60, 0.60] |    21.10% | 1.001 | 2538.00
Time:Tx     |   4.24 | [ 0.01,  8.76] | 97.28% | [-0.60, 0.60] |     2.32% |       |        
> bayestestR::describe_posterior(rowMeans(yhat01 - yhat00))
Summary of Posterior Distribution

Parameter | Median |        95% CI |     pd |          ROPE | % in ROPE
-----------------------------------------------------------------------
Posterior |   1.37 | [-1.48, 4.27] | 83.15% | [-0.10, 0.10] |     3.47%

With a random intercept only, the approach appears fine. In the linear case, the estimate from my differences is 1.37, which matches the Time population parameter of 1.37.

When there is a random slope, however, these no longer align.

m2 <- brm(mpg ~ Time + Time:Tx + (1 + Time + Time:Tx | cyl), data = dummy, cores = 4)

mf <- model.frame(m2)
mf <- mf
mf$Tx <- 0
mf$Time <- 0
yhat00 <- fitted(m2, newdata = mf, scale = "response", 
                 re_formula = NULL, summary = FALSE)
mf$Time <- 1
yhat01 <- fitted(m2, newdata = mf, scale = "response", 
                 re_formula = NULL, summary = FALSE)


> bayestestR::describe_posterior(m2)
Summary of Posterior Distribution

Parameter   | Median |          95% CI |     pd |          ROPE | % in ROPE |  Rhat |     ESS
---------------------------------------------------------------------------------------------
(Intercept) |  18.61 | [ 13.57, 24.39] | 99.98% | [-0.60, 0.60] |        0% | 1.000 | 1240.00
Time        |   1.53 | [ -3.00,  6.12] | 78.05% | [-0.60, 0.60] |    19.44% | 1.001 | 1031.00
Time:Tx     |   3.45 | [-15.57, 15.43] | 75.45% | [-0.60, 0.60] |     6.50% |       |        
> bayestestR::describe_posterior(rowMeans(yhat01 - yhat00))
Summary of Posterior Distribution

Parameter | Median |        95% CI |     pd |          ROPE | % in ROPE
-----------------------------------------------------------------------
Posterior |   1.46 | [-1.50, 4.79] | 82.85% | [-0.10, 0.10] |     3.50%

1.46 is not 1.53. What concerns me more than these minor variations in the central tendency, is that my approach of extract fitted values, average across people, and then summarize the posterior seems to generate consistently too narrow CIs when there is a random slope involved.

Obviously with a linear model and identity link, what I am doing is pointless. But for my other outcomes using non linear link functions, the AMEs will differ from the parameter estimates.

I think I have a conceptual issue in how I define / calculate the AMEs, but searching the general internet and google scholar for methods papers discussing this and how to solve it have not yielded anything productive.

I’d very much appreciate any suggestions for readings, or search terms/key words, or alternate code approaches.

  • Operating System: Win 10 pro
  • brms Version: 2.16.1
1 Like

Perhaps the emmeans package is what you are looking for. In version 2.16, brms has drastically extended emmeans support to now work with almost all models.

Thanks for your response, Paul. The extended support for emmeans is very helpful in many instances.

Regrettably, it is not quite what I’m after in this case. emmeans() estimates marginal effects at the means (MEMs) and not average marginal effects (AMEs). It works wonderfully in the case of linear models with identity link functions, where AMEs and MEMs align.

Here is some example data with a binary outcome modeled using a Bernoulli family and classic link function. By creation, exactly 20% of the control group (Tx = 0) experience the outcome and 80% of the treatment (Tx = 1) group experience the outcome, say disease remission.
There also is a continuous covariate, x .

set.seed(1234)
logitd <- data.frame(
  Tx = rep(0:1, each = 50),
  ybin = c(rep(0:1, c(40,10)),
           rep(0:1, c(10,40))))
logitd$x <- rnorm(100, mean = logitd$ybin, sd = 2)

Here is a brm model and the emmeans solution:

mbin <- brm(ybin ~ Tx + x, data = logitd, family = bernoulli())

emmeans(mbin, ~ Tx, transform = "response")

 Tx response lower.HPD upper.HPD
  0    0.233     0.109     0.365
  1    0.783     0.651     0.893

Point estimate displayed: median 
HPD interval probability: 0.95 

We can reproduce this manually via:

yhat <- fitted(mbin,
  data.frame(Tx = c(0, 1), x = mean(logitd$x)),
  scale = "response", summary = FALSE)

> describe_posterior(yhat[,1])
Summary of Posterior Distribution

Parameter | Median |       95% CI |   pd |          ROPE | % in ROPE
--------------------------------------------------------------------
Posterior |   0.23 | [0.11, 0.37] | 100% | [-0.10, 0.10] |        0%
> describe_posterior(yhat[,2])
Summary of Posterior Distribution

Parameter | Median |       95% CI |   pd |          ROPE | % in ROPE
--------------------------------------------------------------------
Posterior |   0.78 | [0.65, 0.89] | 100% | [-0.10, 0.10] |        0%

The issue here is that the covariate, x was averaged prior to generating predicting probabilities and marginalizing. This answers the question:

What is the average predicted probability of treatment remission for a person with average covariate values who is assigned to the control or to the treatment group?

That is a useful question and useful answer, but it does not answer what is a more relevant question:

What is the average predicted probability of remission if we gave the whole population the control or the treatment?

This can have practical implications. Using the emmeans() approach, we would say, for a population at our sample average of all covariates, an additional 54.6 % [35.0%, 70.8%] will have disease remission if we give the treatment to everyone than if we gave the control to everyone.

pairs(emmeans(mbin, ~ Tx, transform = "response"))
 contrast estimate lower.HPD upper.HPD
 0 - 1      -0.546    -0.708     -0.35

Point estimate displayed: median 
HPD interval probability: 0.95 

Using the average marginal effect approach, we would say, for a population with the observed covariate distribution of our sample, an additional 49.0 % [32.0%, 68.0%] will have disease remission if we give the treatment to everyone than if we gave the control to everyone.

mf <- model.frame(logitd)
mf$Tx <- 0
yhat0b <- fitted(mbin,
                mf,
                scale = "response", summary = FALSE)
mf$Tx <- 1
yhat1b <- fitted(mbin,
                mf,
                scale = "response", summary = FALSE)

describe_posterior(rowMeans(yhat0b))
describe_posterior(rowMeans(yhat1b))

## marginal treatment effect
describe_posterior(rowMeans(yhat1b - yhat0b))
Summary of Posterior Distribution

Parameter | Median |       95% CI |   pd |          ROPE | % in ROPE
--------------------------------------------------------------------
Posterior |   0.49 | [0.32, 0.68] | 100% | [-0.10, 0.10] |        0%

The latter is arguably more important because in cases where there is considerable heterogeneity in the population, what we really want to know is not the additional remissions in a very specific set of ‘average’ people but rather what the overall population health would look like if we invested in the treatment for everyone.

As I’ve continued thinking and reading about this, I’m starting to wonder whether I never did anything wrong in the first place, but just that my assumption that the population average affect would match the “brms Population-Level Effects” in the case of a linear mixed model with identity link was mistaken, although I do find it unnerving that the population average seems to have much lower uncertainty than the “brms Population-Level Effects”.

2 Likes

I see. With the posterior draws that brms gives emmeans access to, the latter should be able (in theory) to also compute AMEs (at least for Bayesian models). Could be a worthy addition perhaps? Tagging @rvlenth

With regard to or original question, I also have to think about it more, but I can already say that I would have also chosen the fitted approach you put forward in the OP and I don’t see why this should be wrong at the moment.

1 Like

Thanks @paul.buerkner I appreciate your reply and glad to know that you would have used fitted() as well. Seems similar in spirit to elsewhere on these forms — eg Inferences marginal of random effects in multi-level models.

Since this approach is correct as far as I can tell for single level models and is ?maybe? okay with random effects, I’ve thrown together a rough draft package with functions to help implement it on here. It can be installed via:
devtools::install_github("JWiley/brmsmargins")
@rvlenth if you’d like to implement anything along these lines in emmeans I’d be happy to chat or please feel free to any of my code

1 Like

Hi @Joshua_Wiley, I have some code to calculate AMEs for brms objects here: BRMS AMEs · GitHub

The code is basically a port of Thomas Leeper’s margins package. You can also change the re_formula settings to suit your need.

The code can take a long time to run for large data sets. I’d also be keen to know if emmeans can do this or if we can incorporate AMEs into brms/posterior in the future

1 Like

@jackbailey Thanks, I took a look and like your approach of doing the calculations and weighting by unique predictor combos to save time. brmsmargins matches your code in cases where they both work.
Although the original uncertainty around whether these are accurate remains for me.

dummy <- mtcars
dummy$Time <- mtcars$am
m3 <- brm(mpg ~ Time + (1 + Time | cyl), data = dummy, cores = 4)
summary(m3)

which gives:

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: mpg ~ Time + (1 + Time | cyl) 
   Data: dummy (Number of observations: 32) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Group-Level Effects: 
~cyl (Number of levels: 3) 
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)           5.23      2.84     1.91    12.13 1.00     1783     1967
sd(Time)                3.12      2.40     0.13     9.07 1.00     1387     1677
cor(Intercept,Time)     0.30      0.51    -0.80     0.98 1.00     2404     1988

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    18.86      2.77    13.13    24.55 1.00     1353     1568
Time          2.79      2.45    -2.04     7.88 1.00     1852     1738

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     3.12      0.43     2.42     4.11 1.00     3105     2706

and the AME for Time

time_ame1 <- bayes_dydx.default(m3, variable = "Time", stepsize = 1)
time_ame1$est %>% quantile(probs = c(.5, .025, .975))

which gives:

       50%       2.5%      97.5% 
 2.5753360 -0.1291799  5.3944124 

or

time_ame2 <- brmsmargins(m3,
   add = data.frame(Time = c(0, 1)),
   CI = .95, CIType = "ETI",
   contrasts = matrix(c(-1, 1)))

time_ame2$ContrastSummary

which gives:

          M      Mdn         LL       UL PercentROPE PercentMID   CI CIType
1: 2.600315 2.575336 -0.1291799 5.394412          NA         NA 0.95    ETI
   ROPE  MID      Label
1: <NA> <NA> Contrast_1

To me a notable difference is the population estimate straight from brms is:
for Time: 2.79 [-2.04, 7.88]
compared to the AME for Time: 2.58 [-0.13, 5.39]
with the latter having considerably less uncertainty

On a separate note, if you’re interested in various marginal estimates after brms models, you may find some use in the additional capacity of brmsmargins

dummy <- mtcars
dummy$Time <- mtcars$am
dummy$Tx <- mtcars$vs
m2 <- brm(mpg ~ Time + Time:Tx + (1 + Time + Time:Tx | cyl), data = dummy, cores = 4)
summary(m2)

time_ame2 <- brmsmargins(m2,
            at = expand.grid(Time = 0:1, Tx = 0:1),
            CI = .95, CIType = "ETI",
            contrasts = cbind(
              "Time 0 v 1: Ctrl" = c(-1, +1, 0, 0),
              "Time 0 v 1: Tx"   = c(0, 0, -1, +1),
              "Tx v Ctrl: Time 0" = c(-1, 0, +1, 0),
              "Tx v Ctrl: Time 1" = c(0, -1, 0, +1)))

time_ame2$ContrastSummary

which gives:

          M      Mdn        LL        UL PercentROPE PercentMID   CI CIType
1: 1.494765 1.470411 -1.666221  4.573508          NA         NA 0.95    ETI
2: 5.146171 5.184606 -7.693666 17.631245          NA         NA 0.95    ETI
3: 0.000000 0.000000  0.000000  0.000000          NA         NA 0.95    ETI
4: 3.651406 3.751300 -9.708911 15.813571          NA         NA 0.95    ETI
   ROPE  MID             Label
1: <NA> <NA>  Time 0 v 1: Ctrl
2: <NA> <NA>    Time 0 v 1: Tx
3: <NA> <NA> Tx v Ctrl: Time 0
4: <NA> <NA> Tx v Ctrl: Time 1

these are counterfactuals so what if the population was all given the control treatment at baseline, etc. Also obviously it requires some more manual specification of desired contrasts to be applied. There also is a subset argument, which I use in longitudinal studies as I do not necessarily want to upweight the sample distribution of values in people who completed more time points in a longitudinal study. Anyway see
here if interested.

1 Like

@Joshua_Wiley and @jackbailey, this is a most informative series of posts on moving beyond MEMs to AMEs. The brmsmargins package is especially helpful for me. For the convenience of others:

remotes::install_github("JWiley/brmsmargins")

1 Like

It was not Bayesian related when I posted this, but I actually had a very similar question which I posted on stackexchange regarding G-computation. I found that emmeans can only do G comp (on frequentist linear identity models) if you put weights = proportional: causality - Is the emmeans R package performing causal inference G-computation? - Cross Validated.

I presume you are basically trying to do G comp in a Bayesian setting?

I do agree this would be a cool addition to emmeans in general

1 Like

This looks great. Some things to consider as you extend it: other likelihood functions and how to handle brms-specific functions like monotonic effects, etc.

1 Like

@jackbailey Thanks for the suggestions. Could you give a bit more context / examples?

I am not too sure what you mean by other likelihood functions

I’ve never used monotonic effects in brms, but I found the vignette:
https://cran.r-project.org/web/packages/brms/vignettes/brms_monotonic.html

Based on that, I tried out this code and seems to work?

library(brms)
library(brmsmargins)

income_options <- c("below_20", "20_to_40", "40_to_100", "greater_100")
income <- factor(sample(income_options, 100, TRUE), 
                 levels = income_options, ordered = TRUE)
mean_ls <- c(30, 60, 70, 75)
ls <- mean_ls[income] + rnorm(100, sd = 7)
dat <- data.frame(income, ls)

fit1 <- brm(ls ~ mo(income), data = dat)

marg <- brmsmargins(
  fit1,
  at = data.frame(
    income = factor(levels(dat$income), ordered = TRUE)),
  ## sequential, pairwise contrasts
  contrasts = matrix(c(rep(c(-1, 1, 0, 0, 0), 2),-1, 1), 4))


## these should be the same
conditional_effects(fit1, plot = FALSE)$income

##        income       ls cond__   effect1__ estimate__     se__  lower__  upper__
## 1    below_20 57.40319      1    below_20   29.46575 1.181903 27.16700 32.00974
## 2    20_to_40 57.40319      1    20_to_40   60.45641 1.154384 58.16923 62.78298
## 3   40_to_100 57.40319      1   40_to_100   71.00420 1.240089 68.53623 73.33976
## 4 greater_100 57.40319      1 greater_100   74.39849 1.364693 71.94436 77.12927

marg$Summary
##           M      Mdn       LL       UL PercentROPE PercentMID   CI CIType ROPE
## 1: 29.49498 29.46575 26.71083 32.88775          NA         NA 0.99    HDI <NA>
## 2: 60.46622 60.45641 57.47392 63.50555          NA         NA 0.99    HDI <NA>
## 3: 70.98177 71.00420 67.80697 74.09095          NA         NA 0.99    HDI <NA>
## 4: 74.43314 74.39849 70.99505 77.74676          NA         NA 0.99    HDI <NA>

## these give the pairwise contrasts
marg$ContrastSummary

##            M       Mdn         LL        UL PercentROPE PercentMID   CI CIType
## 1: 30.971235 30.997266 26.3899437 34.851657          NA         NA 0.99    HDI
## 2: 10.515554 10.507766  6.1273547 14.900275          NA         NA 0.99    HDI
## 3:  3.451367  3.348807  0.0188492  7.900816          NA         NA 0.99    HDI
##    ROPE  MID      Label
## 1: <NA> <NA> Contrast_1
## 2: <NA> <NA> Contrast_2
## 3: <NA> <NA> Contrast_3

@Vattaka Thanks that may be helpful. I’ve downloaded Hernan’s book and will read. I’m not familiar with G-computation yet, but am certainly interested in causal estimates so it seems worth knowing. This will take me awhile to digest, but I’ll post back in this thread of G-computation is what I’m after, once I wrap my ahead around it :)

Monotonic effects looks good!

I meant, for example, computing marginal effects for likelihoods that do not have continuous outcomes. E.g. categorical models, dirichlet models, etc.

I did a first pass at computing them for categorical models here:

1 Like

Ah that makes sense, yes will work on that. It is fine for things like Bernoulli outcomes, etc. but not 3 or more level categorical variables.
I’m not familiar with dirichlet models, but will take a look.

Dirichlet models aren’t that bad. Basically just a beta regression with multiple categories, all of which sum to 1. E.g. vote shares.

1 Like

Well, this has remained a thorn in my side. I’ve begun some simulations to try to evaluate performance. The code for the simulation is below. It is not the most elegant. In short, I created a synthetic population of 10,000 people with a random intercept and random slope. Then drew random samples of 100 people from this and fit models, and repeated this 2,000 times. The synthetic population was drawn with an average slope of 2.
Using default priors in brms we would expect the 95% credible interval to include 2 about 95% of the time.

I use brms population table for the brms estimates and intervals and then the average marginal effect, calculated using fitted(). The point estimates are largely comparable with minimal average bias, average absolute differences. The population table from brms intervals included the value “2” 97.5% of the time whereas the averages from fitted() did 96.4% of the time, and expectedly the interval width was narrower (1.84) for the marginal estimates from fitted() compared to brms (1.98).

> test[, Correct := LL <= 2 & UL >= 2]

> test[, mean(Est - 2), by = Type] ## average bias
       Type         V1
1:     brms 0.02351602
2: marginal 0.02178421

> test[, mean(abs(Est - 2)), by = Type] ## average absolute differene
       Type        V1
1:     brms 0.3449325
2: marginal 0.3453282

> test[, mean(Correct), by = Type] ## percent of 95% CI intervals including pop value
       Type     V1
1:     brms 0.9750
2: marginal 0.9635

> test[, mean(UL - LL), by = Type] ## mean CI width
       Type       V1
1:     brms 1.983567
2: marginal 1.835438

I remain confused about why the CI width is narrower when using fitted(), but at least under this one very restrictive conditions, the approach seems to be performing adequately.

Because I only created a synthetic population with 10,000 people, the “true” value in that synthetic population was actually 2.018554, not 2. Using the “population” value from the synthetic population yielded largely similar results.

> test[, Correct := LL <= 2.018554 & UL >= 2.018554]

> test[, mean(Est - 2.018554), by = Type]
       Type          V1
1:     brms 0.004962019
2: marginal 0.003230207

> test[, mean(abs(Est - 2.018554)), by = Type]
       Type        V1
1:     brms 0.3442757
2: marginal 0.3447688

> test[, mean(Correct), by = Type]
       Type     V1
1:     brms 0.9760
2: marginal 0.9635

> test[, mean(UL - LL), by = Type]
       Type       V1
1:     brms 1.983567
2: marginal 1.835438

Here is the code for the simulation. I aim to run some more conditions and will include some of the findings in the brmsmargins package vignettes / README.

library(distr6)
library(brms)
library(brmsmargins)
library(data.table)
## library(lme4)

library(future)
library(doFuture)
library(foreach)
library(doRNG)

n.b0 <- distr6::Normal$new(mean = 50, sd = 10)
n.b1 <- distr6::Normal$new(mean =  2, sd =  3)

set.seed(1234)
x.b0 <- n.b0$rand(10000)
x.b1 <- n.b1$rand(10000)

mu.0 <- x.b0 + x.b1 * 0
mu.1 <- x.b0 + x.b1 * 1

n <- distr6::Normal$new(mean = 0, sd = 5)

set.seed(1234)
y.0a <- n$rand(10000) + mu.0
y.0b <- n$rand(10000) + mu.0
y.1a <- n$rand(10000) + mu.1
y.1b <- n$rand(10000) + mu.1

d <- data.table(
  ID = rep(1:10000, times = 4),
  y = c(y.0a, y.0b, y.1a, y.1b),
  x = rep(c(0, 0, 1, 1), each = 10000))

d[, .(Diff = mean(y[x == 1]) - mean(y[x == 0])), by = ID][, .(MDiff = mean(Diff))]

## m0 <- lme4::lmer(y ~ x + (1 + x | ID), data = d)
## summary(m0)

set.seed(1234)
IDs <- sample(1:10000, size = 100, replace = FALSE)

m1 <- brm(y ~ x + (1 + x | ID),
          data = d[ID %in% IDs], chains = 2, cores = 2,
          iter = 4000, refresh = 0,
          backend = "cmdstanr")
summary(m1)

ame1 <- brmsmargins(m1,
   add = data.frame(x = c(0, 1)),
   CI = .95, CIType = "ETI",
   contrasts = matrix(c(-1, 1)),
   subset = "!duplicated(ID)")

ame1$ContrastSummary

registerDoFuture()
plan(multisession, workers = 20)

set.seed(12345)
res <- foreach(i = 1:2000, .combine = c) %dorng% {
  IDs <- sample(1:10000, size = 100, replace = FALSE)
  mfit <- update(m1, newdata = d[ID %in% IDs], refresh = 0, silent = 2)

  ame <- brmsmargins(mfit,
                     add = data.frame(x = c(0, 1)),
                     CI = .95, CIType = "ETI",
                     contrasts = matrix(c(-1, 1)),
                     subset = "!duplicated(ID)")

  fe.use <- as.data.table(as.list(fixef(mfit)[2, c(1, 3, 4)]))
  ame.use <- as.data.table(ame$ContrastSummary[, c("Mdn", "LL", "UL")])

  colnames(fe.use) <- colnames(ame.use) <- c("Est", "LL", "UL")

  list(
    rbind(
      cbind(Type = "brms", run = i, fe.use),
      cbind(Type = "marginal", run = i, ame.use)))
}

test <- do.call(rbind, res)