MEMs, AMEs, and their Confidence Intervals in a brms Multinomial Logistic GLMM

Thanks to the helpful folks in this thread (especially Paul), I’ve now got a multinomial GLMM to analyze, with varying intercepts for one group-level factor. There are 4 response categories c_1...c_4, with c_1 the arbitrarily chosen baseline category. The random effects are ignored at this stage of the analysis, which focuses on quantifying the population-level effects.

As most readers will know, multinomial models are notoriously hard to interpret: unlike in binomial or linear regression, the effect of a given coefficient on a category c is non-monotonic and also depends on that same predictor's coefficients for all the other c's.

Let x_1 be a binary predictor of interest in a multinomial model containing many predictors. Fox & Hong (2009) and Long (2012), who are the only authors I've found giving detailed advice on interpreting multinomial models, suggest analyzing x_1's effect by fixing all the other x's at their means (if quantitative) or proportions (if factor levels), then calculating the difference between the fitted probabilities of each c when x_1 = 1 versus when x_1 = 0. The result is called the marginal effect at the mean (MEM).
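To make the MEM recipe concrete, here is a self-contained base-R sketch using the inverse-link (softmax) formula directly. Nothing below comes from a real fit: `fitted_probs`, the coefficient matrix `B`, and the predictors `x1` and `x2` are all invented for illustration.

```r
# Softmax: fitted probabilities of categories c_1...c_4 given a predictor
# row x and a coefficient matrix B (rows = predictors, columns = c_2...c_4;
# the baseline c_1 has its linear predictor fixed at 0).
fitted_probs <- function(x, B) {
  eta <- c(0, x %*% B)          # baseline category gets eta = 0
  exp(eta) / sum(exp(eta))
}

# Toy coefficients: intercept, x1 (binary, of interest), x2 (quantitative).
B <- rbind(intercept = c(0.2, -0.1, 0.4),
           x1        = c(0.5, -0.8, 0.3),
           x2        = c(0.1,  0.2, -0.3))

x2_mean <- 1.5                            # "mean" of x2 in hypothetical data
p1 <- fitted_probs(c(1, 1, x2_mean), B)   # x1 = 1, x2 held at its mean
p0 <- fitted_probs(c(1, 0, x2_mean), B)   # x1 = 0, x2 held at its mean
mem <- p1 - p0                            # MEM of x1 on each of c_1...c_4
```

Note that the four MEMs necessarily sum to zero, since the fitted probabilities sum to one at both settings of x_1.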

The average marginal effect (AME), on the other hand, is supposedly computed (Long 2012: 25) by going through the data points i_1...i_n, once for each c, and

  1. calculating the fitted probability of c for subject i when x_1 = x_{1i}
  2. calculating the hypothetical fitted probability of c when x_{1i} is changed one unit (from 0 to 1 or vice versa)
  3. dividing the difference by the sign of the change in x_{1i}

The AME of x_1 on a given response category is the average of this difference over the entire dataset.
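The three steps above can be sketched in base R with simulated data and invented coefficients (every name and number below is made up for illustration, not taken from any real analysis):

```r
# Softmax probabilities as before: B has rows = predictors, columns = c_2...c_4,
# with the baseline c_1's linear predictor fixed at 0.
fitted_probs <- function(x, B) {
  eta <- c(0, x %*% B)
  exp(eta) / sum(exp(eta))
}
B <- rbind(intercept = c(0.2, -0.1, 0.4),
           x1        = c(0.5, -0.8, 0.3),
           x2        = c(0.1,  0.2, -0.3))

set.seed(1)
n <- 200
X <- cbind(intercept = 1, x1 = rbinom(n, 1, 0.5), x2 = rnorm(n))  # toy data

# Steps 1-3 for each observation: fitted probs at the observed x1, fitted
# probs with x1 changed one unit, difference divided by the sign (+/-1)
# of the change.
ame_i <- t(apply(X, 1, function(x) {
  x_flip <- x
  x_flip["x1"] <- 1 - x["x1"]                   # change x1 by one unit
  d <- fitted_probs(x_flip, B) - fitted_probs(x, B)
  d * sign(x_flip["x1"] - x["x1"])              # dividing by +/-1 = multiplying
}))

ame <- colMeans(ame_i)   # AME of x1 on each of c_1...c_4
```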

Does it make sense (or is it feasible) to calculate a confidence interval for the MEM or the AME? That would look really good in an effect display plot, but I haven't seen such a thing anywhere. The closest I've found is the effects package in R, which computes confidence intervals for the estimated category probabilities at x_1 = 1 and at x_1 = 0, holding the other x's at their means. But that's not a confidence interval for the MEM itself, is it? It's just a pair of confidence intervals from whose possible overlap we may infer whether the MEM differs "significantly" from zero. In particular, using these two intervals, rather than a single one for the MEM itself, forces us to plot two points (namely \hat{\mu}(y = c|x_1 = 1) and \hat{\mu}(y = c|x_1 = 0)) per category per predictor, which takes up a lot of plot space when there are many predictors and categories.

I’ve written an R function that calculates a variant of the AME – one that bases inference only on those pairs of predictor combinations that (a) actually occur in the data and (b) contain information about x_1, so nothing is extrapolated to counterfactual data. But I’m not sure how (or even whether) the confidence intervals of these AMEs should be computed. At the moment this is done by setting x_1's coefficient to the upper (lower) boundary of its CI, then going through every informative pair of predictor combinations in the data and calculating each \hat{\mu}(y = c|x_1 = 1) using that upper (lower) boundary value for x_1's coefficient. There are no explicit coefficients for effects on the baseline category c_1, so the confidence interval for x_1's effect on c_1 is calculated by setting the coefficients for c_2, c_3 and c_4 simultaneously to the upper limit of their 70.7% CIs (1-\sqrt[3]{.05/2} \approx .707). But this is likely wrong, because the resulting intervals look funky:

The baseline category always has the smallest confidence interval, which hardly makes sense unless there’s an error.

Does anyone have an idea of how the CIs could be calculated correctly?

Since my own approach doesn’t seem to work so far, I also wonder whether the effects() function could be made to work with, or replicated for, brms models. That function uses something called the delta method to calculate SEs for fitted multinomial probabilities computed from a multinomial logit model fit by nnet's multinom() function. How could we calculate them for a brms multinomial GLMM? Page 22 of the aforementioned article (Fox & Hong 2009) provides the details, but as a non-STEM student I don’t understand all the mathematical notation.

As expected, the effects() function currently fails when applied to a brms model, complaining:

Error in terms.default(model) : no terms component nor attribute

From what I do understand of the cited article, the function requires access to at least the estimated beta coefficients and their covariance matrix in order to work, so if I could either:

A) figure out how to extract the necessary components from a brms multinomial model to give effects() what it needs to work, or
B) figure out how exactly the SEs of fitted probabilities from a multinomial model are calculated,

then I could probably write a function to do all that automatically. That would be an alternate solution.
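On option B: as far as I understand it, the delta method approximates the variance of a fitted probability as g'Vg, where g is the gradient of that probability with respect to the stacked coefficients and V is their covariance matrix. A self-contained toy sketch, with all coefficient values and the covariance matrix invented for illustration (in a real application they would come from the fitted model):

```r
# Probability of category k (k = 1 is the baseline) under a multinomial
# logit, given stacked coefficients beta and a predictor row x.
softmax_p <- function(beta, x, k) {
  B   <- matrix(beta, nrow = length(x))   # unstack into predictors x categories
  eta <- c(0, x %*% B)                    # baseline gets eta = 0
  (exp(eta) / sum(exp(eta)))[k]
}

x    <- c(1, 1, 0.5)                      # intercept, x1 = 1, x2 = 0.5
beta <- c(0.2, 0.5, 0.1,                  # invented: 3 predictors x 3 categories
          -0.1, -0.8, 0.2,
          0.4, 0.3, -0.3)
V    <- diag(0.05, length(beta))          # toy coefficient covariance matrix

# Numerical gradient of the fitted probability of category 2 w.r.t. beta,
# then the delta-method SE: sqrt(g' V g).
eps <- 1e-6
g <- sapply(seq_along(beta), function(j) {
  b <- beta
  b[j] <- b[j] + eps
  (softmax_p(b, x, 2) - softmax_p(beta, x, 2)) / eps
})
se <- sqrt(t(g) %*% V %*% g)              # SE of the fitted prob. of category 2
```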


Fox, J. and J. Hong (2009). Effect displays in R for multinomial and proportional-odds logit models: Extensions to the effects package. Journal of Statistical Software 32(1), 1–24.

Long, J. S. (2012). Regression models for nominal and ordinal outcomes. Forthcoming in Best and Wolf (eds.), Regression Models. Sage Publications.

I’m not sure but this may be relevant:


I’m far from an expert on this, but I recently stumbled upon a similar problem (for logistic and Gamma regressions, though).

There are really two things going on here:

  1. The partial derivative of a predictor with respect to the outcome is a function of the other predictors when using a link function — that’s why we need to fix them. Whether the MEM or AME is reasonable depends on the application, but ultimately you should be looking into what’s interesting for your problem (and it seems that’s what you’re doing with your AME variant).
  2. How to do inference here? Well, maybe I’m completely misled, but I would approach it like this: let’s say you have 2,000 post-warmup iterations, and thus 2,000 draws from the posterior. You can then calculate your variant of the AME for each draw. In the end you’ll have a vector of length 2,000 (let’s call it myAME), and you can use it to extract summary statistics; for example (in R), quantile(myAME, c(0.05, 0.5, 0.95)) should give you the median and the bounds of a 90% credible interval.
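A minimal sketch of that idea, with the per-draw AME computation replaced by a simulated placeholder (`ame_one_draw` is hypothetical; in practice it would recompute your AME variant from one posterior draw of the coefficients):

```r
# One AME value per posterior draw; the real work would happen inside
# ame_one_draw(), which here just simulates a number for illustration.
set.seed(1)
n_draws <- 2000
ame_one_draw <- function(draw_id) rnorm(1, mean = 0.08, sd = 0.02)

myAME <- vapply(seq_len(n_draws), ame_one_draw, numeric(1))
quantile(myAME, c(0.05, 0.5, 0.95))   # median and 90% credible interval
```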

I hope this makes sense…



Did you try out the marginal_effects method of brms already?

Edit: This was stupid since marginal_effects does not support categorical models…

I wouldn’t try to get effects() working, since it is entirely frequentist (as far as I understand), which would lead to huge information loss even if you could make it work on brms models somehow. Have you tried the fitted() method (with summary = FALSE) already? It will surely help you get the expected probabilities of all categories given certain predictor values.
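For what it’s worth, here is a self-contained sketch of how AME draws could fall out of such output. The array below is simulated to mimic the shape I’d expect from something like fitted(mod, newdata = rbind(nd1, nd0), summary = FALSE) on a categorical model, i.e. draws × observations × categories; `mod`, `nd1` and `nd0` are hypothetical names, and the array contents are fake.

```r
# Fake draws-x-observations-x-categories array of fitted probabilities,
# where observation 1 has x1 = 1 and observation 2 has x1 = 0.
set.seed(1)
n_draws <- 4000
n_cat   <- 4
raw <- array(rexp(n_draws * 2 * n_cat), dim = c(n_draws, 2, n_cat))
tot <- apply(raw, c(1, 2), sum)          # total over categories per draw/obs
probs <- raw / c(tot)                    # recycle totals over the category dim

# Per-draw difference in each category's probability between x1 = 1 and
# x1 = 0, then a credible interval per category from the draw quantiles.
ame_draws <- probs[, 1, ] - probs[, 2, ]
apply(ame_draws, 2, quantile, c(0.025, 0.5, 0.975))
```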

Yes sir. And in this case, not just the magnitude but even the sign of that derivative may change depending on the values of the other covariates.

This does make a world of sense to me. Unfortunately it would take about 100 hours on my system to run the program on every posterior sample (there are 4000). But it’s still a potential solution. So – thank you.

Wow. Yes, it appears that using fitted() with appropriate settings essentially solves the problem. You win again, sir.

You could also try the ggeffects package, which gives you marginal effects for multinomial models fitted with brms.


strengejacke: That package sounds great, but doesn’t actually seem to work with brms multinomial models:

> ggpredict(mod, terms = "x1", ci.lvl = 0.95, type = "re")
Note: uncertainty of error terms are not taken into account. You may want to use `rstantools::posterior_predict()`.
Error in `$<-`(`*tmp*`, "predicted", value = c(0.249904418608922, :
  replacement has 8 rows, data has 2

Will likely be useful once the bugs are fixed though.

Ok, thanks, I’ll look into it. Is there a reproducible example?

library(foreign)   # for read.dta()
library(brms)
library(ggeffects)
study <- read.dta("")
mod <- brm(prog ~ ses + female + read + write, family = categorical(link = "logit"), data = study)
ggpredict(mod, terms = "write", type = "fe")
Following variables had many unique values and were prettified: write. Use `pretty = FALSE` to get smoother plots with all values, however, at the cost of increased memory usage.
Note: uncertainty of error terms are not taken into account. You may want to use `rstantools::posterior_predict()`.
Error in `$<-`(`*tmp*`, "predicted", value = c(0.335793236634429,  :
  replacement has 27 rows, data has 9

Ok, cumulative families worked with ggpredict(), but not categorical. However, this is now implemented; you just need to install the latest version from GitHub.

ggpredict(mod, terms = "write", type = "fe") %>% plot()

ggpredict(mod, terms = "ses", type = "fe") %>% plot()

With my own model, which includes one group-level effect, I get the error message:

Error in eval(substitute(expr), data, enclos = parent.frame()) : object 'prior.weights' not found

What might be causing this?

Can’t say for sure; I guess I need some more information (maybe a reproducible example)? Please post at the GitHub repo, I think it’s off-topic here.