Computing average marginal effects for logistic regression when exploiting sufficient statistics

AlesVomacka · September 22, 2023, 8:42am

Hello,
I’m trying to compute average marginal effects for binary logistic regression using brms. There is no problem when working with the full data, but when trying run the model on aggregated data (to save on computation time), I can’t seem to get it quite right.

Here are the models and packages used:

library(dplyr)
library(palmerpenguins)
library(brms)
library(marginaleffects)

#Aggregating data
penguins_agg <- penguins |> 
  mutate(total = 1,
         male = sex == "male") |> 
  summarise(male = sum(male),
            total = sum(total),
            .by = body_mass_g)


m_full <- brm(sex ~ body_mass_g, data = penguins, family = bernoulli()) # Model on "full" dataset
m_agg <- brm(male | resp_trials(total) ~ body_mass_g, data = penguins_agg, family = binomial()) # Model on aggregated data

Both of these model return the same predicted probabilities, so they should be equivalent. However, I can’t seem to get the AMEs right. I understand that with the aggregated data, cases have to be weighted. I tried weighting by the proportion of observations in the full dataset, but that doesn’t work:

w <- penguins_agg$total / sum(penguins_agg$total) # n / N weights

avg_slopes(m_full, variables = "body_mass_g")
#       Term Estimate    2.5 %   97.5 %
# body_mass_g 0.000254 0.000203 0.000297

avg_slopes(m4, variables = "body_mass_g", wts = w)
#        Term Estimate   2.5 %  97.5 %
# body_mass_g  0.00143 0.00113 0.00169

My question is, what the correct weight should be for the AMEs to match? Thanks in advance!

desislava · September 24, 2023, 8:49pm

I’m not sure whether you can do it by weighting. An alternative is to use the original data as the newdata grid in the avg_slopes call.

Prepare the original data frame, so that it has the same columns as the aggregate data frame: male, total (column of 1s) and body_weight:

data_01s <- penguins |>
  # Create an outcome column and a predictor column
  # of the same types as in the aggregate data.
  mutate(
    male = 1L * (sex == "male"),
    total = 1L
  ) |>
  # There are some rows with NaNs;
  # drop them before aggregating.
  drop_na(
    male, body_mass_g
  )

To get avg_slopes reproduce the disaggregated marginal estimates from the aggregated fit, use the original data_01s as its newdata argument. (The model m_agg expects two columns: the predictor body_mass_g & the sample size total; that’s why we added a column of 1s called total.)

avg_slopes(m_agg, newdata = data_01s, variables = "body_mass_g")
#>         Term Estimate    2.5 %   97.5 %
#>  body_mass_g 0.000254 0.000204 0.000297

desislava · September 29, 2023, 3:19pm

Update: There are no weights w that give the correct estimate of the marginal effect with avg_slopes(m_agg, wts = w).

You can use one of these two calculations instead.

# Model `m_agg` uses weights. In the original, un-collapsed dataset
# each observation is a subset of sample size of 1.
# avg_slopes(m_agg, newdata = data_01s %>% mutate(n = 1))
#>  Term Estimate    2.5 %   97.5 %
#>     x 0.000256 0.000205 0.000297

# The sample sizes `n` get in the way of the `posterior_epred` calculation.
# So "turn those off" by setting `n = 1` and use the weights argument instead.
avg_slopes(m_agg, newdata = data_agg %>% mutate(n = 1), wts = data_agg$n)
#>  Term Estimate    2.5 %   97.5 %
#>     x 0.000256 0.000205 0.000297

More details in this marginaleffects issue which I submitted (unnecessarily, it turns out as there is no bug).

AlesVomacka · September 29, 2023, 5:29pm

Amazing, thank you very much!

Topic		Replies	Views
Calculating average marginal effects in models with random slopes brms techniques	19	4419	March 15, 2024
Avg_comparisons for brms ordinal model ususing R Modeling rstan , brms	2	76	July 21, 2024
Population-averaged estimates in brms logistic mixed model brms mixed-model	0	476	February 24, 2022
AME for predicted instead of fitted - brmsmargins Modeling techniques , brms	1	617	May 18, 2022
New blog post: using Stan/brms to get a covariate-adjusted average treatment effect with uncertainty Publicity	1	695	September 18, 2020

Computing average marginal effects for logistic regression when exploiting sufficient statistics

Related topics