To average or not to average ‘add_epred_draws’ output over draws before computing the parameter of interest by a grouping variable?
I have an aggregated data set Test_data with the following structure:
- Year: 1995–2000
- Regions (geographical region): 1–10
- gender: Female, Male
- age_group: 18-29, 30-49, 50+
- n: number of cases
- N: denominator
- logN: log(N), a log transformation of the denominator N
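In case it helps to make this concrete, here is a simulated stand-in with the same structure (the values below are made up, not my real data):

library(dplyr)

set.seed(123)
Test_data <- expand.grid(
  Year      = 1995:2000,
  Regions   = factor(1:10),
  gender    = c("Female", "Male"),
  age_group = c("18-29", "30-49", "50+")
) %>%
  mutate(
    N    = sample(1000:50000, n()),                # denominator
    n    = rnbinom(n(), mu = N / 2000, size = 5),  # number of cases
    logN = log(N)                                  # log of the denominator
  )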
Given these data, I fit a negative binomial regression as follows:
library(rstanarm)

# Negative binomial regression with a random intercept per region
Fit <- stan_glmer(
  n ~ logN + Year + gender + age_group + (1 | Regions),
  family = neg_binomial_2(link = "log"),
  data = Test_data,
  iter = 4000,
  QR = TRUE,
  cores = 4,
  adapt_delta = 0.99
)
I am interested in the incidence rate, for example by year, gender, age_group, and region. From the observed data it is given by IR = (n / N) * 100000.
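So from the observed data alone this is just:

Test_data %>%
  mutate(IR = 100000 * n / N)   # observed incidence rate per 100,000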
I have a new data set called ‘Newdat’ with the same structure but without n; it only has the denominator N, named n_pop (i.e. this is what is called poststratification data in the MRP / “Mister P” literature).
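Schematically, Newdat relates to Test_data like this (a hypothetical sketch; my real poststratification table comes from a separate source):

Newdat <- Test_data %>%
  select(-n) %>%        # no observed case counts in the new data
  rename(n_pop = N)     # the denominator is called n_pop here
# logN is kept, since the model needs it as a predictor when predicting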
Using add_epred_draws() from the tidybayes package, I generate draws from the expectation of the posterior predictive distribution given the model and the new data.
What I’m worried about is how I summarise the add_epred_draws result to get ‘marginal’ estimates, say by year and gender. Do I first need to take the mean over the draws and then summarise by group, or do I summarise the whole set of draws, as below? The former seems like an artificial reduction of the variability in the estimate, because we average out the between-draw variability, while the latter leads to very wide uncertainty intervals. Or am I completely missing something?
library(tidybayes)

group <- c("Year", "gender")

res <- Newdat %>%
  add_epred_draws(Fit, ndraws = 1000) %>%
  group_by(across(all_of(group))) %>%
  summarise(Rate  = 100000 * mean(.epred / n_pop),
            lower = 100000 * quantile(.epred / n_pop, 0.025),
            upper = 100000 * quantile(.epred / n_pop, 0.975))
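For completeness, the first option (averaging over the draws before grouping) would look something like this; a sketch, assuming the .row column that add_epred_draws adds indexes the rows of Newdat:

res_avg <- Newdat %>%
  add_epred_draws(Fit, ndraws = 1000) %>%
  ungroup() %>%
  # one group per poststratification cell: average out the draws first
  group_by(.row, across(all_of(group)), n_pop) %>%
  summarise(epred_mean = mean(.epred), .groups = "drop") %>%
  # then summarise the per-cell means by year and gender; the interval now
  # reflects spread across cells rather than posterior uncertainty
  group_by(across(all_of(group))) %>%
  summarise(Rate  = 100000 * mean(epred_mean / n_pop),
            lower = 100000 * quantile(epred_mean / n_pop, 0.025),
            upper = 100000 * quantile(epred_mean / n_pop, 0.975))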