Post processing of add_epred_draws from a negative binomial model to estimate incidence rate (MRP)

To average or not to average ‘add_epred_draws’ by draw before computing the parameter of interest by a grouping variable?

I have an aggerated data set Test_data with the following structure

  • Year: 1995- 2000

  • Regions (geographical region): 1 - 10

  • Gender: Female, Male

  • Age_group: 18-29, 30-49, 50+

  • n: number of cases

  • N: denominator

  • LogN: log(N) - a log transformation of the denominator N

Given this data, I fit a negative binomial regression as follow

Fit<- stan_glmer(n ~ logN + Year+ gender + age_group + (1 | Regions),
                             family = neg_binomial_2(link = "log"),
                             data = Test_data
                             cores = 4,
                             adapt_delta = 0.99)

I am interested in the Incidence rate, for example, by year, gender, age_group, and region. From the observed data, it’s given by IR = (n/N)*100000.

I have a new data called ‘Newdat’ with the same structure but without n. It only have the denominator N named as n_pop (i.e. this called a poststratification data in the MRP/ “Mister P” literature).

Using the add_epred_draws, I generate the expectation of the posterior predictive distribution given the model and the new data.

What I’m worried about is how I summarise the add_epred_draws result to get ‘marginal’ estimates, say by year and gender. Do I need first to get the mean of the draws and then summarize by group or do I have to summarise the whole draws as bellow. The former seems an artificial reduction of variability in the estimate because we average out the between draws variability and the later leads to a very wide uncertainty interval or I’m completely missing this.

group - c("Year", "gender")
res<- Newdat%>%
          add_epred_draws(model, ndraws = 1000)%>%
             summarise(Rate = 100000*mean (.epred/n_pop),
                       lower = 100000*quantile(.epred/n_pop, 0.025),
                       upper = 100000*quantile(.epred/n_pop, 0.975))

@ajnafa any suggestions?

I ran into this problem with a multinomial MrP model for public opinion data a few months ago and your intuition here is correct in that option 2 is giving you more realistic estimates (for better or worse) by not ignoring a decent part of the variance.

Thank you @ajnafa!

1 Like