Reproduce posterior_predict from posterior_linpred for a neg binom spline model

Hello, I would like to understand better the difference between posterior_predict and posterior_linpred, so I was trying to reproduce the results of the first from the second. Of course I am not able to.
In my case the model is a negative binomial spline model with offsets. if I compare the medians of the posterior_linpred with transform = TRUE and posterior_predict (with the same offset vector) I get that, on average, the predictions of _predict are the 0.93 of those of _linpreds.

Why is that?

Hi Angelo, here’s a brief explanation of the difference. Suppose you have a negative binomial regression with a log link:

\eta_i = a + bx_i \\ \lambda_i = \exp(\eta_i) \\ y_i \sim {\rm NegBinom}(\lambda_i, \phi)
  • posterior_linpred() gives you posterior draws of \eta.
  • posterior_linpred(transform=TRUE), or equivalently posterior_epred(), gives you posterior draws of \lambda, that is, it does the inverse link transformation for you.
  • posterior_predict() gives you draws from {\rm NegBinom}(\lambda, \phi), that is it uses a NegBinom random number generator and includes the dispersion information in \phi.

What do you get if you compare the means instead of the medians?

8 Likes

Thank you jonah! that’s what I thought, but even trying to apply the model I can’t reproduce the results. Here i extract the median prediction via linpred, via linpred passed through a negative binomial and via predict:

> posterior_linpred(mod.spline, offset = rep(log(10^5), 36), transform = T) %>% apply(2, median) %>% head() %>% round
 1  2  3  4  5  6 
 9  3 11  3 13  4 
> qnbinom(.5, size = 10^5, mu = posterior_linpred(mod.spline, offset = rep(log(10^5), 36), transform = T) %>% apply(2, median) %>% head())
 1  2  3  4  5  6 
 9  2 11  3 13  4 
> posterior_linpred(mod.spline, offset = rep(log(10^5), 36), transform = T) %>% apply(2, mean) %>% head() %>% round # means
  1  2  3  4  5  6 
 10  3 12  4 14  4 
> posterior_predict(mod.spline, offset = rep(log(10^5), 36)) %>% apply(2, median) %>% head
 1  2  3  4  5  6 
 8  2 10  3 12  4 

The first two coincide, _linpred with mean gives higher results (as expected), while with _predict is consistently lower.

Hmm, one thing I notice is that in the last two lines it looks like you have mean for posterior_linpred but still median for posterior_predict. Is it closer if you use mean for both?

Also, here’s a quick check I did using a simple example model:

# stan_glm.nb is same as stan_glm with family=neg_binomial_2
fit <- stan_glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = MASS::quine) 

# posterior_epred is same as posterior_linpred(transform=T)
plot(colMeans(posterior_epred(fit)), colMeans(posterior_predict(fit)))
abline(a = 0, b = 1, col = "red")

which produces a plot that looks like I would expect:

image

What happens if you try that with your model?

Indeed using the mean in both _linpred and _predict gives the same results (after rounding)!
So why with the medians they do not, while the the first two methods (median linepred and qnbinom) agree?

Hi Angelo, sorry for the slow reply!

I think it’s easiest to see why the mean and median behave differently if we focus on a single observation. For example, for the model I posted above here’s a plot of the posterior distribution of \lambda_i vs the predictive distribution of y^{\rm rep}_i for i=1, i.e., the first column from posterior_epred() vs the first column from posterior_predict(). Because of the shapes of the distributions (the latter including the impact of the dispersion term), they have the same mean but different medians:

fit <- stan_glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = MASS::quine) 
lambda_1 <- posterior_epred(fit)[, 1]
yrep_1 <- posterior_predict(fit)[, 1]

make_hist <- function(x) {
  hist(x, main = deparse(substitute(x)))
  abline(v = mean(x), col = "red")
  abline(v = median(x), col = "blue")
  legend("right", 
         legend = c(paste("mean =", round(mean(x), 1)), 
                    paste("median =", round(median(x), 1))), 
         col = c("red", "blue"), 
         lty = 1)
}

make_hist(lambda_1)
make_hist(yrep_1)

image
image

In both cases the mean is about 18 while the medians differ.

Does that help explain the reason for what you’re seeing?

1 Like