Negative binomial model predictions unexpectedly large for values larger than seen in sample data

I have a functioning negative binomial regression model of count data with one integer predictor. My goal is prediction and I need to predict responses for predictor values quite larger than what is present in my sample data. With what I currently have, predicted responses get extremely large for new data values of the covariate at one order of magnitude above the available sample data.

I would like to ask what I could do in terms of priors to “stabilize” the model to prevent this.

Here is a working example with simulated data using brms:

# simulation of sample data following
Sigma <- matrix(.3, nrow=2, ncol=2) + diag(2)*.7
rawvars <- mvrnorm(n=1000, mu=c(0,0), Sigma=Sigma)
nbvars <- cbind(a=qnbinom(pnorm(rawvars[,1]), size=0.3, prob=0.1), b=qnbinom(pnorm(rawvars[,2]), size=0.3, prob=0.01))

nbmod <- brm(bf(a~b),
              family = negbinomial(),
              data=nbvars, chains=2, cores=2, warmup=1000, iter=3000)

posterior_predict(nbmod, newdata = data.frame(b=c(10, 100, 1000, 10000)), scale="response", nsamples=10)

Output looks like this:

[,1] [,2] [,3]         [,4]
 [1,]    0    4 1828 2.690863e+39
 [2,]    5    7  274 1.008591e+29
 [3,]    0    0 6908 2.524178e+25
 [4,]    1    9 1531 2.425535e+27
 [5,]    2    0  148 5.014781e+25
 [6,]    0    2  102 9.682165e+24
 [7,]    3    9   65 3.470250e+19
 [8,]    2    1   34 9.535929e+26
 [9,]    9    6 1547 2.965608e+25
[10,]    4    1   20 2.436934e+28
1 Like

the most likely cause is that the linear predictor works on the log scale, so the model as you have written it assumes, that increasing b (in data) by 1 means multiplying the expected value by the exponentiated coefficient of b (in the model). Since the coefficient is fitted as roughly 0.005, it means that the mean of the response is mean_at_zero * (exp(0.005) ^ new_b. Since exp(0.005) ^ 10000 = 5e+21 you may see how this gives raise to such large values.

In any case, extrapolation tasks such this are very difficult to do well - how do you justify that the relationships that hold for small values hold for larger ones? Do you have some theory that would tell you how should the relationship between b and the response look like? As a practical matter, one may try to use log(b) as a predictor instead, which would make the large values less crazy, but wihout any theory, this might be just as bad for extraploating as any other choice.

Does that make sense?

Best of luck with your model!

1 Like

Hi Martin,
thanks a lot for that insight, I was not aware that I had to take that into consideration. From testing your suggestion of using the logarith of the predictor quickly, it does work and I think I can justify it theoretically in my particular case. And indeed, the extrapolation issue remains, so I will take care to keep that in mind.
TYVM for the help!

1 Like