Fitting count data with negative binomial - long tail

IMHO there are at least two things you should check:

  1. Is the problem in the residual variability after accounting for predictors (i.e. the outcome distribution is wrong) or is it specific to some subgroups of the data?

To check this, you’d want to do grouped posterior predictive (PP) checks. It might also be useful to focus on the stat_grouped PPC with variance or variance/mean (a.k.a. Fano factor) as statistics. If you see that for some subgroups of the data (e.g. for some values of BACKPROX_Va or low avg_Thgt) the model has too much variance/too heavy tails and for others it has too little variance/too light tails, it might be worthwhile to put those as predictors on the dispersion parameter (i.e. let the dispersion vary between observations).
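To make the grouped check concrete, here is a minimal, language-agnostic sketch of what such a check computes, written in plain Python rather than with the actual brms/bayesplot machinery. The function names (fano, grouped_stat, grouped_ppc) and the toy numbers are made up for illustration; in practice y would be your observed counts and yrep the posterior predictive draws from your fit.

```python
from collections import defaultdict
from statistics import mean, variance


def fano(values):
    """Variance-to-mean ratio; needs at least two values per group."""
    m = mean(values)
    return variance(values) / m if m > 0 else float("nan")


def grouped_stat(y, groups, stat=fano):
    """Apply `stat` separately to the outcomes of each group."""
    buckets = defaultdict(list)
    for value, group in zip(y, groups):
        buckets[group].append(value)
    return {group: stat(values) for group, values in buckets.items()}


def grouped_ppc(y, yrep, groups, stat=fano):
    """Per group: share of replicated datasets whose statistic is at least
    as large as the observed one. Values near 0 suggest the model has too
    little spread in that group, values near 1 suggest too much."""
    observed = grouped_stat(y, groups, stat)
    replicated = [grouped_stat(draw, groups, stat) for draw in yrep]
    return {
        group: sum(rep[group] >= obs for rep in replicated) / len(replicated)
        for group, obs in observed.items()
    }


# Toy data, purely illustrative (not from the original question).
y = [0, 4, 1, 9]
groups = ["a", "a", "b", "b"]
yrep = [[0, 6, 5, 5], [1, 1, 0, 10]]  # two fake posterior predictive draws

print(grouped_stat(y, groups))
print(grouped_ppc(y, yrep, groups))
```

If the per-group proportions are extreme in opposite directions for different groups, that points towards letting the dispersion depend on the grouping variable; if they are extreme in the same direction everywhere, the outcome distribution itself is the more likely culprit.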

If instead the problem is more or less the same in all subgroups that make sense for your data, it might indicate that the problem is in the outcome distribution and you may need something even more flexible than the negative binomial. Poisson-LogNormal models are an example of an alternative that has some support in existing software (though it would require a bit of hacking to get it running in brms). There are also even more exotic Poisson-XXX variants, but those tend to have poor software support.

Adding a random effect per observation (i.e. something like (1 | Event_ID)) could also help in this case.
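For intuition on how these options relate, here is a short sketch of the moments. The notation is mine and not tied to any package: η_i is the linear predictor, σ the standard deviation of the per-observation effect, and φ the negative binomial shape. With a Poisson likelihood, a normal per-observation effect on the log scale is the Poisson-LogNormal model:

```latex
\begin{aligned}
y_i \mid \varepsilon_i &\sim \mathrm{Poisson}\!\left(\exp(\eta_i + \varepsilon_i)\right),
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \\
\mathbb{E}[y_i] &= m_i = \exp\!\left(\eta_i + \tfrac{\sigma^2}{2}\right) \\
\operatorname{Var}[y_i] &= m_i + m_i^2 \left(e^{\sigma^2} - 1\right)
\qquad \text{(Poisson-LogNormal)} \\
\operatorname{Var}[y_i] &= m_i + \frac{m_i^2}{\phi}
\qquad \text{(negative binomial with mean } m_i \text{)}
\end{aligned}
```

So both have a variance that is quadratic in the mean, with e^{σ²} − 1 playing the role of 1/φ. The difference is in the mixing distribution (lognormal vs. gamma), and the lognormal has the heavier right tail. Adding the per-observation effect on top of a negative binomial likelihood stacks both sources of overdispersion.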

  2. Make sure your offsets are OK. There are two meaningful ways to use offsets in a negative binomial model: either the offset acts on the mean alone, or it also scales the dispersion. See “Scaling of the overdispersion in negative binomial models” for discussion. If I recall correctly, using Total | rate(Dist.Vol) instead of an offset term should automatically scale the dispersion, which I’d guess might be more appropriate for your use case, but please double check.
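To see why the choice matters, here is a small sketch of the two options using only the closed-form negative binomial variance (mean + mean²/shape). It assumes that "scaling the dispersion" means multiplying the shape by the exposure, which is what you get if an observation with exposure E is a sum of E independent unit-exposure negative binomial counts. The function names are made up, and whether brms does exactly this for rate() is something to verify in the generated Stan code.

```python
def nb_variance(mean, shape):
    """Variance of a negative binomial in the mean/shape parameterization."""
    return mean + mean**2 / shape


def offset_on_mean_only(rate, exposure, shape):
    """Exposure multiplies the mean; the shape is left untouched."""
    mean = rate * exposure
    return mean, nb_variance(mean, shape)


def offset_scaling_shape(rate, exposure, shape):
    """Exposure multiplies both the mean and the shape (assumed scaling)."""
    mean = rate * exposure
    return mean, nb_variance(mean, shape * exposure)


# Illustrative values only: rate per unit exposure 2, shape 4.
for exposure in (1.0, 10.0, 100.0):
    m1, v1 = offset_on_mean_only(2.0, exposure, 4.0)
    m2, v2 = offset_scaling_shape(2.0, exposure, 4.0)
    # Print the Fano factor (variance / mean) under each option.
    print(exposure, v1 / m1, v2 / m2)
```

With the offset on the mean only, the variance/mean ratio grows with exposure (1 + rate × exposure / shape), while with the scaled shape it stays at 1 + rate / shape regardless of exposure. If the long tail shows up mainly in observations with large Dist.Vol, that would be a hint that the scaling is part of the problem.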

Best of luck with your model!
