PSIS LOO inappropriate for negative binomial regression?

The General Question
My question arises from this passage in McElreath's Statistical Rethinking, Second Edition:

You should not use WAIC or PSIS with [negative binomial/gamma-Poisson or beta binomial] models, however, unless you are very sure of what you are doing. The reason is that while ordinary binomial and Poisson models can be aggregated and disaggregated across rows in the data, without changing any causal assumptions, the same is not true of beta-binomial and gamma-Poisson models. The reason is that a beta-binomial or gamma-Poisson likelihood applies an unobserved parameter to each row in the data. When we then go to calculate log-likelihoods, how the data are structured will determine how the beta-distributed or gamma-distributed variation enters the model (p. 375).

I can't say I'm "very sure" of what I am doing, and I came across the passage while attempting to build my understanding.

Based on the advice of Gelman, Hill, and Vehtari (2020), I was using loo() to compare models (it uses Pareto-smoothed importance-sampling leave-one-out cross-validation, PSIS-LOO). I wouldn't have thought twice about using loo() on these models because, for one thing, the package vignette Using the loo package (version >= 2.0.0) uses a negative binomial model as an example.

McElreath follows up with a sentence or two on what to do instead and responds that multilevel models "can assign heterogeneity in probabilities or rates at any level of aggregation." I'm looking into a multilevel approach, but at this point I'm curious about how to interpret the loo results for the current models. I'm only somewhat familiar with the multilevel modeling approach and am not sure it will work with this data.

Background on the Data and Modelling

I'm posting this in hopes someone has insight on what to look out for, and to better understand the concerns he's expressing. I understand that his concerns relate to how the data are aggregated, but I don't understand them deeply enough to know how they affect what we're working on.

For context, the data we're using come from a pilot survey aimed at monitoring the abundance of specific fish species. A row records the results of surveying a particular station at a particular time and for a particular duration. The outcome/response variable is the number of fish caught (each species is modelled separately). The available predictor variables include four large/coarse-scale areas ("Area"), two depth categories ("Depth_Category"), and year ("Year"). We have four years of data so far.

We also include the number of angler minutes (i.e. the number of people fishing times the time spent fishing). The conventional approach is to use it as an offset so that we're modelling the rate (i.e. catch per unit effort). Part of the motivation here is to compare that with the alternative of letting the coefficient on angler effort vary from one. And the loo results suggest that letting it vary does fit the data better, and the coefficient goes negative (which is plausible).
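To spell out the distinction (my notation, sketching only the mean structure): the offset fixes the coefficient on log(Minutes) at exactly 1, so the model describes catch per unit effort, while the alternative estimates a free coefficient on effort (entered on the raw Minutes scale in the models below):

$$\log \mathbb{E}[\text{Count}_i] = X_i \beta + 1 \cdot \log(\text{Minutes}_i) \quad \text{(offset)}$$

$$\log \mathbb{E}[\text{Count}_i] = X_i \beta + \beta_M \, \text{Minutes}_i \quad \text{(free coefficient)}$$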

Again, the purpose of exploring these models is to produce a relative index of abundance from the survey.

Models and Loo Results

These are the models I'm interested in comparing.

# effort as an offset: the coefficient on log(Minutes) is fixed at 1
m_offset <- brm(Count ~ Year + Area + Depth_Category + offset(log(Minutes)),
                data = survey.df,
                family = negbinomial())

# effort as an ordinary predictor with a free coefficient
m_no_offset <- brm(Count ~ Year + Area + Depth_Category + Minutes,
                   data = survey.df,
                   family = negbinomial())

# hurdle model: zeros come from a separate process than positive counts
m_hurdle <- brm(Count ~ Year + Area + Depth_Category + Minutes,
                data = survey.df,
                family = hurdle_negbinomial())

# zero-inflated model: a mixture of extra zeros and negative binomial counts
m_zero_inflate <- brm(Count ~ Year + Area + Depth_Category + Minutes,
                      data = survey.df,
                      family = zero_inflated_negbinomial())

We also ran models using fewer predictor variables, and the model using all the variables performs best. However, none fit well. Fitting a generalized linear model version of the m_offset model, for example, produced a deviance explained of ~9%. The version that lets the coefficient on Minutes vary increases that to ~18%. I explored the zero-inflated and hurdle models because posterior predictive checks with pp_check() show that the models are predicting more zeros than seen in the data.
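For reference, a minimal sketch of that check, assuming the fitted objects above (the bars version is often easier to read for count data):

# overlay replicated outcomes on the observed counts
pp_check(m_offset, ndraws = 100)
# discrete bar version, better suited to counts
pp_check(m_offset, type = "bars", ndraws = 100)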

Running loo() on each and then loo_compare() produces these results.
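(A sketch of the calls, using the loo_* object names that appear in the comparison:)

loo_offset       <- loo(m_offset)
loo_no_offset    <- loo(m_no_offset)
loo_hurdle       <- loo(m_hurdle)
loo_zero_inflate <- loo(m_zero_inflate)
loo_compare(loo_offset, loo_no_offset, loo_hurdle, loo_zero_inflate)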

                 elpd_diff se_diff
loo_zero_inflate   0.0       0.0
loo_no_offset     -6.1       4.1
loo_hurdle        -8.0       2.6
loo_offset       -34.6       7.4

Again, while I'm new to interpreting this, my understanding is that it does suggest the models that let the coefficient on effort vary do notably better than the offset model. And while the zero-inflated version does best, its elpd_diff of -6.1 against an se_diff of 4.1 means it wouldn't be expected to have much better predictive power than the standard negative binomial version (m_no_offset).

Seeing these results, I was going to recommend taking a closer look at the offset assumption. There are reasons why catch may tail off as fishing effort goes up in this survey (at least over the range in the data). However, seeing McElreath's warning made me second-guess doing so.

Again, any advice and insights would be appreciated! Thank you.


I agree with the comment about the different behavior under aggregation, but I don't agree with the recommendation. A more accurate recommendation would be to think carefully about what the left-out unit is (that's one reason I favor thinking in terms of cross-validation, as *IC obfuscate it more) and to avoid comparing models with different aggregations unless you know what you are doing. I'm tagging @richard_mcelreath so he can clarify himself (it is possible that I'm forgetting something).

I agree with this statement, but modeling overdispersion by having an additional parameter for each observation is challenging for WAIC and PSIS-LOO, so you should use WAIC or PSIS with such models only if you know what you are doing.
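A practical way to see when PSIS-LOO is struggling with such models is to check the Pareto k diagnostics that loo() reports; a minimal sketch, reusing one of the fits above:

loo_fit <- loo(m_no_offset)
print(loo_fit)  # summary reports how many observations have high Pareto k
plot(loo_fit)   # per-observation Pareto k; values above 0.7 flag unreliable points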

I think the comparison of these models is safe.

Yes

Yes. You can also check whether it performs better in a PPC looking at the proportion of zeros.
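For example, a sketch using bayesplot's statistic-based PPC (the prop_zero helper is illustrative):

# compare the proportion of zeros in replicated datasets to the observed one
prop_zero <- function(y) mean(y == 0)
pp_check(m_zero_inflate, type = "stat", stat = "prop_zero")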
