Interpretation of coefficients in hurdle negative binomial models

afukunaga · December 12, 2023, 1:10am

I am trying to model fish counts at different sites using a hurdle negative binomial models using the brms package.
When I fit my model with “bf(fish_count ~ site, hu ~ site)” and “family = hurdle_negbinomial()” and compare the coefficients with the conditional effects plots, they don’t seem to match.
These is no problem with the hurdle part (logit). I can use plogis() to get the probability of 0 for each site and match them with the conditional effects plot with dpar = “hu”.
However, this is not the case with the abundance (negative binomial) part. The values I get using exp() do not match with those from the conditional effects plot with dpar = “mu”.

I originally had a more complex models (with a temporal component and some random effects terms), but decided to try this simple model first to make sense of the coefficients, but I was not successful… I also noticed that if I run a negative binomial model after removing rows where fish_count is 0, the model gives me results that are different from the hurdle negative binomial model even if I use the intercept-only hurdle (hu ~ 1).

Additionally I tried a frequentist approach using the pscl package, and both pscl and brms hurdle models gave me very similar coefficients except for the Intercept.

I feel like I am making a critical (and maybe silly) mistake in interpreting the coefficients in the hurdle model… Could someone help?

jsocolar · December 12, 2023, 2:01am

Hi and welcome to the Stan discourse! I think you will probably get better answers to your question by being more explicit about what you have done and what is confusing:

What exactly are you exponentiating?

In what way are the results different (what exactly did you do to conclude that they’re different?)

afukunaga · December 12, 2023, 6:25pm

Thank you so much for your reply.
Here is the output of my hurdle model.
s_rubroviolaceus is the fish count and I have 11 sites.

Family: hurdle_negbinomial 
Links: mu = log; shape = identity; hu = logit 
Formula: s_rubroviolaceus ~ site_name 
         hu ~ site_name
   Data: fahu_count_df (Number of observations: 3443) 
  Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
         total post-warmup draws = 8000

Population-Level Effects: 
                                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                             -2.96      2.37    -8.78    -0.74 1.00     1463     1297
hu_Intercept                           0.76      0.16     0.44     1.08 1.00     1111     1706
site_nameAlaeloa                       1.02      0.36     0.31     1.74 1.00     1730     2964
site_nameHonoluaMMokuleiaClosed        0.97      0.36     0.25     1.68 1.00     1794     3017
site_nameHonoluaMMokuleiaOpen          0.49      0.47    -0.42     1.45 1.00     2108     3864
site_nameKihei                         1.50      0.35     0.82     2.21 1.00     1711     2757
site_nameLaPerouse                     1.62      0.58     0.59     2.88 1.00     2645     3792
site_nameMakena                        0.76      0.36     0.05     1.46 1.00     1676     3271
site_nameMolokini                     -0.04      0.30    -0.65     0.54 1.00     1406     2282
site_nameOlowalu                       0.69      0.29     0.10     1.25 1.00     1322     2485
site_namePaia                          1.15      0.33     0.50     1.79 1.00     1594     2641
site_nameWaihee                        1.10      0.32     0.46     1.73 1.00     1508     2555
hu_site_nameAlaeloa                    1.07      0.22     0.64     1.49 1.00     1564     2617
hu_site_nameHonoluaMMokuleiaClosed    -1.16      0.26    -1.66    -0.66 1.00     2062     3580
hu_site_nameHonoluaMMokuleiaOpen       0.26      0.28    -0.29     0.80 1.00     2139     3543
hu_site_nameKihei                      0.37      0.21    -0.04     0.78 1.00     1556     2931
hu_site_nameLaPerouse                  1.28      0.30     0.69     1.88 1.00     2422     4262
hu_site_nameMakena                     1.01      0.21     0.60     1.43 1.00     1518     2729
hu_site_nameMolokini                  -1.75      0.21    -2.18    -1.34 1.00     1555     3041
hu_site_nameOlowalu                   -0.51      0.18    -0.87    -0.16 1.00     1322     2242
hu_site_namePaia                       0.41      0.20     0.02     0.79 1.00     1417     2509
hu_site_nameWaihee                     0.22      0.20    -0.17     0.60 1.00     1450     3003

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.07      0.06     0.00     0.22 1.00     1630     1282

When I exponentiate, for example, the Intercept (exp(-2.96)) to get the fish count at the reference site level (Ahihi), I get 0.05. Or For Kihei, exp(-2.96+1.50) is 0.23. In the conditional effects plot with dpar = “mu”, they are about 0.1 and 0.5 respectively.

If I subset the data frame with “s_rubroviolaceus > 0”, then run a negative binomial model. I get the following output.

Family: negbinomial 
Links: mu = log; shape = identity 
Formula: s_rubroviolaceus ~ site_name 
   Data: fahu_count_df_abd (Number of observations: 1031) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
                                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                           0.69      0.13     0.44     0.96 1.01      613     1004
site_nameAlaeloa                    0.52      0.17     0.19     0.86 1.01      861     1635
site_nameHonoluaMMokuleiaClosed     0.49      0.17     0.14     0.82 1.01      870     1464
site_nameHonoluaMMokuleiaOpen       0.21      0.22    -0.27     0.62 1.00     1289     2245
site_nameKihei                      0.82      0.16     0.50     1.13 1.00      830     1397
site_nameLaPerouse                  0.84      0.24     0.37     1.31 1.00     1596     2402
site_nameMakena                     0.37      0.17     0.02     0.70 1.00      910     1569
site_nameMolokini                  -0.02      0.15    -0.31     0.27 1.01      748     1349
site_nameOlowalu                    0.34      0.14     0.05     0.62 1.01      692     1368
site_namePaia                       0.61      0.15     0.29     0.90 1.01      753     1428
site_nameWaihee                     0.57      0.15     0.26     0.88 1.01      763     1454

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     2.27      0.16     1.98     2.59 1.00     3600     2938

My understanding is that a hurdle models is a 2-step approach, with the first step using the whole dataset to model presence/absence, then the second step is to model the abundance using the presence-only data. In an example I found online, the coefficients for the 2nd step were almost the same as the one that was obtained by running a non-hurdle model after removing 0s (although that example was using a hurdle lognormal model…)
Also, with this negative binomial model, if I exponentiate the Intercept (exp(0.69)), I get 1.99, which is what I see in the conditional effects plot for the reference site level (Ahihi).
It seems that the coefficients for the sites in these two models are off by a factor of about 2…

afukunaga · December 15, 2023, 12:03am

I’m also adding the output for the same model from the pscl package below.

Coefficients are very similar between the two, except for the Intercept (-2.96 in brms, -1.0999 in pscl) and shape (0.07 in brms, exp(-1.8) = 0.165 in pscl) for the negative binomial part.
These two models also generate very similar fitted values for each factor level.

For the pscl model, I can manually calculate the fitted values by doing “probability of non-zero from logit” * “mean count from NB” / “probability of non-zero from NB” in R. For example, for the reference factor level,

plogis(-0.7559) * exp(-1.09990) / (1 - ((exp(-1.8)/(exp(-1.8) + exp(-1.0999))) ^ exp(-1.8)))

This gives me 0.638, which is the fitted value for the reference factor level for the pscl model. The brms model also has the fitted value of 0.655 for the reference factor level, but doing the same calculation using the brms Intercept and shape parameter results in 0.434.

My full model is a hierarchical model with a random factor, and the pscl package doesn’t seem to handle it, so I need to run it in brms.
If anyone has any insight, I’d really appreciate it. I just want to understand the brms output correctly.

Call:
hurdle(formula = s_rubroviolaceus ~ site_name | site_name, data = fahu_count_df, dist = "negbin")

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-0.8825 -0.3768 -0.2788 -0.1022 19.5664 

Count model coefficients (truncated negbin with log link):
                                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)                      -1.09990    0.42965  -2.560 0.010467 *  
site_nameAlaeloa                  0.97677    0.33455   2.920 0.003504 ** 
site_nameHonolua-Mokuleia Closed  0.92572    0.33694   2.747 0.006006 ** 
site_nameHonolua-Mokuleia Open    0.42687    0.42978   0.993 0.320590    
site_nameKihei                    1.44425    0.32059   4.505 6.64e-06 ***
site_nameLaPerouse                1.46594    0.48712   3.009 0.002618 ** 
site_nameMakena                   0.72090    0.33246   2.168 0.030128 *  
site_nameMolokini                -0.03364    0.28157  -0.119 0.904887    
site_nameOlowalu                  0.67433    0.27488   2.453 0.014159 *  
site_namePaia                     1.11400    0.30170   3.692 0.000222 ***
site_nameWaihee                   1.06486    0.29997   3.550 0.000385 ***
Log(theta)                       -1.80412    0.43524  -4.145 3.40e-05 ***
Zero hurdle model coefficients (binomial with logit link):
                                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)                       -0.7559     0.1650  -4.582 4.60e-06 ***
site_nameAlaeloa                  -1.0619     0.2137  -4.969 6.72e-07 ***
site_nameHonolua-Mokuleia Closed   1.1532     0.2608   4.423 9.75e-06 ***
site_nameHonolua-Mokuleia Open    -0.2486     0.2824  -0.880  0.37865    
site_nameKihei                    -0.3718     0.2097  -1.773  0.07624 .  
site_nameLaPerouse                -1.2590     0.3003  -4.192 2.76e-05 ***
site_nameMakena                   -1.0125     0.2127  -4.761 1.93e-06 ***
site_nameMolokini                  1.7418     0.2134   8.163 3.25e-16 ***
site_nameOlowalu                   0.5130     0.1861   2.756  0.00585 ** 
site_namePaia                     -0.4129     0.1978  -2.087  0.03687 *  
site_nameWaihee                   -0.2183     0.1982  -1.101  0.27085    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Theta: count = 0.1646
Number of iterations in BFGS optimization: 36
Log-likelihood: -3748 on 23 Df

Topic		Replies	Views
Interpret hurdle coefficients in brms Modeling interpret-results , brms	1	1285	August 11, 2021
Hierachical Hurdle Negative Binomial brms specification	3	679	February 4, 2019
Hurdle model with BYM2 correlation structure on hurdle component brms	7	715	March 5, 2020
The conditional_effects function of the brms package with a hurdle poisson model Interfaces brms	6	809	January 17, 2023
Priors for hurdle models in brms Modeling priors , brms	2	741	August 21, 2023

Interpretation of coefficients in hurdle negative binomial models

Related topics