I am trying to model fish counts at different sites using a hurdle negative binomial models using the brms package.
When I fit my model with “bf(fish_count ~ site, hu ~ site)” and “family = hurdle_negbinomial()” and compare the coefficients with the conditional effects plots, they don’t seem to match.
These is no problem with the hurdle part (logit). I can use plogis() to get the probability of 0 for each site and match them with the conditional effects plot with dpar = “hu”.
However, this is not the case with the abundance (negative binomial) part. The values I get using exp() do not match with those from the conditional effects plot with dpar = “mu”.
I originally had a more complex models (with a temporal component and some random effects terms), but decided to try this simple model first to make sense of the coefficients, but I was not successful… I also noticed that if I run a negative binomial model after removing rows where fish_count is 0, the model gives me results that are different from the hurdle negative binomial model even if I use the intercept-only hurdle (hu ~ 1).
Additionally I tried a frequentist approach using the pscl package, and both pscl and brms hurdle models gave me very similar coefficients except for the Intercept.
I feel like I am making a critical (and maybe silly) mistake in interpreting the coefficients in the hurdle model… Could someone help?
Hi and welcome to the Stan discourse! I think you will probably get better answers to your question by being more explicit about what you have done and what is confusing:
What exactly are you exponentiating?
In what way are the results different (what exactly did you do to conclude that they’re different?)
Thank you so much for your reply.
Here is the output of my hurdle model.
s_rubroviolaceus is the fish count and I have 11 sites.
Family: hurdle_negbinomial
Links: mu = log; shape = identity; hu = logit
Formula: s_rubroviolaceus ~ site_name
hu ~ site_name
Data: fahu_count_df (Number of observations: 3443)
Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
total post-warmup draws = 8000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -2.96 2.37 -8.78 -0.74 1.00 1463 1297
hu_Intercept 0.76 0.16 0.44 1.08 1.00 1111 1706
site_nameAlaeloa 1.02 0.36 0.31 1.74 1.00 1730 2964
site_nameHonoluaMMokuleiaClosed 0.97 0.36 0.25 1.68 1.00 1794 3017
site_nameHonoluaMMokuleiaOpen 0.49 0.47 -0.42 1.45 1.00 2108 3864
site_nameKihei 1.50 0.35 0.82 2.21 1.00 1711 2757
site_nameLaPerouse 1.62 0.58 0.59 2.88 1.00 2645 3792
site_nameMakena 0.76 0.36 0.05 1.46 1.00 1676 3271
site_nameMolokini -0.04 0.30 -0.65 0.54 1.00 1406 2282
site_nameOlowalu 0.69 0.29 0.10 1.25 1.00 1322 2485
site_namePaia 1.15 0.33 0.50 1.79 1.00 1594 2641
site_nameWaihee 1.10 0.32 0.46 1.73 1.00 1508 2555
hu_site_nameAlaeloa 1.07 0.22 0.64 1.49 1.00 1564 2617
hu_site_nameHonoluaMMokuleiaClosed -1.16 0.26 -1.66 -0.66 1.00 2062 3580
hu_site_nameHonoluaMMokuleiaOpen 0.26 0.28 -0.29 0.80 1.00 2139 3543
hu_site_nameKihei 0.37 0.21 -0.04 0.78 1.00 1556 2931
hu_site_nameLaPerouse 1.28 0.30 0.69 1.88 1.00 2422 4262
hu_site_nameMakena 1.01 0.21 0.60 1.43 1.00 1518 2729
hu_site_nameMolokini -1.75 0.21 -2.18 -1.34 1.00 1555 3041
hu_site_nameOlowalu -0.51 0.18 -0.87 -0.16 1.00 1322 2242
hu_site_namePaia 0.41 0.20 0.02 0.79 1.00 1417 2509
hu_site_nameWaihee 0.22 0.20 -0.17 0.60 1.00 1450 3003
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape 0.07 0.06 0.00 0.22 1.00 1630 1282
When I exponentiate, for example, the Intercept (exp(-2.96)) to get the fish count at the reference site level (Ahihi), I get 0.05. Or For Kihei, exp(-2.96+1.50) is 0.23. In the conditional effects plot with dpar = “mu”, they are about 0.1 and 0.5 respectively.
If I subset the data frame with “s_rubroviolaceus > 0”, then run a negative binomial model. I get the following output.
Family: negbinomial
Links: mu = log; shape = identity
Formula: s_rubroviolaceus ~ site_name
Data: fahu_count_df_abd (Number of observations: 1031)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.69 0.13 0.44 0.96 1.01 613 1004
site_nameAlaeloa 0.52 0.17 0.19 0.86 1.01 861 1635
site_nameHonoluaMMokuleiaClosed 0.49 0.17 0.14 0.82 1.01 870 1464
site_nameHonoluaMMokuleiaOpen 0.21 0.22 -0.27 0.62 1.00 1289 2245
site_nameKihei 0.82 0.16 0.50 1.13 1.00 830 1397
site_nameLaPerouse 0.84 0.24 0.37 1.31 1.00 1596 2402
site_nameMakena 0.37 0.17 0.02 0.70 1.00 910 1569
site_nameMolokini -0.02 0.15 -0.31 0.27 1.01 748 1349
site_nameOlowalu 0.34 0.14 0.05 0.62 1.01 692 1368
site_namePaia 0.61 0.15 0.29 0.90 1.01 753 1428
site_nameWaihee 0.57 0.15 0.26 0.88 1.01 763 1454
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape 2.27 0.16 1.98 2.59 1.00 3600 2938
My understanding is that a hurdle models is a 2-step approach, with the first step using the whole dataset to model presence/absence, then the second step is to model the abundance using the presence-only data. In an example I found online, the coefficients for the 2nd step were almost the same as the one that was obtained by running a non-hurdle model after removing 0s (although that example was using a hurdle lognormal model…)
Also, with this negative binomial model, if I exponentiate the Intercept (exp(0.69)), I get 1.99, which is what I see in the conditional effects plot for the reference site level (Ahihi).
It seems that the coefficients for the sites in these two models are off by a factor of about 2…
I’m also adding the output for the same model from the pscl package below.
Coefficients are very similar between the two, except for the Intercept (-2.96 in brms, -1.0999 in pscl) and shape (0.07 in brms, exp(-1.8) = 0.165 in pscl) for the negative binomial part.
These two models also generate very similar fitted values for each factor level.
For the pscl model, I can manually calculate the fitted values by doing “probability of non-zero from logit” * “mean count from NB” / “probability of non-zero from NB” in R. For example, for the reference factor level,
plogis(-0.7559) * exp(-1.09990) / (1 - ((exp(-1.8)/(exp(-1.8) + exp(-1.0999))) ^ exp(-1.8)))
This gives me 0.638, which is the fitted value for the reference factor level for the pscl model. The brms model also has the fitted value of 0.655 for the reference factor level, but doing the same calculation using the brms Intercept and shape parameter results in 0.434.
My full model is a hierarchical model with a random factor, and the pscl package doesn’t seem to handle it, so I need to run it in brms.
If anyone has any insight, I’d really appreciate it. I just want to understand the brms output correctly.
Call:
hurdle(formula = s_rubroviolaceus ~ site_name | site_name, data = fahu_count_df, dist = "negbin")
Pearson residuals:
Min 1Q Median 3Q Max
-0.8825 -0.3768 -0.2788 -0.1022 19.5664
Count model coefficients (truncated negbin with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.09990 0.42965 -2.560 0.010467 *
site_nameAlaeloa 0.97677 0.33455 2.920 0.003504 **
site_nameHonolua-Mokuleia Closed 0.92572 0.33694 2.747 0.006006 **
site_nameHonolua-Mokuleia Open 0.42687 0.42978 0.993 0.320590
site_nameKihei 1.44425 0.32059 4.505 6.64e-06 ***
site_nameLaPerouse 1.46594 0.48712 3.009 0.002618 **
site_nameMakena 0.72090 0.33246 2.168 0.030128 *
site_nameMolokini -0.03364 0.28157 -0.119 0.904887
site_nameOlowalu 0.67433 0.27488 2.453 0.014159 *
site_namePaia 1.11400 0.30170 3.692 0.000222 ***
site_nameWaihee 1.06486 0.29997 3.550 0.000385 ***
Log(theta) -1.80412 0.43524 -4.145 3.40e-05 ***
Zero hurdle model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.7559 0.1650 -4.582 4.60e-06 ***
site_nameAlaeloa -1.0619 0.2137 -4.969 6.72e-07 ***
site_nameHonolua-Mokuleia Closed 1.1532 0.2608 4.423 9.75e-06 ***
site_nameHonolua-Mokuleia Open -0.2486 0.2824 -0.880 0.37865
site_nameKihei -0.3718 0.2097 -1.773 0.07624 .
site_nameLaPerouse -1.2590 0.3003 -4.192 2.76e-05 ***
site_nameMakena -1.0125 0.2127 -4.761 1.93e-06 ***
site_nameMolokini 1.7418 0.2134 8.163 3.25e-16 ***
site_nameOlowalu 0.5130 0.1861 2.756 0.00585 **
site_namePaia -0.4129 0.1978 -2.087 0.03687 *
site_nameWaihee -0.2183 0.1982 -1.101 0.27085
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Theta: count = 0.1646
Number of iterations in BFGS optimization: 36
Log-likelihood: -3748 on 23 Df