Brms -- Zero inflated binomial -- trials -- Presence absence data

Charley_Gros · October 27, 2022, 9:06pm

Hi there,

Looking for some advice.

I have a dataset of presence-absence data: species survey, one visit only, is the species detected? yes / no.

Willing to use a zero_inflated_binomial distribution.

So something like:

m <- brm(presence ~ depth + slope_log + ice_sd + ssh_mean, data = dat, family = zero_inflated_binomial())

I get an error saying I need to use trials.
Given that we did only one visit to each site to assess the presence of the species… is it correct to use the following?

m <- brm(presence | trials(1) ~ depth + slope_log + ice_sd + ssh_mean, data = dat, family = zero_inflated_binomial())

matti · October 28, 2022, 10:48am

Hi @Charley_Gros,

Welcome to the Stan discourse :)

I haven’t actually used this but afaik this is how you do it. You can use pp_check(m) after drawing samples from the model to see whether it reproduces the observed data well (as it should.)

Solomon · October 28, 2022, 2:57pm

Yes, that syntax looks correct.

jd_c · October 28, 2022, 6:44pm

Someone can correct me if I am off here, but this seems like a very strange model. What you are proposing is a mixture of Bernoulli models. This is the relevant Stan function for the zero_inflated_binomial in brms:

real zero_inflated_binomial_logit_lpmf(int y, int trials,
                                         real theta, real zi) {
    if (y == 0) {
      return log_sum_exp(bernoulli_logit_lpmf(1 | zi),
                         bernoulli_logit_lpmf(0 | zi) +
                         binomial_lpmf(0 | trials, theta));
    } else {
      return bernoulli_logit_lpmf(0 | zi) +
             binomial_lpmf(y | trials, theta);
    }
  }

When trials is always 1, then the binomial_lpmf part could be replaced by bernoulli, and now you simply have a mixture of bernoullis, which seems weird to me…
Try a simple simulation:

y <- rbinom(1000,1,0.1)
d <- cbind.data.frame(y)
m1 <- brm(y | trials(1) ~ 1, data=d, family=zero_inflated_binomial, cores=4)
m2 <- brm(y | trials(1) ~ 1, data=d, family=binomial, cores=4)
m1
m2
s <- as_draws_df(m1)
s2 <- as_draws_df(m2)

#proportion of 1 in data
mean(d$y)

#proportion estimated from the zero_inflated_binomial
mean(plogis(s$b_Intercept) * (1-s$zi))

#proportion estimated from the binomial
mean(plogis(s2$b_Intercept))

Note that the sampling is worse for the zi model (low ESS) and the SE is large for both the Intercept and zi parameters. This is because it is a mixture of two Bernoulli’s. You can get almost the same mean estimate for the proportion of 1’s in the data, but the interpretation is weird, because the Intercept is now the log-odds of 1 for the data excluding the zero inflated data.

Perhaps if you explicitly modeled the zi component, then it would be ok… I’m really not sure…but it does strike me as a bit weird. I’d be surprised the standard errors wouldn’t always be high as which model is any given zero assigned to?

Topic		Replies	Views
Zero inflated beta binomial model under brms brms	1	1610	February 19, 2020
Help predicting from zero-inflated binomial brms techniques , specification	11	1718	December 6, 2021
Specifying bernoulli prior for zero-inflated beta response variable in brm() HGAM Modeling rstan , techniques , fitting-issues , brms	1	484	January 17, 2023
Zero-inflated negative binomial model with random-effects brms	4	4983	May 27, 2019
Zero-inflated poisson model for count data Interfaces brms	8	1145	August 6, 2023

Brms -- Zero inflated binomial -- trials -- Presence absence data

Related topics