Add zero-inflation to custom beta-binomial model?

lakemonster · April 2, 2020, 10:32pm

Hi Everyone,

I am modeling stomata density from the upper surface of plant leaves collected from clones grown in a common garden. Not all plants have stomata on the upper leaf surface, but some do and the leaves with stomata on the upper surface vary in their density.

I need to create a beta-binomial distribution model that includes zero-inflation of the data. I was able to follow the vignette describing how to create custom distribution families (thanks for that Paul ;) and I have fit a beta-binomial model. The output is nearly what I need with two exceptions (excuse my not-so-statistical descriptions here)

The beta distribution is spreading out the zero’s of the random effects and ignoring the fact that those observations are real zeros.
The y-intercept from the beta-binomial is negative, and I was hoping to add the intercept to the random effects to shift them to measurement scale.

My questions are,

How can I include the zero-inflation into the model so that the zeros remain as such?
Is a hurdle model better suited to these data?
Why is the intercept negative for this model?

I’ve never encountered a negative intercept for count data. I compared a hurdle model to what I made, but I think after I include a zero-inflation bit into the model, it will improve.

Thanks all for the help.

I’m running these analyses with this setup:

Operating System: Pop!_OS 19.10
brms Version: brms_2.12.0

Here’s a reproducible example:

library(dplyr)
library(brms)
library(ggplot2)
library(cowplot)

Simulate the data to have 75 zero’s and 25 random numbers drawn from a normal distribution

df <- data.frame(
  ind_code = sort(paste0(rep(LETTERS[1:10],10), sep="_", rep(1:10, each=10))),
  stomata_count = round(c(rep(0, 75), rnorm(25, mean = 10, sd = 1.5))),
  row = round(runif(100, min=0, max=25)),
  col = round(runif(100, min=0, max=40)),
  size = 100)

Next, build the custom family components outlined in the custom dist. vignette.

# Define a custom distribution family
beta_binomial2 <- custom_family(
  "beta_binomial2", dpars = c("mu", "phi"),
  links = c("logit", "log"), lb = c(NA, 0),
  type = "int", vars = "vint1[n]"
)

# Provide stan functions if the distribution is not defined there
# Beta2 is defined as beta_binomial in stan.
stan_funs <- "
  real beta_binomial2_lpmf(int y, real mu, real phi, int T) {
    return beta_binomial_lpmf(y | T, mu * phi, (1 - mu) * phi);
    }
    int beta_binomial2_rng(real mu, real phi, int T) {
    return beta_binomial_rng(T, mu * phi, (1 - mu) * phi);
    }
"
# provide stan with the above code as a variable
stanvars <- stanvar(scode = stan_funs, block = "functions")

Now I can fit a beta-binomial distribution to the data.

fit1 <- brm(stomata_count | vint(size) ~ row + col + (1|ind_code),
            data = df,
            family = beta_binomial2,
            chains = 1L, seed = 2398,
            stanvars = stanvars)
summary(fit1)
Family: beta_binomial2 
 Links: mu = logit; phi = identity
Formula: stomata_count | vint(size) ~ row + col + (1 | ind_code)
Data: df (Number of observations: 100) 
Samples: 1 chains, each with iter = 2000; warmup = 1000; thin = 1;
     total post-warmup samples = 1000

Group-Level Effects: 
~ind_code (Number of levels: 100) 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     2.29      0.69     1.12     3.80 1.01       68       41

Population-Level Effects: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -5.68      1.08    -8.12    -3.83 1.00      220      433
row           0.02      0.05    -0.06     0.13 1.01      153      329
col          -0.00      0.03    -0.07     0.06 1.01      167       74

Family Specific Parameters: 
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi    10.80      6.29     3.65    29.99 1.00       92       73

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Pull out the BLUPs and also add the intercept of the model to shift the blups to measurement scale. (This usually shifts the BLUPs to measurement scale, but here, the intercept is negative - so it doens’t do what I expect - Anyone have an idea why the intercept is negative?)

# Pull out the y-intercept
y_intercept <- posterior_summary(fit1)[row.names(posterior_summary(fit1))=="b_Intercept","Estimate"]

# Make the blups data.frame
blups.stomata <- data.frame(ind_code = rownames(ranef(fit1)$ind_code), 
                      blups.stomata_count= ranef(fit1)$ind_code[,1,1], 
                      clonal_val.stomata_count = ranef(fit1)$ind_code[,1,1] + y_intercept)

Finally I can plot the data to evaluate the output.

p0 <- ggplot(blups.stomata, aes(x=blups.stomata_count)) + geom_histogram(bins=50) + ggtitle("BLUPs")
p1 <- ggplot(blups.stomata, aes(x=clonal_val.stomata_count)) + geom_histogram(bins=50) + ggtitle("Measurement scale BLUPs")
plot_grid(p0, p1, align='hv')

Rplot

torkar · April 3, 2020, 6:27am

Could it be that you use a logit link? In short,

> inv_logit_scaled(-7)
[1] 0.0009110512

lakemonster · May 2, 2020, 2:53pm

yep, that was why.

Ross_Kempner · June 11, 2021, 8:33pm

Hi @lakemonster , are you sure that ranef gets the BLUPs?

Topic		Replies	Views
Zero inflated beta binomial model under brms brms	1	1617	February 19, 2020
Non linear distribution with some zeros in data Modeling	39	2977	December 17, 2021
Mathematical Notation for a zero inflated negative binomial model in brms Modeling math	13	2000	March 26, 2021
Question About Implementation of Log Likelihood for a Zero-Inflated Beta Binomial Model in R Modeling techniques , specification , loo , brms	1	713	December 16, 2021
Calibration of beta inflated in brms brms	1	172	April 8, 2024

Add zero-inflation to custom beta-binomial model?

Related topics