Hi Everyone,
I am modeling stomata density from the upper surface of plant leaves collected from clones grown in a common garden. Not all plants have stomata on the upper leaf surface, but some do and the leaves with stomata on the upper surface vary in their density.
I need to create a beta-binomial distribution model that includes zero-inflation of the data. I was able to follow the vignette describing how to create custom distribution families (thanks for that Paul ;) and I have fit a beta-binomial model. The output is nearly what I need with two exceptions (excuse my not-so-statistical descriptions here)
- The beta distribution is spreading out the zero’s of the random effects and ignoring the fact that those observations are real zeros.
- The y-intercept from the beta-binomial is negative, and I was hoping to add the intercept to the random effects to shift them to measurement scale.
My questions are,
- How can I include the zero-inflation into the model so that the zeros remain as such?
- Is a hurdle model better suited to these data?
- Why is the intercept negative for this model?
I’ve never encountered a negative intercept for count data. I compared a hurdle model to what I made, but I think after I include a zero-inflation bit into the model, it will improve.
Thanks all for the help.
I’m running these analyses with this setup:
- Operating System: Pop!_OS 19.10
- brms Version: brms_2.12.0
Here’s a reproducible example:
library(dplyr)
library(brms)
library(ggplot2)
library(cowplot)
Simulate the data to have 75 zero’s and 25 random numbers drawn from a normal distribution
df <- data.frame(
ind_code = sort(paste0(rep(LETTERS[1:10],10), sep="_", rep(1:10, each=10))),
stomata_count = round(c(rep(0, 75), rnorm(25, mean = 10, sd = 1.5))),
row = round(runif(100, min=0, max=25)),
col = round(runif(100, min=0, max=40)),
size = 100)
Next, build the custom family components outlined in the custom dist. vignette.
# Define a custom distribution family
beta_binomial2 <- custom_family(
"beta_binomial2", dpars = c("mu", "phi"),
links = c("logit", "log"), lb = c(NA, 0),
type = "int", vars = "vint1[n]"
)
# Provide stan functions if the distribution is not defined there
# Beta2 is defined as beta_binomial in stan.
stan_funs <- "
real beta_binomial2_lpmf(int y, real mu, real phi, int T) {
return beta_binomial_lpmf(y | T, mu * phi, (1 - mu) * phi);
}
int beta_binomial2_rng(real mu, real phi, int T) {
return beta_binomial_rng(T, mu * phi, (1 - mu) * phi);
}
"
# provide stan with the above code as a variable
stanvars <- stanvar(scode = stan_funs, block = "functions")
Now I can fit a beta-binomial distribution to the data.
fit1 <- brm(stomata_count | vint(size) ~ row + col + (1|ind_code),
data = df,
family = beta_binomial2,
chains = 1L, seed = 2398,
stanvars = stanvars)
summary(fit1)
Family: beta_binomial2
Links: mu = logit; phi = identity
Formula: stomata_count | vint(size) ~ row + col + (1 | ind_code)
Data: df (Number of observations: 100)
Samples: 1 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 1000
Group-Level Effects:
~ind_code (Number of levels: 100)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 2.29 0.69 1.12 3.80 1.01 68 41
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -5.68 1.08 -8.12 -3.83 1.00 220 433
row 0.02 0.05 -0.06 0.13 1.01 153 329
col -0.00 0.03 -0.07 0.06 1.01 167 74
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
phi 10.80 6.29 3.65 29.99 1.00 92 73
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Pull out the BLUPs and also add the intercept of the model to shift the blups to measurement scale. (This usually shifts the BLUPs to measurement scale, but here, the intercept is negative - so it doens’t do what I expect - Anyone have an idea why the intercept is negative?)
# Pull out the y-intercept
y_intercept <- posterior_summary(fit1)[row.names(posterior_summary(fit1))=="b_Intercept","Estimate"]
# Make the blups data.frame
blups.stomata <- data.frame(ind_code = rownames(ranef(fit1)$ind_code),
blups.stomata_count= ranef(fit1)$ind_code[,1,1],
clonal_val.stomata_count = ranef(fit1)$ind_code[,1,1] + y_intercept)
Finally I can plot the data to evaluate the output.
p0 <- ggplot(blups.stomata, aes(x=blups.stomata_count)) + geom_histogram(bins=50) + ggtitle("BLUPs")
p1 <- ggplot(blups.stomata, aes(x=clonal_val.stomata_count)) + geom_histogram(bins=50) + ggtitle("Measurement scale BLUPs")
plot_grid(p0, p1, align='hv')