As part of a review of a journal article, I’m trying to puzzle through thinking about zero inflated mediators. Given work I’ve been doing with SEM, it seems like we should be able to think about them as latent variables, no? Where the observed variable is a single indicator with a path fixed to 1, no intercept, and we apply a zero inflated distribution.
Consider the following example. Let’s say sea surface temperature influences sea mink abundance which influences snail abundance (hungry hungry sea minks! sadly now exctinct).
BUT - our observations of sea minks is zero inflated.
library(dplyr)
library(brms)
set.seed(31415)
dat <- tibble(
sst = rnorm(100, 15, 4),
seaminks = rpois(100, exp(sst/10-2)),
seaminks_obs = ifelse(runif(100)>0.2, seaminks, 0),
snails = rpois(100, exp(-1*seaminks+3)),
true_seaminks = as.numeric(NA),
seaminks_nonzero = as.numeric(seaminks_obs>0)
)
The model I would conceive of is
sst → seaminks → snails, all relationships poisson
seaminks → 1*seaminks_obs, zero inflated poisson
Using Full Luxury Bayesian Structural Equation Modeling with brms as a guide for latent variables, I’d put together a model more or less like this:
###
# ZI mediation in brms with LV
###
seamink_mod <-
bf(true_seaminks | mi() ~ sst, family = poisson(link = "log")) +
bf(seaminks_obs ~ 0 + mi(true_seaminks), family = zero_inflated_poisson()) +
bf(snails ~ mi(true_seaminks), family = poisson(link = "log")) +
set_rescor(FALSE)
seamink_prior <-
prior(constant(1),
coef = "mitrue_seaminks",
resp = "seaminksobs",
class = "b") +
prior(gamma(11,11),
class = "sigma",
resp = "trueseaminks")
This doesn’t work, however, as we can’t have mi() working with a poisson distribution.
So instead
###
# ZI mediation in brms with LV
###
seamink_mod <-
bf(true_seaminks | mi() ~ sst, family = gaussian(link = "log")) +
bf(seaminks_obs ~ 0 + mi(true_seaminks), family = zero_inflated_poisson()) +
bf(snails ~ mi(true_seaminks), family = poisson(link = "log")) +
set_rescor(FALSE)
seamink_prior <-
prior(constant(1),
coef = "mitrue_seaminks",
resp = "seaminksobs",
class = "b") +
prior(gamma(11,11),
class = "sigma",
resp = "trueseaminks")
seamink_brm <- brm(seamink_mod,
prior = seamink_prior,
data = dat,
chains = 2)
This samples, although not super well.
I’m curious about two things
-
Does this seem like an interesting way forward, and how can we make it sample better (some of the coefficients here have posteriors that are still too narrow - but I think that’s due to poor sampling of the posterior).
-
Is this the right way to deal with an endogenous count latent variable, or is there a better approach?
Note - for completeness, there is another suggestion in the literature about zero inflated endogenous variables in SEM. I’ve implemented it below, and it works pretty well. One should be able to have seaminks_nonzero also predicted by seaminks_obs and sst, but, I kept getting wild coefficients there, which I suspect means we need good priors for that. The idea is the following model which just is currently missing
seaminks_nonzero (1,0) ~ sst + seaminks_obs which would be… bernouli?
###
# ZI mediation in brms with
# Jian et al approach
# https://pmc.ncbi.nlm.nih.gov/articles/PMC10686235/
###
seamink_jiang_mod <-
bf(seaminks_obs ~ sst, family = zero_inflated_poisson()) +
bf(snails ~ seaminks_nonzero + seaminks_obs, family = poisson(link = "log")) +
set_rescor(FALSE)
seamink_jiang_brm <- brm(seamink_jiang_mod,
data = dat,
chains = 2)
seamink_jiang_brm