Model that includes zero-inflated explanatory (not response) variable

Modeling zero-inflated response data is pretty straightforward, especially when using families from brms. But I wonder, is there a means of incorporating information about a zero-inflated explanatory variable into such a model (or any kind of regression)? Or is it more common to split up the response data and model the zero and non-zero information separately?

For example, I have some explanatory ratio data (m/m^2) that is concentrated at zero, but there is huge variation in the response variables at that point (response is log-normally distributed, but also has zeros, so I’m using a hurdle model), creating a kind of “spike” in uncertainty in the response when the explanatory variable is zero. Any way to handle this kind of thing?

To answer my own question in a way that doesn’t include a measurement error model, one can include a dummy variable that encodes the presence/absence of zeros in the explanatory data.

For example:

library(brms)
library(dplyr)  # for mutate()

set.seed(123)
my_data <- data.frame(
  response = rnorm(100),
  explanatory = c(rep(0, 50), rnorm(50, mean = 5))
) |>
  # zeros = 0 where the explanatory variable is exactly zero, 1 otherwise
  mutate(zeros = ifelse(explanatory == 0, 0, 1))

my_model <- brm(
  bf(response ~ zeros + explanatory:zeros),
  data = my_data
)

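One now has an intercept that applies just to the circumstance where the explanatory is equal to 0 (the model intercept, since zeros = 0 there), plus an intercept offset (the zeros coefficient) and a slope (the explanatory:zeros coefficient) that apply only to the non-zero part of the explanatory data.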
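To check how the coefficients in this formula map onto the zero and non-zero parts of the data, here is a minimal sketch that swaps brm() for base R's lm() so it runs without compiling a Stan model. That swap is my assumption for illustration only; the point estimates from brm() with default priors should be similar but not identical. The names fit, fit_nonzero, mean_at_zero and slope are just illustrative.

```r
# Sketch only: lm() stands in for brm() to show what each coefficient means.
set.seed(123)
my_data <- data.frame(
  response = rnorm(100),
  explanatory = c(rep(0, 50), rnorm(50, mean = 5))
)
# zeros = 0 where the explanatory variable is exactly zero, 1 otherwise
my_data$zeros <- ifelse(my_data$explanatory == 0, 0, 1)

fit <- lm(response ~ zeros + explanatory:zeros, data = my_data)
coefs <- coef(fit)
# Pick out the interaction coefficient by its ":" rather than by exact name
slope <- unname(coefs[grep(":", names(coefs))])

# Reference quantities computed separately on each part of the data
mean_at_zero <- mean(my_data$response[my_data$zeros == 0])
fit_nonzero <- lm(response ~ explanatory, data = subset(my_data, zeros == 1))

# Intercept = mean response where the explanatory variable is zero
stopifnot(isTRUE(all.equal(unname(coefs["(Intercept)"]), mean_at_zero)))
# Intercept + zeros = intercept of the line fitted to the non-zero part
stopifnot(isTRUE(all.equal(
  unname(coefs["(Intercept)"] + coefs["zeros"]),
  unname(coef(fit_nonzero)["(Intercept)"])
)))
# Interaction coefficient = slope of the line fitted to the non-zero part
stopifnot(isTRUE(all.equal(slope, unname(coef(fit_nonzero)["explanatory"]))))
```

Because explanatory is 0 whenever zeros is 0, the interaction term explanatory:zeros is numerically the same column as explanatory itself. The dummy therefore only frees the zero observations from having to sit on the regression line fitted to the non-zero ones.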
