Zero-Inflated models in Stan

Bob_Carpenter · July 26, 2022, 8:30pm

I think this is confusing the hurdle model and the zero-inflated model. Here they are without the log(0.001) modification.

Zero-inflated normal

This allows 0 to be generated either by the discrete model or the continuous model.

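if (y[n] == 0)  // a zero may come from either component
  target += log_sum_exp(bernoulli_lpmf(1 | theta),
                        bernoulli_lpmf(0 | theta)
                          + normal_lpdf(y[n] | mu, sigma));
else            // a nonzero value can only come from the normal
  target += bernoulli_lpmf(0 | theta) + normal_lpdf(y[n] | mu, sigma);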

Hurdle normal

This is a pure mixture of the discrete and continuous components: a zero always comes from the discrete component.

if (y[n] == 0)
  target += bernoulli_lpmf(1 | theta);
else
  target += bernoulli_lpmf(0 | theta)
              + normal_lpdf(y[n] | mu, sigma);
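
For context, here is a minimal sketch of how the hurdle snippet might sit inside a complete Stan program. The data names (N, y) and the parameter constraints are assumptions for illustration, not part of the original post. No priors are given for mu or sigma, so they are implicitly flat; a real model would likely want something more informative.

```stan
data {
  int<lower=0> N;
  vector[N] y;                    // observations; exact zeros mark the discrete component
}
parameters {
  real<lower=0, upper=1> theta;   // probability that an observation is zero
  real mu;                        // location of the continuous component
  real<lower=0> sigma;            // scale of the continuous component
}
model {
  for (n in 1:N) {
    if (y[n] == 0) {
      // zero: contributes only the Bernoulli term
      target += bernoulli_lpmf(1 | theta);
    } else {
      // nonzero: "not zero" Bernoulli term plus the normal density
      target += bernoulli_lpmf(0 | theta)
                + normal_lpdf(y[n] | mu, sigma);
    }
  }
}
```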

jsocolar · July 27, 2022, 5:25pm

You are right.

If the generative processes are fully independent and do not share any parameters (e.g. covarying random effects), then it’s fine to model them separately. That can even be preferable: if there are problems in the posterior geometry, separating the models will easily and definitively localize those problems to one side or the other. A potential advantage of fitting jointly is that, if using brms or similar, you get all the machinery to predict the response in one step.
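
To spell out why separate fitting works for the hurdle normal, the likelihood can be written as a product of two factors with no parameters in common. This is a sketch under the assumption that theta, mu and sigma are the only parameters. The indicator z_n is notation introduced here, with z_n = 1 when y_n = 0 and z_n = 0 otherwise:

$$
p(y \mid \theta, \mu, \sigma)
  = \underbrace{\prod_{n=1}^{N} \text{bernoulli}(z_n \mid \theta)}_{\text{zero vs. nonzero}}
    \;\times\;
    \underbrace{\prod_{n \,:\, y_n \neq 0} \text{normal}(y_n \mid \mu, \sigma)}_{\text{nonzero values only}}
$$

The first factor depends only on \theta and the second only on (\mu, \sigma). With independent priors, fitting a Bernoulli model to the zero indicators and a normal model to the nonzero values gives the same posterior as the joint fit.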

betanalpha · August 15, 2022, 8:19pm

Yes!

I agree that the continuous data looks a bit heavy-tailed here. I might start with a Student-t observational model with a relatively conservative prior on $\nu^{-1}$, such as $\text{normal}(\nu^{-1} \mid 0, 0.11)$, that keeps $\nu$ above 4-ish.
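
As a rough sketch of that suggestion, the Student-t model for the nonzero observations might look like the following. The names N, y and nu_inv, and the priors on mu and sigma, are placeholders assumed for illustration. Only the normal(0, 0.11) prior on the inverse degrees of freedom comes from the post. Because nu_inv is constrained to be positive, that prior acts as a half-normal.

```stan
data {
  int<lower=0> N;
  vector[N] y;                 // nonzero (continuous) observations only
}
parameters {
  real mu;
  real<lower=0> sigma;
  real<lower=0> nu_inv;        // inverse degrees of freedom, 1 / nu
}
transformed parameters {
  real<lower=0> nu = inv(nu_inv);
}
model {
  nu_inv ~ normal(0, 0.11);    // half-normal; puts most prior mass on nu above roughly 4
  mu ~ normal(0, 10);          // placeholder prior, an assumption
  sigma ~ normal(0, 10);       // placeholder half-normal prior, an assumption
  y ~ student_t(nu, mu, sigma);
}
```

Putting the prior on nu_inv rather than nu means the normal limit (nu going to infinity) sits at nu_inv = 0, where the prior has its mode.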

The separation here is a consequence of mixing discrete and continuous processes: a continuous density assigns zero probability to any exact value, so an observed y = 0 can only have come from the discrete component. Consider what would happen for two discrete processes, such as zero-inflating a Poisson model. There the observation y = 0 has a non-zero probability of arising from both models, so we can’t precisely assign that observation to one model or the other. Instead we have to fit the joint model, which allows for the possibility that the zero came from either component.
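
For concreteness, a zero-inflated Poisson along these lines could be sketched as below, following the usual marginalization over the unknown source of each zero. The data and parameter names are assumptions, priors are left implicit, and the `array[N] int` declaration assumes a reasonably recent Stan version.

```stan
data {
  int<lower=0> N;
  array[N] int<lower=0> y;
}
parameters {
  real<lower=0, upper=1> theta;  // probability of an inflated zero
  real<lower=0> lambda;          // Poisson rate
}
model {
  for (n in 1:N) {
    if (y[n] == 0) {
      // a zero may come from the inflation component or from the Poisson
      target += log_sum_exp(bernoulli_lpmf(1 | theta),
                            bernoulli_lpmf(0 | theta)
                              + poisson_lpmf(y[n] | lambda));
    } else {
      // a positive count can only come from the Poisson
      target += bernoulli_lpmf(0 | theta)
                + poisson_lpmf(y[n] | lambda);
    }
  }
}
```

Because theta and lambda both appear in the zero branch, this likelihood does not factor into two independent pieces the way the hurdle likelihood does.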

Bob_Carpenter · August 15, 2022, 8:23pm

That’s the difference between zero-inflation and the hurdle model. With zero-inflated Poisson, there are two potential sources of a zero—the Poisson or the inflation. With the hurdle model, the zero always comes from the zero component of the mixture.
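
Written as probability mass functions, the two Poisson variants compare as follows. This is a sketch, with \theta denoting the probability of the zero component and \lambda the Poisson rate, both assumed here for illustration.

Zero-inflated Poisson:

$$
p(y_n \mid \theta, \lambda) =
\begin{cases}
\theta + (1 - \theta)\,\text{Poisson}(0 \mid \lambda) & \text{if } y_n = 0, \\
(1 - \theta)\,\text{Poisson}(y_n \mid \lambda) & \text{if } y_n > 0.
\end{cases}
$$

Hurdle Poisson:

$$
p(y_n \mid \theta, \lambda) =
\begin{cases}
\theta & \text{if } y_n = 0, \\
(1 - \theta)\,\dfrac{\text{Poisson}(y_n \mid \lambda)}{1 - \text{Poisson}(0 \mid \lambda)} & \text{if } y_n > 0.
\end{cases}
$$

In the hurdle case the Poisson is truncated at zero, so it cannot produce a zero itself. The hurdle normal needs no such truncation, since a normal density already gives probability zero to the exact value 0.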
