# Zero-Inflated models in Stan

I think this is confusing the hurdle model and the zero-inflated model. Here they are without the log(0.001) modification.

#### Zero-inflated normal

This allows 0 to be generated either by the discrete model or the continuous model.

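    if (y[n] == 0)  // a zero may come from either component
      target += log_sum_exp(bernoulli_lpmf(1 | theta),
                            bernoulli_lpmf(0 | theta)
                              + normal_lpdf(y[n] | mu, sigma));
    else  // a nonzero value can only come from the normal component
      target += bernoulli_lpmf(0 | theta) + normal_lpdf(y[n] | mu, sigma);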


#### Hurdle normal

This is a pure mixture of discrete and continuous components: a zero always comes from the discrete component.

    if (y[n] == 0)
      target += bernoulli_lpmf(1 | theta);
    else
      target += bernoulli_lpmf(0 | theta)
                  + normal_lpdf(y[n] | mu, sigma);


You are right.

If the two generative processes are fully independent and share no parameters (e.g. no covarying random effects), then it’s fine to model them separately. That can even be preferable: if there are problems in the posterior geometry, separating the models will easily and definitively localize those problems to one side or the other. A potential advantage of fitting jointly is that, if using brms or similar, you get all the machinery to predict the response in one step.
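To see why separate fitting is legitimate for the hurdle normal, here is a small numeric check, written in Python rather than Stan so it can be run standalone. It is only a sketch: the function names and the data are made up for illustration, and `theta` is taken to be the probability of a zero, as in the code earlier in the thread. The joint log-likelihood splits exactly into a Bernoulli part that involves only `theta` and a normal part that involves only `mu` and `sigma`.

```python
import math

def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def hurdle_joint_lp(y, theta, mu, sigma):
    """Joint hurdle-normal log-likelihood; theta is the probability of a zero."""
    lp = 0.0
    for yn in y:
        if yn == 0:
            lp += math.log(theta)
        else:
            lp += math.log1p(-theta) + normal_lpdf(yn, mu, sigma)
    return lp

def bernoulli_part_lp(y, theta):
    """Zero / nonzero indicator model: depends on theta only."""
    n_zero = sum(1 for yn in y if yn == 0)
    return n_zero * math.log(theta) + (len(y) - n_zero) * math.log1p(-theta)

def normal_part_lp(y, mu, sigma):
    """Normal model for the nonzero observations: depends on mu and sigma only."""
    return sum(normal_lpdf(yn, mu, sigma) for yn in y if yn != 0)

y = [0.0, 0.0, 1.3, -0.7, 2.1]  # made-up data for illustration
joint = hurdle_joint_lp(y, theta=0.4, mu=1.0, sigma=1.5)
separate = bernoulli_part_lp(y, 0.4) + normal_part_lp(y, 1.0, 1.5)
print(abs(joint - separate) < 1e-12)
```

Because the two parts have no parameters in common, independent priors give a posterior that factorizes the same way, which is what makes fitting the two sides separately equivalent to fitting them jointly.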


Yes!

I agree that the continuous data looks a bit heavy-tailed here. I might start with a Student-t observational model with a relatively conservative prior on $\nu^{-1}$, such as $\text{normal}(\nu^{-1} \mid 0, 0.11)$, that keeps $\nu$ above 4-ish.
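As a rough check of the "above 4-ish" claim, the sketch below computes how much prior mass lands on degrees of freedom above 4. It assumes the inverse degrees of freedom is constrained to be positive, so the normal prior with scale 0.11 acts as a half-normal; the helper name is made up for illustration.

```python
import math

def half_normal_cdf(x, scale):
    """P(X <= x) for X ~ half-normal(0, scale), with x >= 0."""
    return math.erf(x / (scale * math.sqrt(2.0)))

# nu > 4 is equivalent to 1 / nu < 0.25
prob_nu_above_4 = half_normal_cdf(0.25, 0.11)
print(round(prob_nu_above_4, 3))
```

Under that assumption the prior puts roughly 98% of its mass on degrees of freedom above 4, so the very heavy-tailed regime is discouraged but not excluded.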

The separation here is a consequence of mixing discrete and continuous processes: a continuous density assigns zero probability to the exact value y = 0, so every observed zero can be attributed to the discrete component. Consider what would happen for two discrete processes, such as zero-inflating a Poisson model. Here the observation y = 0 has a non-zero probability of arising from either model, so we can’t precisely assign that observation to one model or the other. Instead we have to fit the joint model, which allows for the possibility that the zero came from either component.
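Written out for a single observation, the zero-inflated Poisson likelihood looks like this. The symbols are chosen here to match the code earlier in the thread: $\theta$ is the probability of the inflation component and $\lambda$ is the Poisson rate.

$$
p(y \mid \theta, \lambda) =
\begin{cases}
\theta + (1 - \theta)\,\text{Poisson}(0 \mid \lambda) & \text{if } y = 0, \\
(1 - \theta)\,\text{Poisson}(y \mid \lambda) & \text{if } y > 0.
\end{cases}
$$

The sum in the first case is what prevents the likelihood from factorizing into a $\theta$-only piece and a $\lambda$-only piece.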

That’s the difference between zero-inflation and the hurdle model. With zero-inflated Poisson, there are two potential sources of a zero: the Poisson or the inflation. With the hurdle model, the zero always comes from the zero component of the mixture.
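The contrast can be checked numerically with a short Python sketch. The function names and parameter values are made up for illustration, and the hurdle version assumes the usual construction in which positive counts follow a Poisson truncated at zero.

```python
import math

def poisson_lpmf(y, lam):
    """Log pmf of Poisson(lam) at a non-negative integer y."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)

def zip_lpmf(y, theta, lam):
    """Zero-inflated Poisson: a zero can come from the inflation or the Poisson."""
    if y == 0:
        return math.log(theta + (1.0 - theta) * math.exp(poisson_lpmf(0, lam)))
    return math.log1p(-theta) + poisson_lpmf(y, lam)

def hurdle_lpmf(y, theta, lam):
    """Hurdle Poisson: a zero comes only from the zero component; positive
    counts follow a Poisson truncated at zero (an assumption of this sketch)."""
    if y == 0:
        return math.log(theta)
    # log(1 - exp(-lam)): log probability that a Poisson draw is positive
    log_p_positive = math.log1p(-math.exp(poisson_lpmf(0, lam)))
    return math.log1p(-theta) + poisson_lpmf(y, lam) - log_p_positive

theta, lam = 0.3, 2.0  # made-up values for illustration
p_zero_zip = math.exp(zip_lpmf(0, theta, lam))
p_zero_hurdle = math.exp(hurdle_lpmf(0, theta, lam))
print(p_zero_zip > theta, abs(p_zero_hurdle - theta) < 1e-12)
```

In the zero-inflated model the probability of a zero exceeds `theta` because the Poisson contributes its own zeros, while in the hurdle model it equals `theta` exactly.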