Let me preface my reply by noting that hurdle models wrap a lot of really subtle probability theory, which is why they can be so confusing once you look a little deeper.

Also keep in mind that in Stan each program specifies a joint density function \pi(y, \theta), where y are the data and \theta are the parameters; a term like “hurdle model” isn’t particularly meaningful until you can write it out as Stan code. This can be complicated because popular terms sometimes specify ill-defined models, and hence they can’t necessarily be written as Stan programs as easily as one might think! All of this said, we should expect models like this to be subtle.

The ultimate problem is that you can’t compare *probability densities*, which define continuous models, to *probabilities*, which define discrete models. When trying to mix continuous outcomes with discrete outcomes (i.e. inflation components) you have to be careful to compare apples to apples.

This means comparing probabilities to probabilities. For a discrete model the probability of an outcome is easy – it’s just the probability mass assigned to a point. For zero inflation that would be all of the probability at zero, P[y = 0] = 1. But for a continuous model the probability of *any* given point is zero! On continuous spaces we can assign finite probability essentially only to *intervals*. For a continuous baseline model that means P[y = 0] = 0. Consequently, if you ask what *probability* the two components assign to y = 0 then the discrete inflation model will always assign *infinitely more* than the continuous baseline model, and if you observe y = 0 then you know it had to have come from the inflation model.
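To make the asymmetry concrete, here’s a small Python sketch, separate from the Stan code, that assumes a standard normal baseline purely for illustration. The probability a continuous model assigns to a shrinking interval around y = 0 vanishes, even though the density at zero is finite:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    # Standard normal cumulative distribution function via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Probability assigned to the interval [-eps, eps] shrinks to zero
# with eps, so P[y = 0] = 0 for the continuous component
for eps in (1e-1, 1e-3, 1e-6):
    p = normal_cdf(eps) - normal_cdf(-eps)
    print(eps, p)
```

Any inflated component that puts nonzero mass exactly at zero therefore dominates the continuous baseline at that point, no matter how the baseline is parameterized.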

In Stan we can implement a zero-hurdle with a conditional statement,

```
if (y[n] == 0) {
  // Contribution only from the inflated component
  target += log(lambda);
} else {
  // Contribution only from the baseline component
  real lpdf = baseline_lpdf(y[n] | theta);
  target += log(1 - lambda) + lpdf;
}
```

Because each branch considers distinct observations (`y[n] == 0` for the first branch and `y[n] != 0` for the second branch) we can loop over them independently. In particular, because the contribution from each zero observation is the same, we can write the first branch as

```
target += N_zero * log(lambda);
```

where `N_zero` is the number of zero observations.
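A quick Python sanity check of this aggregation, using hypothetical data and a hypothetical value of lambda:

```python
import math

# Hypothetical data with exact zeros mixed in, and a hypothetical lambda
y = [0.0, 1.7, 0.0, 2.3, 0.4, 0.0]
lam = 0.25

# Per-observation contributions from the first branch of the conditional
looped = sum(math.log(lam) for v in y if v == 0.0)

# Aggregated form: one term scaled by the number of zeros
N_zero = sum(1 for v in y if v == 0.0)
aggregated = N_zero * math.log(lam)

assert abs(looped - aggregated) < 1e-12
```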

Likewise we can simplify the second branch to

```
target += (N - N_zero) * log(1 - lambda);
for (n in 1:N) {
  if (y[n] != 0) {
    real lpdf = baseline_lpdf(y[n] | theta);
    target += lpdf;
  }
}
```

The entire model is then

```
target += N_zero * log(lambda);
target += (N - N_zero) * log(1 - lambda);
for (n in 1:N) {
  if (y[n] != 0) {
    real lpdf = baseline_lpdf(y[n] | theta);
    target += lpdf;
  }
}
```

Now those first two expressions simplify:

```
target += N_zero * log(lambda);
target += (N - N_zero) * log(1 - lambda);
```

is equivalent to

```
target += N_zero * log(lambda) + (N - N_zero) * log(1 - lambda);
```

which is equivalent to

```
target += log( lambda^N_zero * (1 - lambda)^(N - N_zero) );
```

or, up to an additive constant that doesn’t depend on the parameters,

```
target += binomial_lpmf(N_zero | N, lambda);
```
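A numeric check of this chain of equivalences in Python, with hypothetical values of lambda, N, and N_zero. The full binomial log probability mass adds only the log binomial coefficient, a constant in the parameters, so it leaves the posterior unchanged:

```python
import math

# Hypothetical values for illustration
lam, N, N_zero = 0.3, 100, 27

# The two separate target increments
a = N_zero * math.log(lam) + (N - N_zero) * math.log(1 - lam)

# The single collapsed log term
b = math.log(lam**N_zero * (1 - lam)**(N - N_zero))

# The binomial log probability mass adds the log binomial
# coefficient, which does not depend on lambda
log_coef = math.log(math.comb(N, N_zero))
binom_lpmf = log_coef + a

assert abs(a - b) < 1e-9
```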

In other words, because y = 0 and y != 0 are modeled by *separate* data generating processes, the entire model decomposes into a binomial model for the total number of inflated observations and a baseline model for the non-zero observations.
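That decomposition is easy to see on simulated data. Here’s a Python sketch of a hypothetical hurdle data generating process, with a lognormal standing in as an assumed continuous baseline; note how the zeros and non-zeros partition the data into the two pieces that inform the two model components:

```python
import random

random.seed(1)

# Hypothetical hurdle process: with probability lam emit an exact
# zero, otherwise draw from a continuous baseline (assumed lognormal)
lam, N = 0.3, 1000
y = [0.0 if random.random() < lam else random.lognormvariate(0.0, 1.0)
     for _ in range(N)]

# Zeros inform only lambda; non-zeros inform only the baseline parameters
N_zero = sum(1 for v in y if v == 0.0)
nonzero = [v for v in y if v != 0.0]
print(N_zero, len(nonzero))
```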

Again, if you’re careful with the math then there is no difference between how one would implement a hurdle model and a continuous-inflated-by-zero model in Stan. In particular, because of this decomposition one can model everything

```
target += binomial_lpmf(N_zero | N, lambda);
for (n in 1:N) {
  if (y[n] != 0) {
    real lpdf = baseline_lpdf(y[n] | theta);
    target += lpdf;
  }
}
```

or just the zero observations

```
target += binomial_lpmf(N_zero | N, lambda);
```

or just the non-zero observations

```
for (n in 1:N) {
  if (y[n] != 0) {
    real lpdf = baseline_lpdf(y[n] | theta);
    target += lpdf;
  }
}
```