Is there a way to vectorize the simple Zero-Inflated Poisson model included in the Stan Reference Manual (page 197 version 2.16.0)? More specifically, I wonder if there is a way to group zero observations and non-zero observations such that we can use vectorized sampling statements.

Here is the model included in the reference manual:

```
data {
int<lower=0> N;
int<lower=0> y[N];
}
parameters {
real<lower=0, upper=1> theta;
real<lower=0> lambda;
}
model {
for (n in 1:N) {
if (y[n] == 0)
target += log_sum_exp(bernoulli_lpmf(1 | theta),
bernoulli_lpmf(0 | theta)
+ poisson_lpmf(y[n] | lambda));
else
target += bernoulli_lpmf(0 | theta)
+ poisson_lpmf(y[n] | lambda);
}
}
```

I donâ€™t think so. What you wrote should be about as fast as it gets.

Look at the section â€ś13.5. Vectorizing Mixtures.â€ť It has to do with how the _lpdf functions interpret vectors.

You could probably avoid some if statements if you sorted y[n], or got really crafty with stuff, but regular old Stan loops are pretty fast (the convert directly to C++). Vectorization gets more important when you have lots of matrix/vector stuff happening.

Lots of ways to optimize this program. The easiest would be to store `bernoull_lpmf(1 | theta)`

and `bernoulli_lpmf(0 | theta)`

in two local variables and reuse. Thatâ€™ll save all the redundant `log(theta)`

and `log1m(theta)`

calculations.

quote=â€śbbbales2, post:2, topic:1953â€ť]

You could probably avoid some if statements if you sorted y[n],

[/quote]

Yupâ€”you could use sufficient statistics. If you let `N_y_eq_zero`

be the number of zero entries in `y`

(i.e., `sum(y == 0)`

in R, but you can also calculate it in transformed data in the Stan program [though not as easilyâ€”itâ€™ll need a loop]).

```
target += N_y_eq_zero * log_sum_exp(bernoulli_lpmf(1 | theta),
bernoulli_lpmf(0 | theta)
+ poisson_lpmf(y[n] | lambda));
```

Then, create an array `y_non_zero`

containing the non-zero entries, and use

```
target += size(y_non_zero) * bernoulli_lpmf(0 | theta)
+ poisson_lpmf(y_non_zero | lambda);
```

You could then do what Ben suggests and consolidate some of the `poisson_lpmf`

calculations. The data structure would be something like the following, calculated in transformed data

```
int<lower=0> K;
int<lower=1> y_suff[K];
int<lower=1> count_y[K];
```

Then you have to figure out how to multiply this all through. Again, this may not be faster without unfolding the `poisson_lpmf`

itself to avoid redundant calculations.

3 Likes