# Vectorizing a Zero Inflated Poisson Model

Is there a way to vectorize the simple Zero-Inflated Poisson model included in the Stan Reference Manual (page 197 version 2.16.0)? More specifically, I wonder if there is a way to group zero observations and non-zero observations such that we can use vectorized sampling statements.

Here is the model included in the reference manual:

``````data {
int<lower=0> N;
int<lower=0> y[N];
}

parameters {
real<lower=0, upper=1> theta;
real<lower=0> lambda;
}

model {
for (n in 1:N) {
if (y[n] == 0)
target += log_sum_exp(bernoulli_lpmf(1 | theta),
bernoulli_lpmf(0 | theta)
+ poisson_lpmf(y[n] | lambda));
else
target += bernoulli_lpmf(0 | theta)
+ poisson_lpmf(y[n] | lambda);
}
}
``````

I don’t think so. What you wrote should be about as fast as it gets.

Look at the section “13.5. Vectorizing Mixtures.” It has to do with how the _lpdf functions interpret vectors.

You could probably avoid some if statements if you sorted y[n], or got really crafty with stuff, but regular old Stan loops are pretty fast (the convert directly to C++). Vectorization gets more important when you have lots of matrix/vector stuff happening.

Lots of ways to optimize this program. The easiest would be to store `bernoull_lpmf(1 | theta)` and `bernoulli_lpmf(0 | theta)` in two local variables and reuse. That’ll save all the redundant `log(theta)` and `log1m(theta)` calculations.

quote=“bbbales2, post:2, topic:1953”]
You could probably avoid some if statements if you sorted y[n],
[/quote]

Yup—you could use sufficient statistics. If you let `N_y_eq_zero` be the number of zero entries in `y` (i.e., `sum(y == 0)` in R, but you can also calculate it in transformed data in the Stan program [though not as easily—it’ll need a loop]).

``````target += N_y_eq_zero * log_sum_exp(bernoulli_lpmf(1 | theta),
bernoulli_lpmf(0 | theta)
+ poisson_lpmf(y[n] | lambda));
``````

Then, create an array `y_non_zero` containing the non-zero entries, and use

``````target += size(y_non_zero) * bernoulli_lpmf(0 | theta)
+ poisson_lpmf(y_non_zero | lambda);
``````

You could then do what Ben suggests and consolidate some of the `poisson_lpmf` calculations. The data structure would be something like the following, calculated in transformed data

``````int<lower=0> K;
int<lower=1> y_suff[K];
int<lower=1> count_y[K];
``````

Then you have to figure out how to multiply this all through. Again, this may not be faster without unfolding the `poisson_lpmf` itself to avoid redundant calculations.

3 Likes