# Vectorizing a Zero Inflated Poisson Model

Is there a way to vectorize the simple Zero-Inflated Poisson model included in the Stan Reference Manual (page 197 version 2.16.0)? More specifically, I wonder if there is a way to group zero observations and non-zero observations such that we can use vectorized sampling statements.

Here is the model included in the reference manual:

data {
int<lower=0> N;
int<lower=0> y[N];
}

parameters {
real<lower=0, upper=1> theta;
real<lower=0> lambda;
}

model {
for (n in 1:N) {
if (y[n] == 0)
target += log_sum_exp(bernoulli_lpmf(1 | theta),
bernoulli_lpmf(0 | theta)
+ poisson_lpmf(y[n] | lambda));
else
target += bernoulli_lpmf(0 | theta)
+ poisson_lpmf(y[n] | lambda);
}
}

I donâ€™t think so. What you wrote should be about as fast as it gets.

Look at the section â€ś13.5. Vectorizing Mixtures.â€ť It has to do with how the _lpdf functions interpret vectors.

You could probably avoid some if statements if you sorted y[n], or got really crafty with stuff, but regular old Stan loops are pretty fast (the convert directly to C++). Vectorization gets more important when you have lots of matrix/vector stuff happening.

Lots of ways to optimize this program. The easiest would be to store bernoull_lpmf(1 | theta) and bernoulli_lpmf(0 | theta) in two local variables and reuse. Thatâ€™ll save all the redundant log(theta) and log1m(theta) calculations.

quote=â€śbbbales2, post:2, topic:1953â€ť]
You could probably avoid some if statements if you sorted y[n],
[/quote]

Yupâ€”you could use sufficient statistics. If you let N_y_eq_zero be the number of zero entries in y (i.e., sum(y == 0) in R, but you can also calculate it in transformed data in the Stan program [though not as easilyâ€”itâ€™ll need a loop]).

target += N_y_eq_zero * log_sum_exp(bernoulli_lpmf(1 | theta),
bernoulli_lpmf(0 | theta)
+ poisson_lpmf(y[n] | lambda));

Then, create an array y_non_zero containing the non-zero entries, and use

target += size(y_non_zero) * bernoulli_lpmf(0 | theta)
+ poisson_lpmf(y_non_zero | lambda);

You could then do what Ben suggests and consolidate some of the poisson_lpmf calculations. The data structure would be something like the following, calculated in transformed data

int<lower=0> K;
int<lower=1> y_suff[K];
int<lower=1> count_y[K];

Then you have to figure out how to multiply this all through. Again, this may not be faster without unfolding the poisson_lpmf itself to avoid redundant calculations.

3 Likes