Vectorizing a Zero Inflated Poisson Model

Is there a way to vectorize the simple Zero-Inflated Poisson model included in the Stan Reference Manual (page 197 version 2.16.0)? More specifically, I wonder if there is a way to group zero observations and non-zero observations such that we can use vectorized sampling statements.

Here is the model included in the reference manual:

data {
  int<lower=0> N;
  int<lower=0> y[N];

parameters {
  real<lower=0, upper=1> theta;
  real<lower=0> lambda;

model {
  for (n in 1:N) {
    if (y[n] == 0)
      target += log_sum_exp(bernoulli_lpmf(1 | theta),
                            bernoulli_lpmf(0 | theta)
                            + poisson_lpmf(y[n] | lambda));
        target += bernoulli_lpmf(0 | theta)
                  + poisson_lpmf(y[n] | lambda);

I don’t think so. What you wrote should be about as fast as it gets.

Look at the section “13.5. Vectorizing Mixtures.” It has to do with how the _lpdf functions interpret vectors.

You could probably avoid some if statements if you sorted y[n], or got really crafty with stuff, but regular old Stan loops are pretty fast (the convert directly to C++). Vectorization gets more important when you have lots of matrix/vector stuff happening.

Lots of ways to optimize this program. The easiest would be to store bernoull_lpmf(1 | theta) and bernoulli_lpmf(0 | theta) in two local variables and reuse. That’ll save all the redundant log(theta) and log1m(theta) calculations.

quote=“bbbales2, post:2, topic:1953”]
You could probably avoid some if statements if you sorted y[n],

Yup—you could use sufficient statistics. If you let N_y_eq_zero be the number of zero entries in y (i.e., sum(y == 0) in R, but you can also calculate it in transformed data in the Stan program [though not as easily—it’ll need a loop]).

target += N_y_eq_zero * log_sum_exp(bernoulli_lpmf(1 | theta),
                                    bernoulli_lpmf(0 | theta)
                                    + poisson_lpmf(y[n] | lambda));

Then, create an array y_non_zero containing the non-zero entries, and use

target += size(y_non_zero) * bernoulli_lpmf(0 | theta)
          + poisson_lpmf(y_non_zero | lambda);

You could then do what Ben suggests and consolidate some of the poisson_lpmf calculations. The data structure would be something like the following, calculated in transformed data

int<lower=0> K;
int<lower=1> y_suff[K];
int<lower=1> count_y[K];  

Then you have to figure out how to multiply this all through. Again, this may not be faster without unfolding the poisson_lpmf itself to avoid redundant calculations.