Vectorizing a Zero Inflated Poisson Model

pauper42 · September 20, 2017, 12:13am

Is there a way to vectorize the simple Zero-Inflated Poisson model included in the Stan Reference Manual (page 197 version 2.16.0)? More specifically, I wonder if there is a way to group zero observations and non-zero observations such that we can use vectorized sampling statements.

Here is the model included in the reference manual:

data {
  int<lower=0> N;
  int<lower=0> y[N];
}

parameters {
  real<lower=0, upper=1> theta;
  real<lower=0> lambda;
}

model {
  for (n in 1:N) {
    if (y[n] == 0)
      target += log_sum_exp(bernoulli_lpmf(1 | theta),
                            bernoulli_lpmf(0 | theta)
                            + poisson_lpmf(y[n] | lambda));
      else
        target += bernoulli_lpmf(0 | theta)
                  + poisson_lpmf(y[n] | lambda);
  }
}

bbbales2 · September 20, 2017, 12:30am

I don’t think so. What you wrote should be about as fast as it gets.

Look at the section “13.5. Vectorizing Mixtures.” It has to do with how the _lpdf functions interpret vectors.

You could probably avoid some if statements if you sorted y[n], or got really crafty with stuff, but regular old Stan loops are pretty fast (the convert directly to C++). Vectorization gets more important when you have lots of matrix/vector stuff happening.

Bob_Carpenter · September 21, 2017, 7:36pm

Lots of ways to optimize this program. The easiest would be to store bernoull_lpmf(1 | theta) and bernoulli_lpmf(0 | theta) in two local variables and reuse. That’ll save all the redundant log(theta) and log1m(theta) calculations.

quote=“bbbales2, post:2, topic:1953”]
You could probably avoid some if statements if you sorted y[n],
[/quote]

Yup—you could use sufficient statistics. If you let N_y_eq_zero be the number of zero entries in y (i.e., sum(y == 0) in R, but you can also calculate it in transformed data in the Stan program [though not as easily—it’ll need a loop]).

target += N_y_eq_zero * log_sum_exp(bernoulli_lpmf(1 | theta),
                                    bernoulli_lpmf(0 | theta)
                                    + poisson_lpmf(y[n] | lambda));

Then, create an array y_non_zero containing the non-zero entries, and use

target += size(y_non_zero) * bernoulli_lpmf(0 | theta)
          + poisson_lpmf(y_non_zero | lambda);

You could then do what Ben suggests and consolidate some of the poisson_lpmf calculations. The data structure would be something like the following, calculated in transformed data

int<lower=0> K;
int<lower=1> y_suff[K];
int<lower=1> count_y[K];

Then you have to figure out how to multiply this all through. Again, this may not be faster without unfolding the poisson_lpmf itself to avoid redundant calculations.

Topic		Replies	Views
Vectorization and vector unstacking Modeling	2	1313	November 27, 2017
Double inflated models in Stan Modeling	16	900	October 13, 2018
Zero one inflated beta regression in STAN Modeling	18	6407	January 14, 2020
Correct vectorization in hierarchical model Modeling techniques , specification	9	549	April 14, 2020
Vectorizing bernoulli_logit_lpmf() Algorithms	2	1617	July 12, 2017

Vectorizing a Zero Inflated Poisson Model

Related topics