Is there a way to vectorize the simple Zero-Inflated Poisson model included in the Stan Reference Manual (page 197 version 2.16.0)? More specifically, I wonder if there is a way to group zero observations and non-zero observations such that we can use vectorized sampling statements.
Here is the model included in the reference manual:
data {
int<lower=0> N;
int<lower=0> y[N];
}
parameters {
real<lower=0, upper=1> theta;
real<lower=0> lambda;
}
model {
for (n in 1:N) {
if (y[n] == 0)
target += log_sum_exp(bernoulli_lpmf(1 | theta),
bernoulli_lpmf(0 | theta)
+ poisson_lpmf(y[n] | lambda));
else
target += bernoulli_lpmf(0 | theta)
+ poisson_lpmf(y[n] | lambda);
}
}
I don’t think so. What you wrote should be about as fast as it gets.
Look at the section “13.5. Vectorizing Mixtures.” It has to do with how the _lpdf functions interpret vectors.
You could probably avoid some if statements if you sorted y[n], or got really crafty with stuff, but regular old Stan loops are pretty fast (the convert directly to C++). Vectorization gets more important when you have lots of matrix/vector stuff happening.
Lots of ways to optimize this program. The easiest would be to store bernoull_lpmf(1 | theta)
and bernoulli_lpmf(0 | theta)
in two local variables and reuse. That’ll save all the redundant log(theta)
and log1m(theta)
calculations.
quote=“bbbales2, post:2, topic:1953”]
You could probably avoid some if statements if you sorted y[n],
[/quote]
Yup—you could use sufficient statistics. If you let N_y_eq_zero
be the number of zero entries in y
(i.e., sum(y == 0)
in R, but you can also calculate it in transformed data in the Stan program [though not as easily—it’ll need a loop]).
target += N_y_eq_zero * log_sum_exp(bernoulli_lpmf(1 | theta),
bernoulli_lpmf(0 | theta)
+ poisson_lpmf(y[n] | lambda));
Then, create an array y_non_zero
containing the non-zero entries, and use
target += size(y_non_zero) * bernoulli_lpmf(0 | theta)
+ poisson_lpmf(y_non_zero | lambda);
You could then do what Ben suggests and consolidate some of the poisson_lpmf
calculations. The data structure would be something like the following, calculated in transformed data
int<lower=0> K;
int<lower=1> y_suff[K];
int<lower=1> count_y[K];
Then you have to figure out how to multiply this all through. Again, this may not be faster without unfolding the poisson_lpmf
itself to avoid redundant calculations.
3 Likes