# Learning how to write a Zero Inflated Poisson model with stan?

I’m trying to learn how to write a ZIP model with stan using as a starting point the UCLA fishing data set and following section 13.7 of the manual:

``````zinb <- read.csv("https://stats.idre.ucla.edu/stat/data/fish.csv") %>%
mutate(nofish = factor(nofish),
livebait = factor(livebait),
camper = factor(camper))

summary(zinb)
nofish  livebait camper     persons          child             xb                  zg              count
0:176   0: 34    0:103   Min.   :1.000   Min.   :0.000   Min.   :-3.275050   Min.   :-5.6259   Min.   :  0.000
1: 74   1:216    1:147   1st Qu.:2.000   1st Qu.:0.000   1st Qu.: 0.008267   1st Qu.:-1.2527   1st Qu.:  0.000
Median :2.000   Median :0.000   Median : 0.954550   Median : 0.6051   Median :  0.000
Mean   :2.528   Mean   :0.684   Mean   : 0.973796   Mean   : 0.2523   Mean   :  3.296
3rd Qu.:4.000   3rd Qu.:1.000   3rd Qu.: 1.963855   3rd Qu.: 1.9932   3rd Qu.:  2.000
Max.   :4.000   Max.   :3.000   Max.   : 5.352674   Max.   : 4.2632   Max.   :149.000
``````

This is the stan code that I wrote:

``````
data {
int<lower=0> N;
int<lower=0> K;
matrix[N, K] x;
int<lower=0> y[N];
}

parameters {
vector[K] beta_theta;                 //
vector[K] beta_lambda;                 //
}

transformed parameters {
vector[N] theta ;
vector[N] lambda ;
theta =  inv_logit(x * beta_theta) ; //<lower=0, upper=1>
lambda =  exp(x * beta_lambda) ; //<lower=0>
}

model {
beta_theta ~ normal(0,1) ;
beta_lambda ~ normal(0,1) ;
for(n in 1:N){
if(y[n] == 0)
target += log_sum_exp(bernoulli_lpmf(1 | theta[n]),
bernoulli_lpmf(0 | theta[n])
+ poisson_lpmf(y[n] | lambda[n]));
else
target += bernoulli_lpmf(0 | theta[n])
+ poisson_lpmf(y[n] | lambda[n]);
}
}

generated quantities{
int<lower=0> y_sim[N];
int zero;
for(n in 1:N){
zero=bernoulli_rng(theta[n]);
y_sim[n]=(1-zero)*poisson_log_rng(lambda[n]);
}
}
``````

Alas, when I try to fit the model I get the following error:

``````x <- zinb %>% select(nofish:child)

stan_data <- list(N = nrow(zinb),
K = ncol(x),
x = x,
y = zinb\$count)

fit2 <- fit <- sampling(model2, data = stan_data)

Chain 2: Exception: poisson_log_rng: Log rate parameter is 27.6657, but must be less than 20.7944  (in 'model402a5cf9a3_stan_403b6ccff3' at line 41)

``````

My guess is that I’m making a mistake when specifying theta, lambda, and their respective priors. Could you point me in the right direction?

replace with

poisson_rng(lambda[n]);

BTW, It’s better to work on log-scale, that would require substitute poisson_lmpf with posson_log_lmpf
and use x * beta_lambda instead of exp(x * beta_lambda).

thanks @andre.pfeuffer . I modified my code to use the log-scale and it seems to work now:

``````data {
int<lower=0> N;
int<lower=0> K;
matrix[N, K] x;
int<lower=0> y[N];
}

parameters {
vector[K] beta_theta;                 //
vector[K] beta_lambda;                 //
real alpha_theta;
real alpha_lambda;
}

transformed parameters {
vector[N] theta ;
vector[N] lambda ;
theta =  inv_logit(alpha_theta + x * beta_theta) ; //<lower=0, upper=1>
lambda =  alpha_lambda + x * beta_lambda ; // how do i make sure that this is >0 ??
}

model {
beta_theta ~ normal(0,1) ;
beta_lambda ~ normal(0,1) ;
alpha_theta ~ normal(0,1) ;
alpha_lambda ~ normal(0,1) ;
for(n in 1:N){
if(y[n] == 0)
target += log_sum_exp(bernoulli_lpmf(1 | theta[n]),
bernoulli_lpmf(0 | theta[n])
+ poisson_log_lpmf(y[n] | lambda[n]));
else
target += bernoulli_lpmf(0 | theta[n])
+ poisson_log_lpmf(y[n] | lambda[n]);
}
}

generated quantities{
int<lower=0> y_sim[N];
int zero;
for(n in 1:N){
zero=bernoulli_rng(theta[n]);
y_sim[n]=(1-zero)*poisson_log_rng(lambda[n]);
}
}
``````

Do I need to do something to restrict `lambda` to be greater than 0 with this parametarisation?

This is lambda before apply the (inverse-)link function.

lambda_log = alpha_lambda + x * beta_lambda ;
lambda = exp(lambda_log);

1 Like

I have one more question. I want to add the log-likelihood so i can compare models with loo. This is what i did:

``````data {
int<lower=0> N;
int<lower=0> K;
matrix[N, K] x;
int<lower=0> y[N];
}

parameters {
vector[K] beta_theta;                 //
vector[K] beta_lambda;                 //
real alpha_theta;
real alpha_lambda;
}

transformed parameters {
vector[N] theta ;
vector[N] lambda_log ;
theta =  inv_logit(alpha_theta + x * beta_theta) ;
lambda_log =  alpha_lambda + x * beta_lambda ;
}

model {
beta_theta ~ normal(0,1) ;
beta_lambda ~ normal(0,1) ;
alpha_theta ~ normal(0,1) ;
alpha_lambda ~ normal(0,1) ;
for(n in 1:N){
if(y[n] == 0)
target += log_sum_exp(bernoulli_lpmf(1 | theta[n]),
bernoulli_lpmf(0 | theta[n])
+ poisson_log_lpmf(y[n] | lambda_log[n]));
else
target += bernoulli_lpmf(0 | theta[n])
+ poisson_log_lpmf(y[n] | lambda_log[n]);
}
}

generated quantities{
real log_lik[N];
int<lower=0> y_sim[N];
int zero;
for(n in 1:N){
zero=bernoulli_rng(theta[n]);
y_sim[n]=(1-zero)*poisson_log_rng(lambda_log[n]);
if(y[n] == 0)
log_lik[n] = log_sum_exp(bernoulli_lpmf(1 | theta[n]),  // IS THIS RIGHT?
bernoulli_lpmf(0 | theta[n])
+ poisson_log_lpmf(y[n] | lambda_log[n])) ;
else
log_lik[n] = bernoulli_lpmf(0 | theta[n])
+ poisson_log_lpmf(y[n] | lambda_log[n]);
}
}

``````

Is what i did for `log_lik` correct?

1 Like

Your `log_lik` should look like your `target +=` statement(s) in your case, which it does. So I think this is correct.
However, `y_sim` needs some refinement. As I understand the zero-inflated hopefully correct,
with theta probability of a ‘0’ event a bernoulli is sampled.
If not “(1-theta) probability”, the possion is sampled.

It is beneficial for debugging to calculate the event probabilities up to a reasonal level, 0, 1, 2, … n and check this probability with the sampling.

Could you show me how to implement this? I based what I did on this post. This is how i’m reading my code:

1. Observation `n` will be zero with probability `theta[n]`: `zero=bernoulli_rng(theta[n]);`
2. If observation n is not zero, then it will draw a value from the Poisson: `y_sim[n]=(1-zero)*poisson_log_rng(lambda_log[n])`

I don’t want to comment another post. From my understanding the answer in stackexchange is correct.

To simulate the distribution, you can either do it manually with

``````ifelse(rbinom(n, size = 1, prob = p) > 0, 0, rpois(n, lambda = lambda))
``````

Two choices to get ‘0’.

``````if(bernoulli_rng(theta[n]) == 1) {
y_sim[n] = 0;
} else {
y_sim[n] = poisson_log_rng(lambda_log[n]);
}``````

``````if(bernoulli_rng(theta[n]) == 1) {
y_sim[n] = 0;
} else {
y_sim[n] = poisson_log_rng(lambda_log[n]);
}
``````

is equivalent to the code I wrote:

``````zero=bernoulli_rng(theta[n]);
y_sim[n]=(1-zero)*poisson_log_rng(lambda_log[n])
``````

if `zero==1` then `y_sim[n]=0` else `y_sim[n]=poisson_log_rng(lambda_log[n])`

Am I missing something??

1 Like

My bad, it’s all fine then.

@andre.pfeuffer thanks a lot for all the help!