Learning how to write a Zero Inflated Poisson model with stan?

ignacio · November 25, 2018, 8:41pm

I’m trying to learn how to write a ZIP model with stan using as a starting point the UCLA fishing data set and following section 13.7 of the manual:

zinb <- read.csv("https://stats.idre.ucla.edu/stat/data/fish.csv") %>% 
  mutate(nofish = factor(nofish),
         livebait = factor(livebait),
         camper = factor(camper))

summary(zinb)
 nofish  livebait camper     persons          child             xb                  zg              count        
 0:176   0: 34    0:103   Min.   :1.000   Min.   :0.000   Min.   :-3.275050   Min.   :-5.6259   Min.   :  0.000  
 1: 74   1:216    1:147   1st Qu.:2.000   1st Qu.:0.000   1st Qu.: 0.008267   1st Qu.:-1.2527   1st Qu.:  0.000  
                          Median :2.000   Median :0.000   Median : 0.954550   Median : 0.6051   Median :  0.000  
                          Mean   :2.528   Mean   :0.684   Mean   : 0.973796   Mean   : 0.2523   Mean   :  3.296  
                          3rd Qu.:4.000   3rd Qu.:1.000   3rd Qu.: 1.963855   3rd Qu.: 1.9932   3rd Qu.:  2.000  
                          Max.   :4.000   Max.   :3.000   Max.   : 5.352674   Max.   : 4.2632   Max.   :149.000

This is the stan code that I wrote:


data {
  int<lower=0> N;
  int<lower=0> K; 
  matrix[N, K] x; 
  int<lower=0> y[N];
}

parameters {
  vector[K] beta_theta;                 //
  vector[K] beta_lambda;                 //
}

transformed parameters {
  vector[N] theta ;
  vector[N] lambda ;
  theta =  inv_logit(x * beta_theta) ; //<lower=0, upper=1>
  lambda =  exp(x * beta_lambda) ; //<lower=0>
}


model {
  beta_theta ~ normal(0,1) ; 
  beta_lambda ~ normal(0,1) ; 
  for(n in 1:N){
    if(y[n] == 0)
      target += log_sum_exp(bernoulli_lpmf(1 | theta[n]),
                            bernoulli_lpmf(0 | theta[n])
                              + poisson_lpmf(y[n] | lambda[n]));
    else
      target += bernoulli_lpmf(0 | theta[n])
                  + poisson_lpmf(y[n] | lambda[n]);
  }
}

generated quantities{
  int<lower=0> y_sim[N];
  int zero;
  for(n in 1:N){
	     zero=bernoulli_rng(theta[n]);
	     y_sim[n]=(1-zero)*poisson_log_rng(lambda[n]);
  }
}

Alas, when I try to fit the model I get the following error:

x <- zinb %>% select(nofish:child)

stan_data <- list(N = nrow(zinb),
                  K = ncol(x),
                  x = x,
                  y = zinb$count)

fit2 <- fit <- sampling(model2, data = stan_data)

Chain 2: Exception: poisson_log_rng: Log rate parameter is 27.6657, but must be less than 20.7944  (in 'model402a5cf9a3_stan_403b6ccff3' at line 41)

My guess is that I’m making a mistake when specifying theta, lambda, and their respective priors. Could you point me in the right direction?

andre.pfeuffer · November 26, 2018, 12:53am

replace with

poisson_rng(lambda[n]);

BTW, It’s better to work on log-scale, that would require substitute poisson_lmpf with posson_log_lmpf
and use x * beta_lambda instead of exp(x * beta_lambda).

ignacio · November 26, 2018, 1:06am

thanks @andre.pfeuffer . I modified my code to use the log-scale and it seems to work now:

data {
  int<lower=0> N;
  int<lower=0> K; 
  matrix[N, K] x; 
  int<lower=0> y[N];
}

parameters {
  vector[K] beta_theta;                 //
  vector[K] beta_lambda;                 //
  real alpha_theta;
  real alpha_lambda;
}

transformed parameters {
  vector[N] theta ;
  vector[N] lambda ;
  theta =  inv_logit(alpha_theta + x * beta_theta) ; //<lower=0, upper=1>
  lambda =  alpha_lambda + x * beta_lambda ; // how do i make sure that this is >0 ??
}


model {
  beta_theta ~ normal(0,1) ; 
  beta_lambda ~ normal(0,1) ; 
  alpha_theta ~ normal(0,1) ;
  alpha_lambda ~ normal(0,1) ;
  for(n in 1:N){
    if(y[n] == 0)
      target += log_sum_exp(bernoulli_lpmf(1 | theta[n]),
                            bernoulli_lpmf(0 | theta[n])
                              + poisson_log_lpmf(y[n] | lambda[n]));
    else
      target += bernoulli_lpmf(0 | theta[n])
                  + poisson_log_lpmf(y[n] | lambda[n]);
  }
}

generated quantities{
  int<lower=0> y_sim[N];
  int zero;
  for(n in 1:N){
	     zero=bernoulli_rng(theta[n]);
	     y_sim[n]=(1-zero)*poisson_log_rng(lambda[n]);
  }
}

Do I need to do something to restrict lambda to be greater than 0 with this parametarisation?

andre.pfeuffer · November 26, 2018, 1:29am

This is lambda before apply the (inverse-)link function.

lambda_log = alpha_lambda + x * beta_lambda ;
lambda = exp(lambda_log);

ignacio · November 29, 2018, 3:34pm

I have one more question. I want to add the log-likelihood so i can compare models with loo. This is what i did:

data {
  int<lower=0> N;
  int<lower=0> K; 
  matrix[N, K] x; 
  int<lower=0> y[N];
}

parameters {
  vector[K] beta_theta;                 //
  vector[K] beta_lambda;                 //
  real alpha_theta;
  real alpha_lambda;
}

transformed parameters {
  vector[N] theta ;
  vector[N] lambda_log ;
  theta =  inv_logit(alpha_theta + x * beta_theta) ; 
  lambda_log =  alpha_lambda + x * beta_lambda ; 
}


model {
  beta_theta ~ normal(0,1) ; 
  beta_lambda ~ normal(0,1) ; 
  alpha_theta ~ normal(0,1) ;
  alpha_lambda ~ normal(0,1) ;
  for(n in 1:N){
    if(y[n] == 0)
      target += log_sum_exp(bernoulli_lpmf(1 | theta[n]),
                            bernoulli_lpmf(0 | theta[n])
                              + poisson_log_lpmf(y[n] | lambda_log[n]));
    else
      target += bernoulli_lpmf(0 | theta[n])
                  + poisson_log_lpmf(y[n] | lambda_log[n]);
  }
}

generated quantities{
  real log_lik[N];
  int<lower=0> y_sim[N];
  int zero;
  for(n in 1:N){
	     zero=bernoulli_rng(theta[n]);
	     y_sim[n]=(1-zero)*poisson_log_rng(lambda_log[n]);
	     if(y[n] == 0)
	      log_lik[n] = log_sum_exp(bernoulli_lpmf(1 | theta[n]),  // IS THIS RIGHT?
                            bernoulli_lpmf(0 | theta[n])
                              + poisson_log_lpmf(y[n] | lambda_log[n])) ;
       else
        log_lik[n] = bernoulli_lpmf(0 | theta[n])
                      + poisson_log_lpmf(y[n] | lambda_log[n]);
  }
}

Is what i did for log_lik correct?

andre.pfeuffer · November 30, 2018, 2:40am

Your log_lik should look like your target += statement(s) in your case, which it does. So I think this is correct.
However, y_sim needs some refinement. As I understand the zero-inflated hopefully correct,
with theta probability of a ‘0’ event a bernoulli is sampled.
If not “(1-theta) probability”, the possion is sampled.

It is beneficial for debugging to calculate the event probabilities up to a reasonal level, 0, 1, 2, … n and check this probability with the sampling.

ignacio · November 30, 2018, 1:40pm

Could you show me how to implement this? I based what I did on this post. This is how i’m reading my code:

Observation n will be zero with probability theta[n]: zero=bernoulli_rng(theta[n]);
If observation n is not zero, then it will draw a value from the Poisson: y_sim[n]=(1-zero)*poisson_log_rng(lambda_log[n])

andre.pfeuffer · November 30, 2018, 2:29pm

I don’t want to comment another post. From my understanding the answer in stackexchange is correct.

To simulate the distribution, you can either do it manually with

ifelse(rbinom(n, size = 1, prob = p) > 0, 0, rpois(n, lambda = lambda))

Two choices to get ‘0’.

if(bernoulli_rng(theta[n]) == 1) {
  y_sim[n] = 0;
} else {
  y_sim[n] = poisson_log_rng(lambda_log[n]);
}

ignacio · November 30, 2018, 2:37pm

I think your code:

if(bernoulli_rng(theta[n]) == 1) {
  y_sim[n] = 0;
} else {
  y_sim[n] = poisson_log_rng(lambda_log[n]);
}

is equivalent to the code I wrote:

zero=bernoulli_rng(theta[n]);
y_sim[n]=(1-zero)*poisson_log_rng(lambda_log[n])

if zero==1 then y_sim[n]=0 else y_sim[n]=poisson_log_rng(lambda_log[n])

Am I missing something??

andre.pfeuffer · November 30, 2018, 2:41pm

My bad, it’s all fine then.

ignacio · November 30, 2018, 2:42pm

@andre.pfeuffer thanks a lot for all the help!

Topic		Replies	Views
Custom Likelihood for Zero-inflated Lognormal Model Modeling	5	2730	May 20, 2020
Zero one inflated beta regression in STAN Modeling	18	6599	January 14, 2020
Forward algorithm for ZIP - Hidden Markov model Modeling fitting-issues , specification , hmm	1	1359	February 23, 2019
Double inflated models in Stan Modeling	16	981	October 13, 2018
Prior posterior distribution for Zero inflated Poisson model with Random Effects Modeling	12	2846	November 6, 2017

Learning how to write a Zero Inflated Poisson model with stan?

Related topics