How to speed up estimation - zero-inflated Poisson?

I am trying to estimate a zero-inflated Poisson model. This is the model code I am using.

data {
  int<lower=0> N; // number of observations
  int<lower=0> K; // number of predictors in zero-inflation model
  int<lower=0> J; // number of predictors in count model
  matrix[N, K] x1; // data for logit model
  matrix[N, J] x2; // data for count model
  int<lower=0> y[N]; // count variable
}
parameters {
  vector[K] beta_theta;                 
  vector[J] beta_lambda;                 
}
transformed parameters {
  vector[N] theta;
  vector[N] lambda;
  theta = inv_logit(x1 * beta_theta);
  lambda = exp(x2 * beta_lambda);
}
model {
  beta_theta ~ normal(0,100);
  beta_lambda ~ normal(0,100);
  for (n in 1:N) {
    if (y[n] == 0)
      // zero counts: mixture of a structural zero and a Poisson zero
      target += log_sum_exp(bernoulli_lpmf(1 | theta[n]),
                            bernoulli_lpmf(0 | theta[n])
                              + poisson_lpmf(y[n] | lambda[n]));
    else
      // non-zero counts: must come from the Poisson component
      target += bernoulli_lpmf(0 | theta[n])
                  + poisson_lpmf(y[n] | lambda[n]);
  }
}


I am wondering if there is any way to speed it up, since the gradient evaluation time is quite high.

Any and all help is appreciated!


I think it looks generally fine; I'd guess the per-observation if (y[n] == 0) loop slows it down.

One risk is the wide prior on the betas: normal(0, 100) allows coefficients on the order of 100, so exp(x2 * beta_lambda) can become a huge number, which can lead to overflow.
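For example (my numbers, assuming the predictors are centered and roughly on unit scale), something like this is much safer, and working on the log/logit scale instead of exponentiating also sidesteps the overflow:

model {
  // assumption: predictors centered and roughly unit scale
  beta_theta ~ normal(0, 2);
  beta_lambda ~ normal(0, 2);
  // and e.g. poisson_log_lpmf(y[n] | x2[n] * beta_lambda)
  // instead of poisson_lpmf(y[n] | exp(x2[n] * beta_lambda))
}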


So do you suggest using a tighter prior like normal(0, 10)? Will that help?


The Stan manual has a section on speeding up estimation for zero-inflated models. Basically, you break the data up into the zeros and the non-zeros and vectorize the corresponding likelihood calculations. Give that a try. Best of luck!
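Something along these lines should work for your regression version (an untested sketch; the num_zeros helper, the index arrays, and the tighter normal(0, 2) priors are my additions, so adapt as needed). The data and parameters blocks stay as in your model, and the transformed parameters block can be dropped since everything is kept on the log/logit scale:

functions {
  // count how many observations are zero
  int num_zeros(int[] y) {
    int nz = 0;
    for (n in 1:num_elements(y))
      if (y[n] == 0)
        nz += 1;
    return nz;
  }
}
transformed data {
  int<lower=0> N_zero = num_zeros(y);
  int<lower=0> N_nonzero = N - N_zero;
  int<lower=1, upper=N> zero_idx[N_zero];       // positions of the zero counts
  int<lower=1, upper=N> nonzero_idx[N_nonzero]; // positions of the non-zero counts
  {
    int iz = 1;
    int inz = 1;
    for (n in 1:N) {
      if (y[n] == 0) {
        zero_idx[iz] = n;
        iz += 1;
      } else {
        nonzero_idx[inz] = n;
        inz += 1;
      }
    }
  }
}
model {
  // linear predictors, kept on the log / logit scale to avoid overflow
  vector[N] log_theta = log_inv_logit(x1 * beta_theta);
  vector[N] log1m_theta = log1m_inv_logit(x1 * beta_theta);
  vector[N] eta = x2 * beta_lambda; // log of the Poisson mean

  beta_theta ~ normal(0, 2);  // assumption: tighter than normal(0, 100)
  beta_lambda ~ normal(0, 2);

  // non-zero counts come from the Poisson component only: fully vectorized
  target += sum(log1m_theta[nonzero_idx])
            + poisson_log_lpmf(y[nonzero_idx] | eta[nonzero_idx]);

  // zero counts are a mixture of a structural zero and a Poisson zero
  for (i in 1:N_zero) {
    int n = zero_idx[i];
    target += log_sum_exp(log_theta[n],
                          log1m_theta[n] + poisson_log_lpmf(0 | eta[n]));
  }
}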


Looks like you are fitting a very large dataset - what are your N, K, and J? Before optimizing too hard, I would try to fit a subset of the data (potentially filtering for rows that have a fixed value of a predictor, and thus also reducing the model) and check that the model actually fits the data well and has no convergence problems. It is always frustrating to burn a lot of time and power on a model that ends up being discarded. Once you are sure the model is OK (including priors, as @Dirk_Nachbar suggests), @sonicking's suggestion is likely to help, and you might also benefit from within-chain parallelization with reduce_sum if you have more cores to throw at the problem.
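If you try reduce_sum, a rough sketch could look like the following (untested; the zip_partial_sum name and the normal(0, 2) priors are my assumptions, and you would ideally combine it with the zero/non-zero split above). Remember that the model has to be compiled with threading enabled (STAN_THREADS) and run with several threads per chain for this to actually help:

functions {
  real zip_partial_sum(int[] y_slice, int start, int end,
                       vector log_theta, vector log1m_theta, vector eta) {
    real lp = 0;
    for (i in 1:num_elements(y_slice)) {
      int n = start + i - 1; // index into the full data
      if (y_slice[i] == 0)
        lp += log_sum_exp(log_theta[n],
                          log1m_theta[n] + poisson_log_lpmf(0 | eta[n]));
      else
        lp += log1m_theta[n] + poisson_log_lpmf(y_slice[i] | eta[n]);
    }
    return lp;
  }
}
model {
  vector[N] log_theta = log_inv_logit(x1 * beta_theta);
  vector[N] log1m_theta = log1m_inv_logit(x1 * beta_theta);
  vector[N] eta = x2 * beta_lambda;
  int grainsize = 1; // 1 lets the scheduler pick the slice sizes

  beta_theta ~ normal(0, 2);
  beta_lambda ~ normal(0, 2);

  target += reduce_sum(zip_partial_sum, y, grainsize,
                       log_theta, log1m_theta, eta);
}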

Best of luck with your model!

@martinmodrak My dataset is in fact very large: 623,438 observations. But in this case, I was testing the model on a subset of 5,000 observations, so here N is 5,000, K is 3, and J is 15. The gradient times are large despite the subset. I tried tighter priors such as normal(0, 10) and normal(0, 5), but that only increased the gradient time, so I am not sure. I haven’t yet tried the optimized code where zeros and non-zeros are handled separately.

I will have a look at reduce_sum as well.

Thanks!

That’s still potentially quite large, so I would definitely try to prune it more. And I would check for issues with the model. Is the scale of all predictors sensible? Centering/scaling predictors can help. Have you ruled out that your predictors are collinear or nearly collinear? If that is an issue, the QR reparameterization might also help: see 1.2 The QR reparameterization in the Stan User’s Guide.
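For the count-model design matrix, that QR trick would look roughly like this (a sketch following that guide section; the Q2/R2 names and the beta_lambda_q coefficient are mine, and the same idea applies to x1):

transformed data {
  // thin QR decomposition of the count-model design matrix
  matrix[N, J] Q2 = qr_thin_Q(x2) * sqrt(N - 1);
  matrix[J, J] R2 = qr_thin_R(x2) / sqrt(N - 1);
  matrix[J, J] R2_inv = inverse(R2);
}
parameters {
  vector[J] beta_lambda_q; // coefficients on the orthogonal Q2 basis
  // ... beta_theta as before
}
model {
  vector[N] eta = Q2 * beta_lambda_q; // identical linear predictor to x2 * beta_lambda
  // ... likelihood as before, using eta as the log Poisson mean
}
generated quantities {
  vector[J] beta_lambda = R2_inv * beta_lambda_q; // back to the original scale
}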