Why are these two similar Zero-inflated Poisson models producing quite different results?


#1

I currently have values generated from a zero-inflated poisson model:

df <- data.frame(value=c(rpois(1000, 4), rep(0L, 1000)))  #mixture probability is 0.5

I then summarize these to produce a histogram; each row represents the value and its corresponding count:

summarized_df <- df %>% group_by(value) %>% summarize(count=n())

I use a Poisson zero-inflated model for the the probability mass function, , as such:

data {
  int<lower=1> N; // rows of data
  int<lower=0> value[N];
  int<lower=0> count[N];
}

parameters {
  real<lower=0, upper=1> theta; // probability it is zero
  real<lower=0> lambda; // poisson
}

model {
  theta ~ beta(0.1, 0.1);
  lambda ~ gamma(0.1, 0.1);
  for (n in 1:N) {
    if (value[n] == 0)  {
      target += count[n] * (log_sum_exp(bernoulli_lpmf(1 | theta),
                            bernoulli_lpmf(0 | theta)
                            + poisson_lpmf(value[n] | lambda)));

    } else {
      target += count[n] * (bernoulli_lpmf(0 | theta)
                  + poisson_lpmf(value[n] | lambda));
    }
  }
}

This produces sensible results:

           mean se_mean   sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
theta      0.50    0.00 0.01     0.48     0.49     0.50     0.51     0.52  3860    1
lambda     4.00    0.00 0.07     3.87     3.96     4.00     4.05     4.13  4000    1

However, when I attempt to iterate through the counts instead, the results get all wonky:

data {
  int<lower=1> N; // rows of data
  int<lower=0> value[N];
  int<lower=0> count[N];
}

parameters {
  real<lower=0, upper=1> theta;
  real<lower=0> lambda; 
}

model {
  theta ~ beta(1, 1);
  lambda ~ gamma(1, 1);
  for (n in 1:N) {
    int C;
    C = count[N];
    for (idx in 1:C) {
      if (value[n] == 0)  {
        target += log_sum_exp(bernoulli_lpmf(1 | theta), 
              bernoulli_lpmf(0 | theta) + poisson_lpmf(value[n] | lambda));
      } else {
        target += bernoulli_lpmf(0 | theta) + poisson_lpmf(value[n] | lambda);
      }
    }
  }
}

         mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
theta    0.11    0.00 0.06   0.02   0.07   0.10   0.15   0.26  2917    1
lambda   5.75    0.01 0.51   4.80   5.39   5.74   6.09   6.76  2705    1

Sorry if I’m missing something completely obvious, but my suspicion was that these models would produce roughly the same results.


#2

In the second model you probably meant:

C = count[N];

to be

C = count[n];

But this is just replacing multiplication with a for loop, which doesn’t seem ideal?

Also your priors are different in the two models. That might affect things. Hope that helps!