I currently have values generated from a zero-inflated poisson model:
df <- data.frame(value=c(rpois(1000, 4), rep(0L, 1000))) #mixture probability is 0.5
I then summarize these to produce a histogram; each row represents the value and its corresponding count:
summarized_df <- df %>% group_by(value) %>% summarize(count=n())
I use a Poisson zero-inflated model for the the probability mass function, , as such:
data {
int<lower=1> N; // rows of data
int<lower=0> value[N];
int<lower=0> count[N];
}
parameters {
real<lower=0, upper=1> theta; // probability it is zero
real<lower=0> lambda; // poisson
}
model {
theta ~ beta(0.1, 0.1);
lambda ~ gamma(0.1, 0.1);
for (n in 1:N) {
if (value[n] == 0) {
target += count[n] * (log_sum_exp(bernoulli_lpmf(1 | theta),
bernoulli_lpmf(0 | theta)
+ poisson_lpmf(value[n] | lambda)));
} else {
target += count[n] * (bernoulli_lpmf(0 | theta)
+ poisson_lpmf(value[n] | lambda));
}
}
}
This produces sensible results:
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
theta 0.50 0.00 0.01 0.48 0.49 0.50 0.51 0.52 3860 1
lambda 4.00 0.00 0.07 3.87 3.96 4.00 4.05 4.13 4000 1
However, when I attempt to iterate through the counts instead, the results get all wonky:
data {
int<lower=1> N; // rows of data
int<lower=0> value[N];
int<lower=0> count[N];
}
parameters {
real<lower=0, upper=1> theta;
real<lower=0> lambda;
}
model {
theta ~ beta(1, 1);
lambda ~ gamma(1, 1);
for (n in 1:N) {
int C;
C = count[N];
for (idx in 1:C) {
if (value[n] == 0) {
target += log_sum_exp(bernoulli_lpmf(1 | theta),
bernoulli_lpmf(0 | theta) + poisson_lpmf(value[n] | lambda));
} else {
target += bernoulli_lpmf(0 | theta) + poisson_lpmf(value[n] | lambda);
}
}
}
}
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
theta 0.11 0.00 0.06 0.02 0.07 0.10 0.15 0.26 2917 1
lambda 5.75 0.01 0.51 4.80 5.39 5.74 6.09 6.76 2705 1
Sorry if I’m missing something completely obvious, but my suspicion was that these models would produce roughly the same results.