I’m receiving a frequency table of integer values, and I would like to fit a zero-inflated negative binomial model to it.
In Python, the frequency table looks something like this:
{
    0: 100,  # 0 was observed 100 times
    1: 50,   # 1 was observed 50 times
    2: 25,   # and so on and so forth
    3: 12,
    4: 6,
    5: 3,
    6: 1,
    100: 1,
}
# Eventually, this gets passed into Stan through PyStan as follows:
# data = {
#     'N': 8,
#     'value': [0, 1, 2, 3, 4, 5, 6, 100],
#     'count': [100, 50, 25, 12, 6, 3, 1, 1],
# }
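As a minimal sketch of that conversion step: the helper name `freq_to_stan_data` is hypothetical (it is not part of PyStan), and it only assumes the frequency table is a plain `dict` mapping each observed value to its count.

```python
# Hypothetical helper: turn a {value: count} frequency table into the
# data dictionary that the Stan program declares (N, value, count).
def freq_to_stan_data(freq):
    values = sorted(freq)  # distinct observed values, in ascending order
    return {
        "N": len(values),                    # number of distinct values
        "value": values,                     # the distinct values themselves
        "count": [freq[v] for v in values],  # how often each value was seen
    }


freq = {0: 100, 1: 50, 2: 25, 3: 12, 4: 6, 5: 3, 6: 1, 100: 1}
data = freq_to_stan_data(freq)
```

Sorting the keys keeps `value` and `count` aligned regardless of the order in which the dictionary was built.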
My Stan code is below. I am okay with this parametrization and these priors because I’m usually pretty sure that the data are overdispersed, so it’s fine to put more prior weight on overdispersion.
data {
  int<lower=1> N;
  int<lower=0> value[N];
  int<lower=0> count[N];
}
parameters {
  // the generative process is as follows:
  // flip a coin, with probability of heads = theta
  // if this coin is heads, X ~ NegBinom(mu0, phi)
  // if this coin is tails, X = 0.
  real<lower=0, upper=1> theta;
  real<lower=0> mu0;
  real<lower=0> phi;
}
model {
  // priors
  theta ~ beta(0.75, 0.75);
  mu0 ~ normal(0, 5); // I have tried different combinations here, including half-Cauchy,
  phi ~ normal(0, 5); // and also different variances
  // UPDATE
  for (n in 1:N) {
    if (value[n] == 0) {
      target += count[n] *
        log_sum_exp(
          bernoulli_lpmf(0 | theta),
          bernoulli_lpmf(1 | theta) + neg_binomial_2_lpmf(value[n] | mu0, phi)
        );
    } else {
      target += count[n] *
        (bernoulli_lpmf(1 | theta) + neg_binomial_2_lpmf(value[n] | mu0, phi));
    }
  }
}
generated quantities {
  real mu;
  mu = theta * mu0;
}
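For a sanity check outside Stan, the count-weighted likelihood in the model block can be sketched in plain Python. This is only an illustration under stated assumptions: the function names `nb2_logpmf` and `zinb_loglik` are made up for this example, the priors are left out, and the negative binomial uses the same mean/overdispersion form as `neg_binomial_2` (mean `mu`, variance `mu + mu^2 / phi`).

```python
import math


def nb2_logpmf(y, mu, phi):
    """Log pmf of a negative binomial with mean mu and overdispersion phi."""
    return (
        math.lgamma(y + phi) - math.lgamma(y + 1) - math.lgamma(phi)
        + phi * math.log(phi / (mu + phi))
        + y * math.log(mu / (mu + phi))
    )


def zinb_loglik(values, counts, theta, mu0, phi):
    """Count-weighted zero-inflated NB log-likelihood (priors not included)."""
    total = 0.0
    for y, c in zip(values, counts):
        # "heads" branch: with probability theta, X ~ NegBinom(mu0, phi)
        nb_branch = math.log(theta) + nb2_logpmf(y, mu0, phi)
        if y == 0:
            # "tails" branch: with probability 1 - theta, X = 0
            zero_branch = math.log1p(-theta)
            # numerically stable log(exp(a) + exp(b)), like log_sum_exp
            m = max(zero_branch, nb_branch)
            mix = m + math.log(
                math.exp(zero_branch - m) + math.exp(nb_branch - m)
            )
            total += c * mix
        else:
            total += c * nb_branch
    return total


values = [0, 1, 2, 3, 4, 5, 6, 100]
counts = [100, 50, 25, 12, 6, 3, 1, 1]
loglik = zinb_loglik(values, counts, theta=0.7, mu0=2.0, phi=0.5)
```

Evaluating this at a few parameter settings (for example, increasingly large `mu0`) is one way to see how the likelihood behaves in the regions the sampler may visit during warmup.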
My worry is that I’m getting a few warnings of the following kind.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: neg_binomial_2_lpmf: Location parameter is inf, but must be finite! (in 'unknown file name' at line 32)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
These warnings persist no matter which priors I use: uniform, normal, Cauchy, etc. After reading around, I’ve gathered that I shouldn’t worry, for the following reasons:
- the warnings only occur during the warmup phase
- the Rhat and n_eff values look reasonable
Can anyone comment on my conclusions, or suggest priors that would avoid these warnings in this case?