Hi everyone!
I read in the User's Guide that Stan does not currently support zero-inflated models for continuous distributions. Is there a workaround? For example, a simplified problem might look like this:
$y_i = 0$ with probability $\theta$ and $y_i \sim \text{lognormal}(\mu_i, \sigma_i^2)$ with probability $1-\theta$, where $\mu_i$ and $\sigma_i$ follow some prior distributions, $i = 1, \dots, n$.
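Written out per observation (a restatement of the description above, with $\theta$ as the probability of a zero and $n_0$ the number of zeros), the likelihood is:

$$
p(y_i \mid \theta, \mu_i, \sigma_i) =
\begin{cases}
\theta, & y_i = 0,\\
(1-\theta)\,\text{lognormal}(y_i \mid \mu_i, \sigma_i^2), & y_i > 0,
\end{cases}
$$

$$
\log p(y \mid \theta, \mu, \sigma) = n_0 \log\theta + (n - n_0)\log(1-\theta) + \sum_{i:\, y_i > 0} \log \text{lognormal}(y_i \mid \mu_i, \sigma_i^2).
$$

The first two terms involve only $\theta$ and the sum involves only $(\mu_i, \sigma_i)$, so with independent priors the zero part and the positive part can be estimated separately.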
I followed max’s response in another thread and wrote the following Stan model:
model2='
data {
  int<lower=0> N;
  int<lower=0> y[N];
}
parameters {
  real<lower=0, upper=1> theta;
  real mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 10);
  sigma ~ lognormal(0, 2);
  for (n in 1:N) {
    if (y[n] == 0)
      target += bernoulli_lpmf(0 | theta);
    else
      target += bernoulli_lpmf(1 | theta) + lognormal_lpdf(y[n] | mu, sigma);
  }
}
'
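To make explicit what the loop in the model block adds to `target` (before the priors), here is a plain-R sketch of the same log-likelihood. Note that in the model as written, `bernoulli_lpmf(0 | theta)` is attached to the zeros, so `theta` there is the probability of a *nonzero* value (the complement of $\theta$ in the description above); with the simulated data it should be estimated near 0.8. The names `hurdle_lognormal_loglik` and `loop_loglik` are hypothetical helpers, not part of rstan, and the data are simulated with a vectorized equivalent of the generating code in this post (same distribution, not the same draws).

```r
# Hypothetical helper (not part of rstan): total log-likelihood accumulated in
# target by the Stan loop, excluding the priors on mu and sigma.
# As in the Stan model, theta is the probability that y is NONZERO.
hurdle_lognormal_loglik <- function(y, theta, mu, sigma) {
  is_zero <- y == 0
  # Zeros: bernoulli_lpmf(0 | theta) = log(1 - theta)
  ll_zero <- sum(is_zero) * log1p(-theta)
  # Positives: bernoulli_lpmf(1 | theta) + lognormal_lpdf(y | mu, sigma)
  ll_pos <- sum(!is_zero) * log(theta) +
    sum(dlnorm(y[!is_zero], meanlog = mu, sdlog = sigma, log = TRUE))
  ll_zero + ll_pos
}

# The same computation written observation by observation, like the Stan loop
loop_loglik <- function(y, theta, mu, sigma) {
  total <- 0
  for (n in seq_along(y)) {
    if (y[n] == 0) {
      total <- total + dbinom(0, 1, theta, log = TRUE)
    } else {
      total <- total + dbinom(1, 1, theta, log = TRUE) +
        dlnorm(y[n], meanlog = mu, sdlog = sigma, log = TRUE)
    }
  }
  total
}

set.seed(1)
z <- rbinom(200, 1, 0.8)
y <- z * rlnorm(200, meanlog = 0, sdlog = 1)  # 0 when z == 0, lognormal(0, 1) otherwise

hurdle_lognormal_loglik(y, theta = 0.8, mu = 0, sigma = 1)
all.equal(hurdle_lognormal_loglik(y, 0.8, 0, 1), loop_loglik(y, 0.8, 0, 1))  # TRUE
```

The vectorized form only needs the count of zeros and the positive values, which is the factorization in the likelihood written out for the simplified problem.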
I generated some data points and tested the model above:
library(rstan)
set.seed(1)
z = rbinom(200,1,0.8)  # 1 = nonzero observation, with probability 0.8
generate_lognormal = function(z) ifelse(z==0, 0, exp(rnorm(1,0,1)))  # 0, or a lognormal(0, 1) draw
y = sapply(z, generate_lognormal)
dat2 = list(N=200, y=y)
fit2 <- stan(model_code=model2, data=dat2, chains=2, warmup=500, iter=1000, cores=1, refresh=0)
But I got an error:
Error in mod$fit_ptr() :
Exception: int variable contained non-int values; processing stage=data initialization; variable name=y; base type=int (in ‘model10c6d393b3f99_728a875fb87c0036fa51d56fb3a6ee03’ at line 4)
failed to create the sampler; sampling not done
It seems that y must be an integer… Is this a requirement for the hurdle model? Has anyone encountered a similar problem? Thanks for your help!
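The error points at line 4 of the Stan program, `int<lower=0> y[N];`: `y` is declared as an integer array, but the simulated lognormal values are not whole numbers. Nothing in the hurdle construction itself requires integer data, since `lognormal_lpdf` takes real arguments. A minimal sketch, assuming the rest of the model stays unchanged, that declares `y` as a real-valued vector instead (`model3` is just a new name chosen here; the fit itself is left commented out because it needs rstan and a compiler):

```r
# Same model as model2; the only change is the type of y on line 4 of the
# program: a real-valued vector (zeros allowed) instead of an int array.
model3 <- '
data {
  int<lower=0> N;
  vector<lower=0>[N] y;   // continuous outcome with exact zeros
}
parameters {
  real<lower=0, upper=1> theta;  // probability that y[n] is nonzero
  real mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 10);
  sigma ~ lognormal(0, 2);
  for (n in 1:N) {
    if (y[n] == 0)
      target += bernoulli_lpmf(0 | theta);
    else
      target += bernoulli_lpmf(1 | theta) + lognormal_lpdf(y[n] | mu, sigma);
  }
}
'

# Re-create the simulated data from the post
set.seed(1)
z <- rbinom(200, 1, 0.8)
generate_lognormal <- function(z) ifelse(z == 0, 0, exp(rnorm(1, 0, 1)))
y <- sapply(z, generate_lognormal)
dat2 <- list(N = 200, y = y)

# The values are not whole numbers, which is what the int declaration rejected
all(y == round(y))  # FALSE

# Then fit as before, e.g.:
# fit3 <- rstan::stan(model_code = model3, data = dat2, chains = 2,
#                     warmup = 500, iter = 1000, cores = 1, refresh = 0)
```

The `vector` declaration is used here because it is accepted by both older and newer Stan versions, whereas the `real y[N]` array syntax was removed in recent Stan releases.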