I have the following two-component mixture model. It is quite similar to a zero-inflated model: one component captures a cluster of small values (not necessarily zero), and the other captures relatively larger values with extremely large variance.
functions {
  // Poisson log-pmf with log rate alpha, extended to real-valued n via lgamma
  real continuous_poisson_log_lpdf(real n, real alpha) {
    return n * alpha - exp(alpha) - lgamma(n + 1);
  }
}
data {
  int<lower=1> N;
  real<lower=0> n[N];          // observed (possibly non-integer) counts
  real<lower=0> mu_init;       // prior location for mu
  vector<lower=0>[N] C;        // offsets
}
transformed data {
  vector[N] logC = log(C);
}
parameters {
  simplex[2] xi;                        // mixture weights
  real<lower=0,upper=1> lambda_zero;    // shared log rate of the small-value component
  vector<lower=0>[N] lambda;            // per-observation log rates of the second component
  real<lower=0> mu;
  real<lower=0> sigma;
  real alpha;
}
model {
  vector[2] log_xi = log(xi);
  mu ~ normal(mu_init, 1);
  sigma ~ normal(0, 1);
  alpha ~ normal(0, 1);
  lambda ~ skew_normal(mu, sigma, alpha);
  for (i in 1:N) {
    vector[2] lps = log_xi;
    lps[1] = lps[1] + continuous_poisson_log_lpdf(n[i] | lambda_zero + logC[i]);
    lps[2] = lps[2] + continuous_poisson_log_lpdf(n[i] | lambda[i] + logC[i]);
    target += log_sum_exp(lps);
  }
}
Here’s the output.
Inference for Stan model: anon_model_b62cdae42a3f6a2a1b4da6046463c0d1.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
              mean  se_mean     sd    2.5%     25%     50%     75%   97.5%  n_eff  Rhat
xi[0]         0.04   1.5e-3   0.01    0.02    0.03    0.04    0.05    0.07   75.0  1.06
xi[1]         0.96   1.5e-3   0.01    0.93    0.95    0.96    0.97    0.98   75.0  1.06
lambda_zero   0.77     0.09    0.3     0.1    0.54     1.0     1.0     1.0   11.0  1.55
lambda[0]     7.39   5.1e-4   0.02    7.36    7.38    7.39     7.4    7.42  890.0   1.0
lambda[1]      7.5   4.6e-4   0.01    7.47    7.49     7.5    7.51    7.53  964.0  1.01
...
lambda[285]   7.52   1.0e-3   0.02    7.48    7.51    7.52    7.54    7.56  488.0   1.0
mu            8.14   8.5e-3   0.07    8.01    8.09    8.14    8.18    8.29   72.0  1.06
sigma         1.78     0.04   0.17    1.56    1.67    1.75    1.85    2.25   21.0  1.21
alpha        -3.98     0.07   0.52   -5.07   -4.32   -3.95   -3.63   -3.02   56.0  1.07
lp__        -1.5e4   4795.7  2.2e4  -9.7e4  -2.0e4  -1.2e4   -1013  -990.7   21.0   1.2
Samples were drawn using NUTS at Thu Oct 12 16:53:22 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
With ~300 data points, it took about 20 minutes on my AMD Opteron 6136 (512MB RAM) machine. Evidently, n_eff and Rhat do not look great. Here are my questions.
(1) What do you recommend, in general, for resolving the low n_eff and high Rhat? I have tried different distributions, e.g., Poisson-LogNormal or Poisson-LogGamma, as well as different parameterizations of those distributions, but neither helped.
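To be concrete, by "different parameterizations" I mean things like a non-centered form of the lambda prior, roughly as in the sketch below (not my exact code; here I drop the lower bound on lambda, which the posterior above suggests is never active):

  parameters {
    simplex[2] xi;
    real<lower=0,upper=1> lambda_zero;
    vector[N] lambda_raw;                 // standardized skew-normal draws
    real<lower=0> mu;
    real<lower=0> sigma;
    real alpha;
  }
  transformed parameters {
    // location-scale shift implies lambda ~ skew_normal(mu, sigma, alpha)
    vector[N] lambda = mu + sigma * lambda_raw;
  }
  model {
    lambda_raw ~ skew_normal(0, 1, alpha);
    // remaining priors and the mixture likelihood are unchanged
  }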
(2) Can the statement lambda ~ skew_normal(mu, sigma, alpha); stay outside of the for-loop as written, even though the lambdas are only used in the second component of the mixture?
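(To spell out the alternative I am wondering about: it would fold the prior into the second mixture component inside the loop, roughly

  lps[2] = lps[2] + continuous_poisson_log_lpdf(n[i] | lambda[i] + logC[i])
                  + skew_normal_lpdf(lambda[i] | mu, sigma, alpha);

instead of the vectorized statement outside the loop.)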
(3) Does 20 minutes sound reasonable for this mixture to run 2,000 iterations? What might speed this model up?
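One idea I have considered but not benchmarked is vectorizing the likelihood and dropping the lgamma(n+1) term, which does not depend on the parameters and is common to both components, so it only shifts the target by a constant. A rough sketch of what I mean, replacing the transformed data and model blocks above:

  transformed data {
    vector[N] logC = log(C);
    vector[N] n_vec = to_vector(n);   // n as a vector for elementwise operations
  }
  model {
    vector[N] lp1;
    vector[N] lp2;
    mu ~ normal(mu_init, 1);
    sigma ~ normal(0, 1);
    alpha ~ normal(0, 1);
    lambda ~ skew_normal(mu, sigma, alpha);
    // per-observation log densities of each component, without the constant -lgamma(n+1)
    lp1 = log(xi[1]) + n_vec .* (lambda_zero + logC) - C * exp(lambda_zero);
    lp2 = log(xi[2]) + n_vec .* (lambda + logC) - C .* exp(lambda);
    for (i in 1:N)
      target += log_sum_exp(lp1[i], lp2[i]);
  }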
(4) What is your opinion on generalizing the Poisson distribution to a continuous domain the way I do here, i.e., using the density lambda^n * exp(-lambda) / Gamma(n+1) with real-valued n >= 0?
Any comment would be greatly appreciated.