Poisson-LogSkewNormal mixture + partial pooling

I have the following two-component mixture model. It is quite similar to a zero-inflated model: one component captures a cluster of small values (not necessarily zero), and the other captures relatively larger values with extremely large variance.

functions {
    real continuous_poisson_log_lpdf(real n, real alpha) {
        return n*alpha - exp(alpha) - lgamma(n+1);
    }
}
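
(For integer n this matches poisson_log_lpmf(n | alpha), where alpha is the log of the Poisson rate.)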
data {
    int<lower=1> N;
    real<lower=0> n[N];
    real<lower=0> mu_init;
    vector<lower=0>[N] C; // offsets
}
transformed data {
    vector[N] logC = log(C); 
}
parameters {
    simplex[2] xi;
    real<lower=0,upper=1> lambda_zero;
    vector<lower=0>[N] lambda;
    real<lower=0> mu;
    real<lower=0> sigma;
    real alpha;
}
model {
    vector[2] log_xi = log(xi);
    mu ~ normal(mu_init, 1);
    sigma ~ normal(0, 1);
    alpha ~ normal(0, 1);
    lambda ~ skew_normal(mu, sigma, alpha);
    for (i in 1:N) {
        vector[2] lps = log_xi;
        lps[1] = lps[1] + continuous_poisson_log_lpdf(n[i]|lambda_zero+logC[i]);
        lps[2] = lps[2] + continuous_poisson_log_lpdf(n[i]|lambda[i]+logC[i]);
        target += log_sum_exp(lps);
    }
}

Here’s the output.

Inference for Stan model: anon_model_b62cdae42a3f6a2a1b4da6046463c0d1.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

              mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
xi[0]         0.04  1.5e-3   0.01   0.02   0.03   0.04   0.05   0.07   75.0   1.06
xi[1]         0.96  1.5e-3   0.01   0.93   0.95   0.96   0.97   0.98   75.0   1.06
lambda_zero   0.77    0.09    0.3    0.1   0.54    1.0    1.0    1.0   11.0   1.55
lambda[0]     7.39  5.1e-4   0.02   7.36   7.38   7.39    7.4   7.42  890.0    1.0
lambda[1]      7.5  4.6e-4   0.01   7.47   7.49    7.5   7.51   7.53  964.0   1.01
...
lambda[285]   7.52  1.0e-3   0.02   7.48   7.51   7.52   7.54   7.56  488.0    1.0
mu            8.14  8.5e-3   0.07   8.01   8.09   8.14   8.18   8.29   72.0   1.06
sigma         1.78    0.04   0.17   1.56   1.67   1.75   1.85   2.25   21.0   1.21
alpha        -3.98    0.07   0.52  -5.07  -4.32  -3.95  -3.63  -3.02   56.0   1.07
lp__        -1.5e4  4795.7  2.2e4 -9.7e4 -2.0e4 -1.2e4  -1013 -990.7   21.0    1.2

Samples were drawn using NUTS at Thu Oct 12 16:53:22 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

With ~300 data points, it took about 20 minutes on my AMD Opteron 6136 (512MB RAM) machine. Apparently n_eff and Rhat do not look great. Here are my questions.

(1) What do you recommend to resolve the n_eff and Rhat issues in general? I have tried different distributions, e.g., Poisson-LogNormal or Poisson-LogGamma, and different parameterizations of those distributions, but neither helped.

(2) Can lambda ~ skew_normal(mu, sigma, alpha); stay outside the for-loop even though the lambda's are only used in the second component of the mixture?

(3) Does 20 minutes sound reasonable for this mixture to run 2,000 iterations? What could possibly speed up this model?

(4) What is your opinion on generalizing the Poisson distribution to a continuous domain like I am doing here?

Any comment would be greatly appreciated.

I didn’t even know Stan could run in 512MB RAM. How do you compile your models? Does it take forever with swapping?

Your model is fitting a Poisson parameter for each item.

The body of your for-loop (building lps and taking log_sum_exp) can be simplified to

target += log_mix(xi[1], 
    continuous_poisson_log_lpdf(n[i] | lambda_zero + logC[i]),
    continuous_poisson_log_lpdf(n[i] | lambda[i] + logC[i]));
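
log_mix(theta, lp1, lp2) is just a convenience for log_sum_exp(log(theta) + lp1, log1m(theta) + lp2), so with theta = xi[1] it is exactly equivalent to your lps construction.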

The problem you’re having with R-hat is most likely due to the centered parameterization of the skew-normal. I don’t know how to reparameterize that, but you want to look at doing the same thing as we normally do for non-centered parameterizations.

Oops, didn’t answer the whole thing. Yes, this should stay outside the for-loop, but it should be translated to a non-centered parameterization. You might want to try fitting with a regular normal first and then generalize to the skew normal.

Getting the right parameterization helps immensely because then the step size can be reasonable.
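
A minimal sketch of what that non-centered version could look like with a plain normal prior (lambda_raw is an auxiliary parameter I'm introducing here; this also drops the <lower=0> bound on lambda, which looks unnecessary since lambda enters the likelihood as a log rate):

parameters {
    // ... other parameters as before ...
    vector[N] lambda_raw;   // standardized auxiliary parameter
    real<lower=0> mu;
    real<lower=0> sigma;
}
transformed parameters {
    // non-centered: implies lambda ~ normal(mu, sigma)
    vector[N] lambda = mu + sigma * lambda_raw;
}
model {
    lambda_raw ~ normal(0, 1);
    // ... likelihood unchanged, using lambda[i] + logC[i] as before ...
}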

No opinion, but why not use a continuous distribution to begin with?

Urgh…I meant 512GB. Thank you for your comments.