I am fitting a rather complicated model, but I managed to narrow the issue down to something that is easy to demonstrate in a small, self-contained example.
Consider a mixture of two normals in \mathbb{R}^3: one centered at the origin with identity covariance, and one centered at [1,1,1] with a much narrower covariance. Here is the Stan code:
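data {
  int<lower=1> K;                  // dimension
  real<lower=0, upper=1> alpha;    // mixture weight
  vector[K] mu1;                   // mean of the first component
  vector[K] mu2;                   // mean of the second component
  matrix[K,K] Sigma1;              // covariance of the first component
  matrix[K,K] Sigma2;              // covariance of the second component
}
parameters {
  vector[K] y;                     // sized by K rather than a hard-coded 3
}
model {
  // two-component mixture, marginalized on the log scale
  target += log_sum_exp(log(alpha) + multi_normal_lpdf(y | mu1, Sigma1),
                        log(1-alpha) + multi_normal_lpdf(y | mu2, Sigma2));
}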
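As a cross-check outside Stan, the same log density can be evaluated in base R. This is only a sketch: the helper names `log_sum_exp`, `mvn_lpdf` and `mixture_lp` are made up for this post (they are not rstan functions), and `mvn_lpdf` assumes the covariance is symmetric positive definite.

```r
# Stable log(exp(a) + exp(b)) for two scalars.
log_sum_exp <- function(a, b) {
  m <- max(a, b)
  m + log(exp(a - m) + exp(b - m))
}

# Multivariate normal log density via a Cholesky factor.
mvn_lpdf <- function(y, mu, Sigma) {
  k <- length(y)
  R <- chol(Sigma)                               # upper triangular, t(R) %*% R == Sigma
  z <- backsolve(R, y - mu, transpose = TRUE)    # solves t(R) %*% z = y - mu
  -0.5 * k * log(2 * pi) - sum(log(diag(R))) - 0.5 * sum(z^2)
}

# Same expression as the target += line in the Stan model.
mixture_lp <- function(y, alpha, mu1, mu2, Sigma1, Sigma2) {
  log_sum_exp(log(alpha) + mvn_lpdf(y, mu1, Sigma1),
              log(1 - alpha) + mvn_lpdf(y, mu2, Sigma2))
}

# Placeholder call: identity covariances are used here only for illustration.
lp_origin <- mixture_lp(c(0, 0, 0), 0.3, c(0, 0, 0), c(1, 1, 1), diag(3), diag(3))
print(lp_origin)
```

Evaluating `mixture_lp` at points along the segment between the two means, with the actual covariances, is one way to see how deep the low-density region between the components is.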
Here is the data setup and the call from R:
library("rstan")
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
Sigma1 <- matrix(c(1, 0, 0, 0, 1, 0, 0, 0, 1), 3)
Sigma2 <- matrix(c(0.22251677640490497, -0.10171944195482832, 0.047549588109993136,
                   -0.10171944195482832, 0.16903679793887302, -0.06377848041623378,
                   0.047549588109993136, -0.06377848041623378, 0.08844642565622202), 3)
Sigma2_scale <- 1/16
data <- list(K = 3, alpha = 0.3,
             mu1 = c(0, 0, 0), Sigma1 = Sigma1,
             mu2 = c(1, 1, 1), Sigma2 = Sigma2 * Sigma2_scale)
fit <- stan(file = "normal-mixture.stan", data = data)
get_adaptation_info(fit)
If you set Sigma2_scale <- 1, it fits OK, but with the value above, I get
1: There were 8 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: Examine the pairs() plot to diagnose sampling problems
3: The largest R-hat is 1.29, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess
I suspect the issue is that the curvature of the posterior differs greatly between the two mixture components, so no single step size suits both.
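To put a rough number on that guess, here is a small base R sketch that compares the principal-axis standard deviations of the two components. The variable names `sd1` and `sd2` are my own, and reading the ratio as a proxy for the curvature mismatch is an assumption on my part, not a diagnostic that Stan reports.

```r
# Covariances copied from the data setup above.
Sigma1 <- diag(3)
Sigma2 <- matrix(c(0.22251677640490497, -0.10171944195482832, 0.047549588109993136,
                   -0.10171944195482832, 0.16903679793887302, -0.06377848041623378,
                   0.047549588109993136, -0.06377848041623378, 0.08844642565622202), 3)
Sigma2_scale <- 1/16

# Standard deviations along the principal axes are the square roots
# of the eigenvalues of each covariance matrix.
sd1 <- sqrt(eigen(Sigma1, symmetric = TRUE)$values)
sd2 <- sqrt(eigen(Sigma2 * Sigma2_scale, symmetric = TRUE)$values)

print(sd1)
print(sd2)

# Widest direction of the broad component over the narrowest
# direction of the tight component.
print(max(sd1) / min(sd2))
```

Rerunning this with Sigma2_scale <- 1 shows how much the gap shrinks in the setting that fits without warnings.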
I fixed the original model by allowing more noise (I think it was misspecified), but I am curious if there is anything I can do to improve the MCMC performance of this model in Stan as it is.