Pseudo-extended MCMC

Thanks. Here’s the Stan code:

transformed data {
  real r = 10;      // constant offset; only shifts lp__, not the posterior
  real A = 6;       // amplitude of the angular modulation
  real omega = 6;   // angular frequency: six petals
}
parameters {
  vector[2] x;
}
model {
  // unnormalized log density: exponential decay in the radius,
  // modulated by a cosine in the angle to carve out the petals
  target += -0.5 * (hypot(x[1], x[2]) + r + A * cos(omega * atan2(x[2], x[1])));
}
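To see what the sampler is up against, here's a quick base-R sketch (mine, not part of the model) that evaluates the same unnormalized density on a grid and contours it; the petals are obvious:

r <- 10; A <- 6; omega <- 6
xs <- seq(-25, 25, length.out = 400)
lp <- outer(xs, xs, function(x1, x2)
  -0.5 * (sqrt(x1^2 + x2^2) + r + A * cos(omega * atan2(x2, x1))))
contour(xs, xs, exp(lp), xlab = "x[1]", ylab = "x[2]", drawlabels = FALSE)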

This is probably another good case for convergence diagnostics. Here’s the RStan call:

fit <- sampling(model,
                control = list(stepsize = 0.01, adapt_delta = 0.999,
                               max_treedepth = 11),
                chains = 4, iter = 50000)

There was still one iteration that exceeded max treedepth 11 and one divergent transition among the 100K post-warmup draws.
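Assuming a recent-enough RStan, the built-in helpers report these counts directly, without grepping the warnings:

check_hmc_diagnostics(fit)   # divergences, treedepth saturation, E-BFMI

get_num_divergent(fit)       # or pull the raw counts
get_num_max_treedepth(fit)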

     mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
x[1]  0.15    0.15 3.35 -7.06 -1.32  0.03  1.79  7.22   499 1.01
x[2] -0.09    0.15 3.42 -7.15 -2.10 -0.17  1.95  6.78   518 1.01
lp__ -4.53    0.02 1.60 -8.54 -5.35 -4.20 -3.35 -2.41  7576 1.00

Nevertheless, it looks like the true posterior means (which should be zero by symmetry) are within standard error, but that n_eff is terrible. This is 100K draws! That’s an effective sampling rate of about 0.5%, and we start to get nervous when the effective sample size drops toward 200 draws. Stan’s definitely not exploring this posterior well, though I get the feeling it’d get there with more draws and ideally more chains.
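That rate is easy to read off the fit object (a sketch assuming the 4-chain fit from above):

fit_summary <- summary(fit)$summary
fit_summary[c("x[1]", "x[2]"), "n_eff"] / 1e5   # 4 chains x 25K post-warmup = 100K draws; ~0.005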

Because of the low n_eff, the standard error for the means is pretty bad. I’m not sure what the true posterior sd is supposed to be. Although the posterior appears roughly symmetric in x[1] and x[2], the sampler may not be getting far enough into the tails.
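One way to check the true sd is brute-force quadrature of the unnormalized density on a fine grid. This is my own sketch, not part of the fit; the exp(-||x||/2) radial decay means mass beyond |x| = 60 is negligible:

xs <- seq(-60, 60, length.out = 1200)
g  <- expand.grid(x1 = xs, x2 = xs)
lp <- -0.5 * (sqrt(g$x1^2 + g$x2^2) + 10 + 6 * cos(6 * atan2(g$x2, g$x1)))
w  <- exp(lp - max(lp))               # rescale for numerical stability
m  <- sum(w * g$x1) / sum(w)          # mean, ~0 by symmetry
sqrt(sum(w * g$x1^2) / sum(w) - m^2)  # sd

If I’ve set that up right, the sd comes out around 3.46 per coordinate (the radial coordinate is Gamma(2, 1/2) here, so E[x1^2] = 24/2 = 12 and sd = sqrt(12)), which is consistent with the 3.35–3.45 the fits report.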

Here’s the code for the plot:

> post_x <- extract(fit)$x
> post_x1 <- post_x[ , 1]
> post_x2 <- post_x[ , 2]
> df <- data.frame(x1 = post_x1, x2 = post_x2)
> plot(post_x1, post_x2)
> library(ggplot2)
> ggplot(df, aes(x1, x2)) + geom_point()

Which is why we don’t evaluate these things by eye:
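And if you do look, vanilla scatterplots overplot badly at this many draws; a 2-D histogram over the same df (my suggestion, not in the original code) shows where the mass actually sits:

> ggplot(df, aes(x1, x2)) + geom_bin2d(bins = 100)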

So I tried 16 chains with some thinning so as not to kill my machine:

fit <- sampling(model, chains = 16, iter = 50000, thin = 25,
                control = list(adapt_delta = 0.999,
                               stepsize = 0.999,
                               max_treedepth = 11))

It takes about 20s/chain on my Mac notebook (6-year-old i7). This produces much better results, as one might expect, with a total of 16K saved posterior draws.
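In case it helps anyone reproducing this: RStan runs the chains concurrently if you set the standard cores option before sampling, which cuts wall time when CPU rather than memory is the constraint:

options(mc.cores = parallel::detectCores())   # run chains in parallel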

> print(fit)
Inference for Stan model: flower.
16 chains, each with iter=50000; warmup=25000; thin=25; 
post-warmup draws per chain=1000, total post-warmup draws=16000.

      mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
x[1] -0.04    0.07 3.45 -7.37 -1.71 -0.02  1.61  7.35  2322 1.01
x[2] -0.05    0.07 3.44 -7.04 -2.06 -0.19  2.00  6.79  2271 1.01
lp__ -4.56    0.01 1.62 -8.60 -5.39 -4.21 -3.37 -2.44 15039 1.00

The sds are even a little higher. As you can see, the 2.5% and 97.5% quantile estimates are terrible. This may indicate poor tail exploration, but they’re still within MCMC standard error here, because 2.5% tail probabilities only use about 1/40 of the draws, meaning 1/sqrt(40) the effective sample size, which is still decent here at around 300.
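Spelling out that back-of-the-envelope calculation:

n_eff <- 2322      # n_eff for x[1] in the 16-chain fit
n_eff / sqrt(40)   # ~367 effective draws for the 2.5% tail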
