Stan 2.10 through 2.13 have broken samplers

As far as we can tell, Stan 2.09 is the latest version of Stan
with a properly functioning sampler.

Versions from 2.10 on are producing biased samples
that slightly underestimate posterior variance. Thanks to
Matthew R. Becker for filing the issue:

https://github.com/stan-dev/stan/issues/2178

Stan 2.10 changed the NUTS algorithm from using slice sampling
along a Hamiltonian trajectory to a new algorithm that uses
multinomial sampling:

https://arxiv.org/abs/1601.00225
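
For readers who want a picture of the difference, here is a minimal
sketch of the two selection rules applied to an already-built
trajectory. It is not Stan's code: Stan builds the trajectory
recursively and its selection step is more involved. The function
names and the list-of-energies representation are assumptions made
for illustration only.

```python
import math
import random


def slice_select(energies, start, rng):
    """Slice-style selection (illustrative, not Stan's code).

    Draw a slice level under the density of the starting state, then
    pick uniformly among the trajectory states at or above that level.
    `energies` holds the Hamiltonian H at each state of the trajectory.
    """
    # log(u) for u ~ Uniform(0, exp(-H_start)]; 1 - random() avoids log(0).
    log_u = -energies[start] + math.log(1.0 - rng.random())
    eligible = [i for i, h in enumerate(energies) if -h >= log_u]
    return rng.choice(eligible)  # the starting state is always eligible


def multinomial_select(energies, rng):
    """Multinomial selection: pick a state with probability prop. to exp(-H)."""
    h_min = min(energies)  # subtract the minimum for numerical stability
    weights = [math.exp(-(h - h_min)) for h in energies]
    return rng.choices(range(len(energies)), weights=weights, k=1)[0]
```

Both rules favor low-energy states; the multinomial rule just weights
every state directly instead of going through an auxiliary slice
variable.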

We are mortified that after all of our nagging to get people
to use samplers that worked and weren’t biased, we released
a biased sampler. The 2.10 version had a major bug which was
easy to see and fix, but that apparently didn’t solve the
bigger problem.

Michael and I are poring over the proofs and the code, but
it’s unfortunate timing with the holidays here as everyone’s
traveling. We’ll announce a fix and make a new release as soon
as we can. Let’s just say this is our only priority at the moment.

Until then, the only things I can recommend are using straight-up
static HMC (which is not broken in the Stan releases), using
something other than Stan, or rolling back to Stan 2.09.

I’m not even sure how to do the latter for interfaces other than CmdStan,
which is just a source download and doesn’t require any
installation.

If all else fails, we’ll roll back the sampler to the 2.09 version
in a couple days.

  • Bob

Let me temper the panic by saying that the bias is relatively small and affects only variances, not means, which is why it snuck through all our testing and application analyses. Ultimately, posterior intervals are smaller than they should be, but not so much that the inferences are misleading. The shrinkage will be noticeable only if you have more than thousands of effective samples, which is much more than we typically recommend.
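
To see why it takes thousands of effective samples to notice, here is a rough back-of-the-envelope sketch. It assumes roughly Gaussian, independent draws, for which the relative standard error of a variance estimate is about sqrt(2 / n). The function name and the 5% bias in the usage note are hypothetical, chosen only for illustration; the thread does not state the actual size of the bias.

```python
import math


def min_ess_to_detect(rel_bias, z=2.0):
    """Effective sample size at which a relative variance bias of
    `rel_bias` reaches `z` standard errors of the variance estimate.

    Assumes roughly Gaussian draws, so the relative standard error of
    the sample variance is about sqrt(2 / n). Solving
    rel_bias = z * sqrt(2 / n) for n gives n = 2 * (z / rel_bias)^2.
    """
    if rel_bias <= 0:
        raise ValueError("rel_bias must be positive")
    return math.ceil(2.0 * (z / rel_bias) ** 2)
```

Under these assumptions, a hypothetical 5% shortfall in variance would need an effective sample size in the low thousands before it stood out from Monte Carlo noise, and halving the bias quadruples that requirement.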

Static HMC seems to be giving valid results on the simple test problems that we are considering, but it still performs horribly on hard problems, so I would advise against using it seriously.

I updated the blog post with Michael’s comment, which pretty much
matches what Andrew said.

I’m still mortified, not because bugs get through, but because this
is one we should be able to catch. On the plus side, we now have the
model to catch this in future regression tests (what computer scientists
call tests that make sure working behavior doesn’t “regress” to
a previous buggy behavior).

  • Bob

From rstan you can set algorithm = "HMC" when calling stan(), but I would
trust Michael on this. That is, even with the bug NUTS should be better
than static HMC (except for some trivial cases).

Correct.

Stan 2.14 is out now.

@betanalpha, I saw the unit tests. I don’t remember if there was a test added to help us catch this sort of bug in the future. If we didn’t add one with the pull request, could we add one now?

Look at the diff – two tests were added to catch the particular bugs addressed in the PR.

I remember those tests. I was thinking something end-to-end. We know what’s
correct in analytic models. We should know if we introduce something that’s
not correct.
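
A minimal sketch of the kind of end-to-end check I have in mind,
assuming a standard normal target whose mean (0) and variance (1) are
known analytically. The sampler here is a small hand-written static
HMC stand-in, not Stan's code, and the function names, step size, and
tolerance are all assumptions for illustration.

```python
import math
import random


def static_hmc_std_normal(n_draws, step_size=0.3, n_steps=5, seed=1234):
    """Static HMC for a 1-D standard normal target, U(q) = q^2 / 2."""
    rng = random.Random(seed)
    q = 0.0
    draws = []
    for _ in range(n_draws):
        p = rng.gauss(0.0, 1.0)          # resample momentum
        q_new, p_new = q, p
        # Leapfrog integration; the gradient of U is simply q.
        p_new -= 0.5 * step_size * q_new
        for i in range(n_steps):
            q_new += step_size * p_new
            if i < n_steps - 1:
                p_new -= step_size * q_new
        p_new -= 0.5 * step_size * q_new
        # Metropolis correction on the change in the Hamiltonian.
        h_old = 0.5 * (q * q + p * p)
        h_new = 0.5 * (q_new * q_new + p_new * p_new)
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        draws.append(q)
    return draws


def moments_match(draws, true_mean, true_var, tol):
    """Compare the sample mean and variance against analytic values."""
    n = len(draws)
    mean = sum(draws) / n
    var = sum((x - mean) ** 2 for x in draws) / (n - 1)
    return abs(mean - true_mean) < tol and abs(var - true_var) < tol
```

A regression test would run the real sampler in place of the stand-in
and fail whenever moments_match returns False. The tolerance has to
be set from the Monte Carlo standard error, and catching a small
variance bias requires a lot of draws.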