Truncated normal not mixing sometimes



OK, at this stage the “performance” you are looking for is a sampler that can get around the distribution and give you a chance to understand the mixing problems. If the step size crashes then you have a sampler that can’t even do that… so lowering adapt_delta is the right step here.

So what are the performance problems you are getting?


The problem was mixing in general. I will re-run it and post what it looks like with adapt_delta = 0.6. I knew I should have saved it - sorry for not doing so!

As for changing distributions, I’m also keen to do this given that the normal without truncation works so well.


So this is what I mean by “performance” with adapt_delta = 0.6:

And here is the acceptance:


So stepsize still crashes.


Not as epically as before:

In better news, it occurred to me that some of the groups (i.e. sites) might have distinct distributions. I previously explored the balance by month and day but not by site. It turns out that three sites have extreme small values accounting for 2%, 4%, and 10% of their observations, respectively. After dropping these three sites, the model fits quite nicely in 200 iterations. So, as I suspected, it is something to do with the raw data. I’m still not sure why the truncated normal couldn’t handle all this, especially as in total these observations aren’t overly common, but posterior geometries are well beyond my understanding!
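For anyone wanting to repeat the per-site check, here is a minimal sketch using only the Python standard library. The toy records, the site labels, and the cutoff `threshold` are all hypothetical; the post does not say how “extreme” was defined, so treat the cutoff as something to choose for your own data.

```python
from collections import defaultdict

def low_value_share(records, threshold):
    """Return {site: fraction of that site's values below threshold}."""
    totals = defaultdict(int)
    low = defaultdict(int)
    for site, value in records:
        totals[site] += 1
        if value < threshold:
            low[site] += 1
    return {site: low[site] / totals[site] for site in totals}

# Hypothetical toy data: (site, observed value) pairs.
records = [
    ("A", 5.1), ("A", 4.9), ("A", 0.1), ("A", 5.3),
    ("B", 5.0), ("B", 5.2), ("B", 4.8), ("B", 5.1),
]

# threshold=1.0 is an illustrative cutoff, not one taken from the thread.
shares = low_value_share(records, threshold=1.0)
for site, share in sorted(shares.items()):
    print(site, f"{share:.0%}")
```

Sites whose share stands out from the rest are the ones worth plotting separately before deciding whether to drop or model them differently.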


I should add that the other sites do have small extreme values but they’re much less common.


Sigh, it’s not the truncated normal that’s the problem.


A bit of both I’d say… Why shouldn’t it work if there is a lot of mass near the lower bound? The 3 sites I mentioned weren’t showing great signs of bimodality, the problems were just a few hundred observations in a dataset of 87k, and it all worked fine without truncation. I’m probably just too ignorant of the underlying maths to see why this is obvious.


The computational problem is that lower bounds in the constrained space map to negative infinity in the unconstrained space. So they tend to require large step sizes, which can be unstable w.r.t. overflow.
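To make the computational point concrete, here is a small sketch of the log transform used for a lower-bounded parameter, y = log(x - lower). The function names and the example bound are illustrative, not taken from any model in this thread.

```python
import math

def to_unconstrained(x, lower):
    """Map x > lower to the real line via y = log(x - lower)."""
    return math.log(x - lower)

def to_constrained(y, lower):
    """Inverse map: x = lower + exp(y)."""
    return lower + math.exp(y)

lower = 1.0  # illustrative lower bound
gaps = [1e-1, 1e-3, 1e-6, 1e-12]  # distance of x above the bound

# As x approaches the bound, y heads toward negative infinity.
ys = [to_unconstrained(lower + gap, lower) for gap in gaps]
for gap, y in zip(gaps, ys):
    print(f"x - lower = {gap:g} -> y = {y:.2f}")
```

Each factor-of-ten step toward the bound moves the unconstrained value by about 2.3 units, so mass sitting close to the bound is spread over a long stretch of the unconstrained space.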

The statistical problem is that people think that if the true value of a parameter theta is 1.2, then imposing a prior like uniform(1, 2) won’t affect the posterior mean. That’s true for maximum likelihood, but not for Bayes. What happens is that any mass that would’ve been below 1 gets pushed above 1, and the estimate will be higher than if you’d had a uniform(-10, 10) prior. So adding a lower bound that truncates some mass pushes that mass above the lower bound, thus increasing the posterior mean (usually unintentionally).
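A rough numerical sketch of this effect, under a simplifying assumption that is not from the thread: the likelihood for theta is treated as normal, centered at 1.2 with a hypothetical standard deviation of 0.5. With a flat prior on an interval, the posterior is then that normal truncated to the interval, and its mean has a closed form.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_normal_mean(mu, sigma, a, b):
    """Mean of normal(mu, sigma) truncated to the interval [a, b]."""
    alpha = (a - mu) / sigma
    beta = (b - mu) / sigma
    mass = norm_cdf(beta) - norm_cdf(alpha)
    return mu + sigma * (norm_pdf(alpha) - norm_pdf(beta)) / mass

# Assumed (hypothetical) likelihood summary: centered at 1.2, sd 0.5.
mu_hat, se = 1.2, 0.5

wide = truncated_normal_mean(mu_hat, se, -10.0, 10.0)   # uniform(-10, 10) prior
narrow = truncated_normal_mean(mu_hat, se, 1.0, 2.0)    # uniform(1, 2) prior

print(f"posterior mean, uniform(-10, 10): {wide:.3f}")
print(f"posterior mean, uniform(1, 2):    {narrow:.3f}")
```

Under these assumptions the wide prior leaves the posterior mean at essentially 1.2, while the uniform(1, 2) prior moves it to roughly 1.41, even though the likelihood peaks at the same place in both cases.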


I just scheduled a blog post on Andrew’s blog for tomorrow that provides an example (where tomorrow is 28 November 2017, 13:00 EST).