Sampler avoiding zero in hierarchical prior?

I’ve been trying to calibrate a complex model, and have broken it down to a piece that I’m not sure I understand. This may be a trivial misunderstanding - I’m certainly new to hierarchical models - but it seems like a strange sampler behavior and I wanted to ask about it here.

I’m comparing the results from using hmc-nuts to sample directly from a (hierarchical) prior to using normal_rng to generate samples. What I’ve found is that for the scale parameter of the normal distribution, the sampled prior doesn’t agree with the generated one - and for some reason the sampled prior avoids zero.

Here’s the minimal working example:

parameters {
  real<lower=0> sigma_bar;
  real<lower=0> sigma_sig;
  real<lower=0> sigma;  

model {
  sigma_bar ~ normal(0.5,0.5);
  sigma_sig ~ normal(0.5,0.5);
  sigma ~ normal(sigma_bar,sigma_sig); 

generated quantities {
 real<lower=0> s_bar;
 real<lower=0> s_sig;
 real<lower=0> s;
s_bar = -1;
s_sig = -1;
s = -1;
while(s_bar <0)
 s_bar = normal_rng(0.5,0.5);
while(s_sig < 0)
 s_sig = normal_rng(0.5,0.5);
while(s < 0)
 s = normal_rng(s_bar,s_sig);

I’d appreciate any help anyone can give on this - is this sampler behavior expected? If this is a problem, is there a way to avoid it? Am I correct in assuming that the two methods should give the same distribution?

(I’m running 4 chains of 1e7 samples to get this.)

I don’t see anything with your model that would cause this and you are right that this is disconcerting. Could you open an issue here:

If you would like to contribute (we love it when people contribute) it would be helpful to see this compared to what you get using an RNG from R or Python and the density calculation from another source (e.g. dnorm).

Since the amount of mass truncated away is not a constant when the mean and / or standard deviation of a normal distribution is not fixed, you have to explicitly truncate with

  sigma ~ normal(sigma_bar,sigma_sig) T[0,]; 

or adjust target yourself with normal_lccdf(0 | sigma_bar, sigma_sig). When you do that, the posterior distribution of the parameters is conceptually the same as the things you generate in the generated quantities block, but you get a bunch of intractable divergences.

@bgoodri - thanks, this has significantly improved the situation. When I truncate all the variables with T[0,]; the result is significantly less biased. Is this likely to be an issue with all hierarchical models with truncated priors? I certainly never saw this explicitly done in an example.

I don’t actually seem to be getting any divergences (or at least not any warnings about them…). But there is still an issue with the region near zero.

The reason this came up was that I was trying to perform the Cook-Gelman-Rubin 2006 calibration on a hierarchical model with truncated priors, like this one. What I found was that I often got a bias where the “true” value was significantly below the confidence intervals - my generated values could be near zero, but the inferred values almost never were.

@sakrejda - given the truncation improvement, think it is still worth opening an issue?

Yes please, it’s definitely worth looking in to and you have a good example. The sampler should be doing better than this, it’s not like you’re looking too close to 0 here.

Glad Ben had an answer—I missed it exactly because we recommend these kinds of priors (with fixed parameters). I think this issue is mentioned in our discussion of these priors but only indirectly (i.e.-we say it’s ok to do this without truncation b/c the parameters are fixed but we could be clearer).

@sakrejda - added it!

It also seems that once you have some priors that are not fixed, you need to truncate everything - even with fixed values!

I was pretty confused by this, and I’ll go through the handbook and see if there are some examples that I based my code on where these truncations are not put into place explicitly. It seems like a very natural thing to have happen whenever you try to fit the hierarchical variance.

Thanks for filing the issue, I’ll look into it.

I agree that the other issue, figuring out when you need explicit truncation, is confusing and we should find a way to make it clearer for users. The only reason you don’t need explicit truncation is if the parameters are fixed s.t. the truncation only introduces a constant… if you can find misleading examples in the manual you could also add those to the running issue we have for the Stan manual.

And even if you do not need explicit truncation to get the samples, if you are using bridge sampling to estimate something like the marginal likelihood afterward, then you need to have done the truncation explicitly to get that right. Of course, in that scenario you have to be doing target += rather than ~ anyway.