Hey all,
I’ve been doing the Cook, Gelman, and Rubin software validation method to jointly validate my data generating and fitting models and I noticed that in many cases I was seeing a spike at 0 on the histogram of sum(posterior_thetas <= theta0)/length(posterior_thetas)
which should look like a uniform(0, 1) distribution. I discovered some bugs in my models, but eventually realized that to make that spike go away I had to do two separate things:
- replace
<=
with >
in the above sum
- In places where I was generating a positive value (typically for scale parameters), it was not sufficient to do e.g.
fabs(normal_rng(0, 5))
, instead I had to do a while loop and iterate until drawing a value greater than or equal to zero.
#1 above was implemented in R, so I suspect some kind of weird R-specific rounding or coercion but would love to know what is happening. #2 above was implemented in Stan and when I attempted to test the distributions of random numbers drawn with both techniques they seemed to line up within 2 decimal places for quantiles and to have extremely similar histograms at several levels of granularity.
Any ideas?
cc @betanalpha and @jonah who have been helping me with this so far.
There isn’t enough info here to say something concrete (what’s the model etc)
But I can tell you that fabs(normal_rng(0,1)) has a very different distribution to
sample = -1;while (sample <0) { sample = normal_rng(0,1)}
The first is a "folded normal" while the second is a truncated normal (ie a normal conditioned on the event that every sample is positive).
So which one do you actually want to use?
Actually I’m wrong. In the special case where it’s a N(0,1), the two coincide. Nevertheless, which one do you want?
Right, for a zero mean they are exactly the same for Stan as the different normalizing constants don’t effect sampling. But to be pedantic we want a truncated normal here.
If you compare fabs(normal_rng(0, 1))
to a while loop that checks while (x <= 0)
then Cook et all comes out fine. Which means the only difference is at x = 0, which should be vanishingly small for double precision floating point. It’s almost as if there’s a loss of precision somewhere that causes a nontrivial excess near zero.
There’s a similar problem with the comparison in R – the cases where <
and <=
are different should be extremely rare in double precision floating point, yet we’re seeing 1% effects in the Cook et al histograms.
is the code anywhere I can see?
Does this happen when you only use the parameter on the internal scale
(which I guess is a log transform?). That is, don’t use the constrained
"natural scale" parameter just the inconstrained one.
Also. Have you checked the values at which < and <= are different. It might not be R.
Hey Dan! Thanks for taking a look. I put up all the code here for now.
The above weirdness happens for a few different models; you can see the entry point in R for a simple 1D linear regression here, which ends up using this to generate data and this to fit it. The offending <=
(now >
) is here.
I’m kind of new - how would one go about using the parameter on the unconstrained scale? Isn’t all the math in a Stan program implicitly done on the constrained scale?
Let me try to see the set difference between <= and = and get back to you. Thanks again.
Actually it was <= vs >; I suspect that since <= was causing a spike at 0th percentile (for theta0 among new thetas sampled from the posterior) that changing it to < would not make a big difference - it was already reporting that none of them were <=, so also none of them would be <, right?
I foolishly ran your code and it may never finish…
So I’m just going to ask more questions:
-
Does this only happen for constrained parameters? (If so, it indicates that the problem is on Stan’s end, not on R’s)
-
Can you hack together a problem with an analytic form for the posterior (probably N(mu, sigma) with easy priors) and do the test for exact draws from the posterior in R? Does that show the problem?
Sorry, now I’ve commented out all the irrelevant stuff. The linear regression thing should finish in a few minutes by itself.
I’m not sure how to do either of those two other things… maybe Michael can point me in the right direction today.
The models for the normal are here. With conjugate priors and the associated posterior
It’s probably enough to do both unknown mean, known variances (unconstrained parameter) and unknown variance, known mean (constrained parameter).
The end of the doc shows how to do it jointly.
Okay, so on your and Michael’s suggestion I made a new linear regression model with a known variance (gen model, fit model) and generated the attached PDF,
constant_sigma_fits.pdf (36.5 KB),
showing 3 pages of increasingly granular histograms for the parameters using <=
(“lte”) and >
(gt). Then another 3 pages after with a new random seed, to show that some artifacts around 0 stick around. I’m really not sure what to read into this - it doesn’t actually look like switching to >
helped the problem, nor that the while
vs fabs
thing was the only issue.
Any thoughts based on this? I can go and do the analytic posterior vs. stan fit posterior next if that might shed additional light.
These things are hard to do by eye. What’d the statistical tests say?
Are you checking that you have convergence and reasonable n_eff in all these runs?
As we talked about in the meeting, the statistical tests don’t really examine the fact that the spikes are often at 0, but they’d probably pass in these cases.
I made two more pdfs of charts that I think might show that the fabs
issue was likely a red herring - there’s a linear regression with while loop (lin_regr, model) and with fabs (lin_regr_fabs, model)
lin_regr_fabs.pdf (23.7 KB)
lin_regr.pdf (24.0 KB)
(<=
always on the left, >
always on the left in these)
You can see they aren’t the same, especially on the most granular histograms, but they might be close enough? I don’t have a good intuition for this.
It also seems like the issue exists roughly equally in both >
and <=
quantiles.
The spikes aren’t that weird by themselves, but the fact that one shows up at 0 in like 7 out of 8 histograms of appropriate granularity across seeds and various ways of getting sigma seems weird to me…
Those models should be the same because you’re using the while loop with <=
. The discrepancy arose when you compared the fabs
implementation against the while loop with just <
.
fabs
should be even more the same as < 0
right? Either way, I re-ran and here’s the new one:
lin_regr_lt.pdf (23.9 KB)
Use INLA::inla.ks.plot( data, punif) where data is whatever you’d draw the histogram of.
This will give a plot where the p-value from the KS test is displayed as well as the empirical process* that the test is based on. This should not have long sojourns outside the circle. It gives more information about where the problem is, but is considerably easier to read than a histogram.
*(It’s a plot of |ecdf - punif|/sqrt(n), which converges to a Brownian Bridge as n-> infintiy if the uniform distribution is correct (here ecdf is the empirical CDF).
1 Like
No, fabs
would admit zero so is equivalent to the while loops with <= 0
. I thought that you had done some experiments verifying this, but those results may have been confounded with other stuff.
As you can see from these pdfs, the effect is not always there on every graph; I think that was confusing me as I was only doing a small subset of these graphs when testing this manually before. Your advice to do this more exhaustively was 👌🏻. Sorry for the confusion.
(Also I don’t think this matters, but fabs
and while (sigma < 0) sigma = normal_rng(0, 5);
admit 0s, while (sigma <= 0)
will retry on 0.)