Cook et al spike at 0

Ah, yes, sorry. I was thinking of inclusion with the <=, not the exclusion needed for the while loop.

So I ran that on the linear regression Cook et al. quantiles: inla_lin_regr.pdf (2.5 MB)

I see some pretty decent extra-circular sojourns, but I’ve also never seen this kind of graph before.

But none of the p-values are scary. I think this may be related to what Michael was saying about these generic tests not being that sensitive.

I asked @mitzimorris and it turns out she’d run 1000 replications and it looked fine with broad-bin plots, but with narrow bins there was the same spike at zero. Hopefully she’ll reply later with more info.

For the record, here are the 8 schools centered parameterization plots for the first 4 params individually and then all of them together. You can see some extremely obvious problems here, and the p-value for tau is < 0.01 (0.0024). So maybe the guidance for using this method should look something more like that? Though I’m still very curious what’s causing these little spikes at 0…

8schools_cp.pdf (26.7 KB)
inla_8schools_cp.pdf (794.5 KB)

(For a further apples-to-apples comparison, the very reasonable-looking non-centered parameterization:
8schools_ncp.pdf (26.9 KB)
inla_8schools_ncp.pdf (797.6 KB)
)

Are you sure those spikes correspond to runs that converged? Otherwise, I’d try to plot the parameters resulting in zeros in one color and the ones resulting in non-zero values in another and see if there’s a pattern.
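
A minimal sketch of that colour-coded check (the vector names theta0s and quants are placeholders for the data-generating values and their computed quantiles):

# Hypothetical names: theta0s = data-generating values, quants = their
# Cook et al. quantiles. Colour the replications whose quantile came out
# exactly zero and look for structure in where they land.
plot(theta0s, col = ifelse(quants == 0, "red", "grey60"), pch = 16,
     xlab = "replication", ylab = "theta0")
legend("topright", legend = c("quantile == 0", "quantile > 0"),
       col = c("red", "grey60"), pch = 16)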

Sorry, the reason I posted that was to show an example of a model with an obvious problem. In the centered parameterization the data-generating model and the fitted model line up perfectly, but we’re getting divergences that you can see if you inspect the fits. I’ll make another example with a slightly-off prior instead…

Somehow, having a slightly off prior (normal(0, 4) for generating, normal(0, 5) for fitting) leads to much more extreme results in the KS test plots Dan posted (which I’m starting to really like), though they look less extreme in the plain histograms than the tau plots from the divergent centered 8 schools run.
8schools_ncp_off.pdf (26.6 KB)
inla_8schools_ncp_off.pdf (732.5 KB)

The easiest way to fix this is to truncate (so do the KS test on [0, 0.2]). Pretty sure the derivation of the test statistic still holds in this case.

Or work out a more targeted measure of difference between the theoretical and empirical CDF and use a Monte Carlo hypothesis test.
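
For the second option, here’s a rough sketch of what a Monte Carlo test targeted at the spike at zero could look like (the function name and defaults are mine, not anything settled). With 500 exchangeable posterior draws per replication, the quantile is discrete on {0, 1/500, …, 1}, so the null count of exact zeros can be simulated directly:

# A sketch only: under the null, sum(posterior_thetas <= theta0) is uniform
# on 0:n_draws, so the number of exactly-zero quantiles can be simulated.
# (MCMC autocorrelation makes this approximate.) quants = Cook et al. quantiles.
mc_zero_spike_test = function(quants, n_draws = 500, n_sims = 10000) {
  observed = sum(quants == 0)
  simulated = replicate(n_sims, {
    null_quants = sample(0:n_draws, length(quants), replace = TRUE) / n_draws
    sum(null_quants == 0)
  })
  # one-sided Monte Carlo p-value for "too many zeros"
  (1 + sum(simulated >= observed)) / (1 + n_sims)
}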

I tried the truncation using this code:
INLA::inla.ks.plot(quants_lte[quants_lte <= 0.2], function(q) {return(punif(q, max=0.2))})
which made:
[KS plot of the quantiles ≤ 0.2]
and
INLA::inla.ks.plot(quants_lte[quants_lte >= 0.2], function(q) {return(punif(q, min=0.2))})
with:
[KS plot of the quantiles ≥ 0.2]

I’ve also been getting this warning message:

Warning message:
In ks.test(x, y, ...) :
  ties should not be present for the Kolmogorov-Smirnov test

Okay, so at least the eight schools NCP is working well qualitatively!

True, it doesn’t have any spikes at 0 either…

Can you summarise for me? When does this spike at zero happen:

  • Bad parameterisations?

  • Good models where all the other diagnostics work?

  • From exact posterior samples? (i.e. without Stan; preferably tested without a Stan call)

  • Are there ties in the samples? (This really shouldn’t happen, I don’t think)

  • If there are ties, are they more likely to be near zero?

Good call on the structure. I have seen spikes at 0 (with <=, i.e. all sampled posterior thetas are greater than the theta0 that generated them) for:

  • Bad parameterisations (8 schools cp w/ divergences) with pretty extreme p-values with the KS test
  • Good models with good diagnostics and data generating models
    • all the linear regression models
    • 8 schools ncp has some heaviness at 0
    • These “pass” KS tests, unless you zoom in as Dan suggested above
  • Bad prior specified in data generating model - histogram generally looks off and fails KS test

Ties
I think the reason there are ties is the way the quantiles are computed. First, some number of theta0s are drawn from the data generating model (I’ve been doing 1000 or 10000). Then 500 samples are drawn from the fitted posterior, and the quantile of theta0 is calculated as sum(posterior_thetas <= theta0)/length(posterior_thetas). Since the denominator is always 500 and we do this 1000 or 10000 times, we must get repeated quantiles for any distribution.
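
To make that concrete, here’s a tiny illustration of why ties are forced (the rnorm() calls are just stand-ins, not the real models):

# The quantile can only take length(posterior_thetas) + 1 distinct values,
# so 1000+ replications with 500 draws each must repeat some (pigeonhole).
quantile_of_theta0 = function(posterior_thetas, theta0) {
  sum(posterior_thetas <= theta0) / length(posterior_thetas)
}
quants = replicate(1000, quantile_of_theta0(rnorm(500), rnorm(1)))
any(duplicated(quants))   # TRUE: ties are unavoidable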

Need to test:

  • Exact posterior samples sans Stan

Ok. I think I’m running out of ideas as to why this should happen.

Here’s what I know:

  • I know it doesn’t happen with exact draws from the posterior
  • (Except occasionally, due to randomness; the spike is not systematic)
  • I know that as you reduce the effective sample size, the KS test gets worse (not surprising, as MC error dominates when computing the quantile)
  • I know that the interpretation is that “more times than you’d expect, the posterior is biased upwards”. This would be expected if you had sticking, so have you checked for that? (A quick check is sketched after this list.)
  • If it happens systematically on “constrained parameters” it could be a numerical problem with the transformation/Jacobian (or, I guess, a bug). Does anyone know how to check that? (i.e. compare the histogram that had no transformations to the one that does? Maybe by “hand coding” the transformation and checking you get the same thing for the automatic and the hand-rolled code?)
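
On the sticking point from the list above, a minimal check might look like this (it assumes an rstan fit object fit and a scalar parameter named "theta"; both names are placeholders):

# Fraction of consecutive iterations where the draw did not move at all,
# computed per chain. For a healthy HMC run this should be essentially zero;
# anything noticeably above that suggests the sampler is sticking.
draws = rstan::extract(fit, pars = "theta", permuted = FALSE)  # iterations x chains x 1
stuck_frac = apply(draws[, , 1], 2, function(chain) mean(diff(chain) == 0))
print(stuck_frac)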

Maybe one more thing for Sean to do is to set this running overnight and build pdfs (or directories) of realisations of these histograms for different data sets. Maybe 100 for each parameter. Then hand check each one to see if there’s a spike (to work out how frequently it happens).
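
A minimal sketch of that overnight run, with run_one_replication() standing in for Sean’s actual simulation code (it’s assumed to return the vector of quantiles and the minimum n_eff for that run):

# Hypothetical helper: run_one_replication(seed) is a placeholder for the
# existing simulation code; assumed to return list(quantiles, min_n_eff).
pdf("quantile_histograms.pdf")
for (i in 1:100) {
  rep_i = run_one_replication(seed = i)
  hist(rep_i$quantiles, breaks = 50,
       main = sprintf("Replication %d, min n_eff = %.0f", i, rep_i$min_n_eff))
}
dev.off()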

Anyway: if anyone else has any ideas I’m all ears.

Code for the Cook et al. test with exact samples for a conjugate normal model with a prior of the form

mean | prec ~ N( prior_mean, 1/sqrt(prec) )
prec ~ Gamma( prior_shape, prior_rate )

NB: These aren’t independent priors!

N_reps = 10000
N_samp = 1000

prior_shape = 1
prior_rate = 1e-3
n_data = 100
prior_mean = 0
prior_prec = 1

do_posterior_prop_unknown_mean_unknown_prec = function(n_data, N_samp, true_mean, true_prec,
                                                       prior_mean, prior_prec, prior_shape, prior_rate) {
  # simulate data from the "true" parameters drawn from the prior
  data = rnorm(n_data, mean = true_mean, sd = 1/sqrt(true_prec))
  x_bar = mean(data)
  ss_x = sum((data - x_bar)^2)

  # conjugate normal-gamma posterior updates
  post_mean = (prior_prec*prior_mean + n_data*x_bar)/(prior_prec + n_data)
  post_prec = prior_prec + n_data
  post_shape = prior_shape + n_data/2
  post_rate = prior_rate + ss_x/2 + 0.5*n_data*prior_prec/(n_data + prior_prec)*(x_bar - prior_mean)^2

  # exact posterior draws: precision first, then mean given precision
  prec_samps = rgamma(N_samp, shape = post_shape, rate = post_rate)
  mean_samps = rnorm(N_samp, mean = post_mean, sd = 1/sqrt(post_prec*prec_samps))

  # Cook et al. quantiles of the true values within the posterior draws
  prec_quant = mean(prec_samps <= true_prec)
  mean_quant = mean(mean_samps <= true_mean)

  return(list(mean = mean_quant, prec = prec_quant))
}

output_mean = rep(NA, N_reps)
output_prec = rep(NA, N_reps)
for (i in 1:N_reps) {
  true_prec = rgamma(1, shape = prior_shape, rate = prior_rate)
  true_mean = rnorm(1, mean = prior_mean, sd = 1/sqrt(true_prec))
  tmp = do_posterior_prop_unknown_mean_unknown_prec(n_data, N_samp, true_mean, true_prec,
                                                    prior_mean, prior_prec, prior_shape, prior_rate)
  output_mean[i] = tmp$mean
  output_prec[i] = tmp$prec
}

INLA::inla.ks.plot(output_mean,punif)
INLA::inla.ks.plot(output_prec,punif)

The issue isn’t just constraints: they found the same issue with unconstrained regression parameters in a linear regression with a fixed noise scale.

The Jacobians for the multivariates are pretty hairy—see the manual.

Numerical problems typically turn into exceptions or divergences, but not always.

I’d still really like to see the distribution of histograms for, say, linear regression along with effective sample sizes. (I really don’t understand Sean’s code, so I can’t do it myself)

By distribution of histograms, are you just referring to basically 100 PDFs from different seeds from e.g. the linear regression model? I’m adding a check to throw an exception if n_eff is less than num_iterations / 1000; is that reasonable?

Yep, that’s what I mean.

It would be better to record the actual n_eff, because then you can compare them with the exact distribution for a similar model with that many independent samples using my code from yesterday.

Basically, the aim is to try to separate out “things due to Monte Carlo error” from “everything else”
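
One way to set up that comparison, as a sketch (the n_eff value here is a placeholder): plug the recorded effective sample size in as N_samp in the exact-sampler code above, so the exact-draw histogram carries a comparable amount of Monte Carlo error.

# Placeholder value: use the n_eff actually recorded from the Stan runs.
n_eff_recorded = 400
# Re-run the replication loop from the exact-sampler code above, but with
# N_samp set to the recorded n_eff.
output_mean_neff = rep(NA, N_reps)
for (i in 1:N_reps) {
  true_prec = rgamma(1, shape = prior_shape, rate = prior_rate)
  true_mean = rnorm(1, mean = prior_mean, sd = 1/sqrt(true_prec))
  output_mean_neff[i] = do_posterior_prop_unknown_mean_unknown_prec(
    n_data, n_eff_recorded, true_mean, true_prec,
    prior_mean, prior_prec, prior_shape, prior_rate)$mean
}
INLA::inla.ks.plot(output_mean_neff, punif)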

Ok, will record that. Is making 100 of these histograms better than making 1 histogram with 100x as many replications?

Yes. There’s always randomness in the histogram (it comes from the data, not from the MCMC).

Using more replications is probably not useful: eventually the MC error will make the K-S test reject the null (because it can’t be exactly true).