Hello everybody,
I’m currently studying Simulation Based Calibration (SBC) and I have a few questions to which, as far as I can tell, I haven’t found clear answers.
I haven’t included my code since I’m just using well-known models (centered/non-centered 8 schools) and I wanted to keep this post readable, as it’s already quite long.
If, after seeing my results, you think more details about my implementation would help, you can find the notebook I’m currently using here: https://github.com/vincentberaud/Tests/blob/main/SBC_8_schools.ipynb.
Context:
I’m comparing the results obtained through Simulation Based Calibration for the centered and the non-centered 8 schools models. I assess them by checking the uniformity of the ranks, both with plots and with \chi^2 tests.
I compute the \chi^2 statistic of each dimension of each parameter separately, as advised in https://doi.org/10.1198/106186006X136976, and at the end I average the per-dimension \chi^2 values to obtain a global \chi^2 for each parameter.
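For concreteness, here is a minimal sketch of the per-dimension uniformity test as I understand it. The rank array and its dimensions are hypothetical stand-ins (real SBC ranks would come from the notebook), and the binning choice (20 bins) is an assumption, not something prescribed by the paper:

```python
import numpy as np
from scipy import stats

def chi2_uniformity(ranks, n_samples, n_bins=20):
    """Pearson chi^2 test for uniformity of SBC ranks for ONE scalar
    dimension. `ranks` are integers in [0, n_samples], one per simulation."""
    n_sims = len(ranks)
    # Bin the ranks into n_bins equal-width bins over [0, n_samples].
    observed, _ = np.histogram(ranks, bins=n_bins, range=(0, n_samples + 1))
    expected = np.full(n_bins, n_sims / n_bins)  # uniform expectation
    chi2 = np.sum((observed - expected) ** 2 / expected)
    p_value = stats.chi2.sf(chi2, df=n_bins - 1)
    return chi2, p_value

# Hypothetical example: theta has 8 dimensions, 200 simulations, 500 samples.
rng = np.random.default_rng(0)
theta_ranks = rng.integers(0, 501, size=(200, 8))  # stand-in for real ranks
per_dim = [chi2_uniformity(theta_ranks[:, d], n_samples=500)[1]
           for d in range(8)]
print("mean p-value over theta's dimensions:", np.mean(per_dim))
```

The averaging at the end mirrors what I do across \theta’s eight dimensions.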
My long-term objective is to compare other methods (SMC samplers, for example) against NUTS under Simulation Based Calibration, but I want to make a fair judgment by using the best possible setting of Stan’s NUTS with SBC. The idea is to generalise the evaluation using SBC to models that may involve high dimensions in the future.
Results:
\chi^2 evaluation between simulated and inferred parameters (1 = perfect correlation)
- nl = number of simulated likelihoods
- ns = number of samples (MCMC steps \times number of chains) per simulated likelihood
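To make the nl/ns terminology concrete, here is a minimal sketch of the SBC loop as I use those terms. It uses a hypothetical conjugate toy model (a normal mean) instead of Stan, so exact posterior draws stand in for NUTS samples; none of the names below come from my actual notebook:

```python
import numpy as np

rng = np.random.default_rng(1)
nl = 200   # number of simulated likelihoods (prior draws)
ns = 500   # posterior samples kept per simulated likelihood

ranks = np.empty(nl, dtype=int)
for i in range(nl):
    # 1. Draw a "true" parameter from the prior (toy model: mu ~ N(0, 5)).
    mu_true = rng.normal(0.0, 5.0)
    # 2. Simulate data from the likelihood given that draw.
    y = rng.normal(mu_true, 1.0, size=10)
    # 3. "Fit" the model; the posterior is conjugate here, so we can draw
    #    exact posterior samples instead of running NUTS.
    post_var = 1.0 / (1.0 / 5.0**2 + len(y) / 1.0)
    post_mean = post_var * np.sum(y)
    mu_post = rng.normal(post_mean, np.sqrt(post_var), size=ns)
    # 4. The SBC rank: how many posterior samples fall below the prior draw.
    ranks[i] = np.sum(mu_post < mu_true)

# If the sampler is calibrated, `ranks` is uniform on {0, ..., ns}.
print(ranks.min(), ranks.max())
```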
CENTERED 8 SCHOOLS
Parameter | 500ns, 200nl | 5000ns, 200nl | 500ns, 1000nl | 5000ns, 1000nl |
---|---|---|---|---|
\mu | 0.53 | 0.35 | 0.012 | 0.31 |
\tau | 0.21 | 0.10 | 0.0007 | 0.012 |
\theta | 0.45 | 0.48 | 0.55 | 0.73 |
NON-CENTERED 8 SCHOOLS
Parameter | 500ns, 200nl | 5000ns, 200nl | 500ns, 1000nl | 5000ns, 1000nl |
---|---|---|---|---|
\mu | 0.97 | 0.18 | 0.70 | 0.02 |
\tau | 0.90 | 0.72 | 0.52 | 0.64 |
\theta | 0.39 | 0.49 | 0.56 | 0.58 |
Big concern:
In the aforementioned paper they show how to interpret the histograms, e.g. how to identify a prior whose variance is too narrow. But I haven’t really found a rule for how many simulated likelihoods / samples should be used.
Also, I don’t even know whether these results make sense, because it feels odd to get worse results as I increase the number of samples.
Questions related to the parameters:
I don’t understand why, as I change the number of samples or simulated likelihoods, the \chi^2 values of the parameters don’t always move in the same direction. For example, in the non-centered 8 schools, increasing the number of samples gave me a poorer \mu and \tau BUT a better \theta.
Questions related to informative data:
I’ve read in Centered vs. non-centered parameterizations that:
“centered actually works better when you have informative data (large N relative to σ) for a particular group, while non-centered is better for uninformative data (small N relative to σ) for a particular group.”
Is this true, and if so, why am I not observing those results?
Questions related to thinning:
All these results were obtained using NUTS without thinning. The reason is that when I applied thinning, I obtained spikes at the boundaries of the rank histograms (which usually indicate correlation between samples). That doesn’t make sense to me; does it mean I must have an issue inside my thinning process?
And is it really relevant to apply thinning with NUTS, since I assume its samples are less correlated than those of a regular MCMC sampler?
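For reference, this is essentially how my thinning works (the function and array names are hypothetical illustrations, not my actual notebook code). One thing I suspect matters is the order of operations: thinning a pooled, interleaved array mixes draws from different chains, which could plausibly produce the boundary artifacts I’m seeing:

```python
import numpy as np

def thin(draws, thin_by):
    """Keep every `thin_by`-th draw along the sample axis.
    `draws` has shape (n_samples, ...) for a single chain -- thinning
    an array where chains were interleaved mixes chains together."""
    return draws[::thin_by]

# Hypothetical example: 4 chains of 1000 draws each for a scalar parameter.
rng = np.random.default_rng(2)
chains = rng.normal(size=(4, 1000))
# Thin each chain separately, THEN pool -- the order matters.
pooled = np.concatenate([thin(c, 10) for c in chains])
print(pooled.shape)  # 4 chains * 100 thinned draws each -> (400,)
```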
Thank you very much to everyone who spends time reading this.
I’m writing this post because, after reading a lot of Stan documentation, I still feel like I’m swimming in an ocean of weirdness with these results.