Ok, this is not really about Stan, but I guess that there are enough SBC experts around to answer me :)
(Bear in mind that my question may be super naive.)
Cook, Gelman, and Rubin (2006) write the following:
"The basic idea is that if we draw parameters from their prior distribution and draw data from their sampling distribution given the drawn parameters, and then perform Bayesian inference correctly, the resulting posterior inferences will be, on average, correct. For example, 50% and 95% posterior intervals will contain the true parameter values with probability 0.5 and 0.95, respectively."
This is to verify that the credible intervals are actually confidence intervals as well, isn’t it?
Regardless of that, I was wondering the following.
Say that I run 100 simulations à la SBC, i.e., 100 times I sample:
$$
\begin{aligned}
\tilde{\mu} &\sim p(\mu)\\
\tilde{\sigma} &\sim p(\sigma)\\
\tilde{D} &\sim p(D \mid \tilde{\mu}, \tilde{\sigma})
\end{aligned}
$$
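A minimal R sketch of the three sampling steps above. The priors (a normal prior on mu, a log-normal prior on sigma) and the data size n_obs = 50 are hypothetical placeholders, since the post does not state the model's actual priors; the fitting step (e.g. in Stan) is not shown.

```r
# Sketch of the SBC generative step. Priors and n_obs are HYPOTHETICAL
# placeholders; substitute the ones from your own model.
set.seed(123)
n_sims <- 100   # number of SBC simulations
n_obs  <- 50    # hypothetical number of observations per simulated data set

simulate_one <- function() {
  mu_tilde    <- rnorm(1, mean = 0, sd = 2)              # hypothetical p(mu)
  sigma_tilde <- rlnorm(1, meanlog = 1, sdlog = 0.5)     # hypothetical p(sigma)
  D_tilde     <- rnorm(n_obs, mean = mu_tilde, sd = sigma_tilde)  # p(D | mu, sigma)
  list(mu = mu_tilde, sigma = sigma_tilde, D = D_tilde)
}

# One list element per simulation, each holding the true values and the data
sims <- replicate(n_sims, simulate_one(), simplify = FALSE)
```

Each element of sims would then be fitted, and the posterior draws compared with the stored mu and sigma.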
For each simulation I compute the rank statistic and the 50% credible interval (columns X25. and X75.; X50. is the posterior median), as shown below, where value stores the ground truth of mu or sigma for that simulation.
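One way the per-simulation summary could be computed, as a sketch: sbc_summary() is a hypothetical helper that takes the posterior draws for one parameter (assumed roughly independent, e.g. thinned) and the true value. The rank is the number of draws below the truth, and the columns mirror X25., X50., X75. and value in the output below.

```r
# Hypothetical helper: summarise one parameter of one SBC simulation.
# draws: posterior draws for the parameter; truth: the value that generated the data.
sbc_summary <- function(draws, truth) {
  q <- quantile(draws, probs = c(0.25, 0.50, 0.75), names = FALSE)
  data.frame(
    rank  = sum(draws < truth),  # SBC rank statistic, in 0, 1, ..., length(draws)
    X25.  = q[1],                # lower end of the 50% interval
    X50.  = q[2],                # posterior median
    X75.  = q[3],                # upper end of the 50% interval
    value = truth
  )
}
```

For example, sbc_summary(rnorm(1000), truth = 0.3) returns a one-row data frame with the same columns as df_sbc (apart from sim and var).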
> df_sbc %>% print(n=10)
# A tibble: 200 × 7
sim var rank X25. X50. X75. value
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 mu 435 -0.167 0.520 1.26 0.975
2 1 sigma 403 7.55 8.02 8.56 8.20
3 2 mu 156 -0.309 0.0260 0.368 -0.322
4 2 sigma 215 3.10 3.34 3.61 3.17
5 3 mu 652 4.11 4.59 5.03 5.90
6 3 sigma 429 4.12 4.44 4.79 4.62
7 4 mu 36 2.42 2.61 2.81 2.15
8 4 sigma 550 1.66 1.80 1.96 2.04
9 5 mu 80 -0.272 0.0115 0.293 -0.456
10 5 sigma 281 2.39 2.59 2.80 2.53
# … with 190 more rows
>
Why is it an improvement to look at the histogram and ECDF of the ranks, rather than to use the 50% intervals and just check whether value falls inside the 50% interval 50% of the time, below it 25% of the time, and above it 25% of the time?
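The interval check described above can be sketched as a simple tally. interval_counts() is a hypothetical helper that assumes a data frame shaped like df_sbc (columns var, X25., X75., value); it classifies each true value as below, inside, or above its 50% interval.

```r
# Hypothetical helper: count how often the truth falls below / inside / above
# the 50% interval [X25., X75.], separately for each parameter.
interval_counts <- function(df) {
  pos <- ifelse(df$value < df$X25., "below",
         ifelse(df$value > df$X75., "above", "inside"))
  pos <- factor(pos, levels = c("below", "inside", "above"))
  table(df$var, pos)  # rows: parameters, columns: below / inside / above
}
```

With 100 simulations per parameter, interval_counts(df_sbc) should give counts near 25 / 50 / 25 in each row if the inference is calibrated.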
I can even use qbinom to get a 95% interval for each count (100 simulations per parameter):
> qbinom(c(0.025,0.975), size = 100, prob = .5)
[1] 40 60
> qbinom(c(0.025,0.975), size = 100 , prob = .25)
[1] 17 34
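To see how the two checks relate: with L posterior draws, value < X25. corresponds (up to quantile interpolation) to rank < 0.25 L, and value > X75. to rank > 0.75 L, so the 25/50/25 tally is the rank histogram collapsed into three unequal bins. A sketch, where L = 1000 and the uniformly sampled ranks are placeholder assumptions standing in for the ranks of a well-calibrated fit:

```r
# Map SBC ranks (0..L) onto the three regions used by the 50% interval check.
# Approximate: value < X25. roughly means fewer than 25% of the draws lie below it.
rank_to_interval <- function(rank, L) {
  cut(rank / L, breaks = c(-Inf, 0.25, 0.75, Inf),
      labels = c("below", "inside", "above"), right = FALSE)
}

set.seed(1)
L <- 1000                                  # assumed number of posterior draws per fit
ranks <- sample(0:L, 100, replace = TRUE)  # placeholder: uniform ranks, as under calibration

# Coarse view: three counts, comparable with the qbinom() bands above
counts <- table(rank_to_interval(ranks, L))
print(counts)
# Joint test of all three counts against 25% / 50% / 25%
print(chisq.test(counts, p = c(0.25, 0.5, 0.25)))

# Fine view: the same ranks in 20 equal-width bins, as in an SBC rank histogram
fine <- tabulate(floor(ranks / (L + 1) * 20) + 1, nbins = 20)
print(fine)
```

Any miscalibration that leaves the mass in each of the three regions unchanged (for instance, a problem confined to the extreme tails) cannot show up in the three-bin tally, whereas the finer histogram can still reveal it.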
Is it because this check is less sensitive to miscalibration than the histogram/ECDF plot? Do I need more simulations here to conclude anything? Or am I missing something?