As I understand it so far, SBC is a statistical test of the null hypothesis that the MCMC sampling is correct with respect to a given prior. (If my interpretation is wrong, please let me know.)
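For reference, the identity usually given as the basis of SBC can be written as follows. This is generic notation, not specific to the model below: \tilde{\theta} is a draw from the prior and \tilde{y} is data simulated from it (both symbols are introduced here only for illustration).

```latex
\pi(\theta)
  = \int\!\!\int \pi(\theta \mid \tilde{y})\,
      \pi(\tilde{y} \mid \tilde{\theta})\,
      \pi(\tilde{\theta})\;
      d\tilde{y}\, d\tilde{\theta}
```

If the L posterior draws are exact and independent, the rank of \tilde{\theta} among them is uniformly distributed on \{0, 1, \dots, L\}; the SBC histogram checks for departures from this.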
I am now trying to find priors that are appropriate in the sense of SBC.
When I run SBC with small data, the rank statistic is not uniformly distributed.
On the other hand, with large data, the result of SBC is good.
Thus, in my view, the result of SBC depends on the size of the data.
In the frequentist paradigm, the p-value tends to decrease as the sample size grows (when the null hypothesis is false). Does such a phenomenon also occur in SBC?
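To make the frequentist analogy concrete, here is a small sketch with hypothetical numbers (a null value of 0.5 and an observed proportion fixed at 0.55, neither of which comes from my model). It only shows how an exact binomial p-value shrinks as the number of trials grows when the null is false.

```r
# Two-sided exact binomial test of H0: p = 0.5 while the observed
# proportion is held at 0.55 (hypothetical values, for illustration only).
n <- c(100, 1000, 10000)      # increasing sample sizes
x <- round(0.55 * n)          # observed successes at a fixed proportion

p_values <- mapply(
  function(x_i, n_i) binom.test(x_i, n_i, p = 0.5)$p.value,
  x, n
)

print(data.frame(n = n, x = x, p_value = p_values))
```

The same fixed departure from the null becomes easier to detect as n grows, which is the behaviour I am asking about for SBC.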
In my case, roughly speaking, the data consist of a non-negative integer-valued random variable H. Its model with parameter \theta \in \Theta is defined by
H \sim \text{Binomial}(p(\theta),N),
where N is the number of Bernoulli trials, so N can be regarded as the sample size of the data, and p:\Theta \to [0,1] denotes a differentiable function.
The prior is defined so that it generates a model parameter \theta in the pre-image p^{-1}(\epsilon, 1-\epsilon), i.e., \epsilon < p(\theta) < 1-\epsilon. Namely, \theta follows the uniform distribution whose support is the pre-image p^{-1}(\epsilon, 1-\epsilon).
In this model, with a given prior on \theta, the SBC rank histogram becomes flatter as N grows.
The code is as follows.
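As a baseline for comparison, here is a minimal SBC sketch that uses an exact posterior sampler instead of MCMC. It assumes p(\theta) = \theta (the identity map), which is a simplification and not the p used in BayesianFROC; the function names `sbc_ranks` and `uniformity_p`, the values of N, and the number of posterior draws L are all hypothetical choices for illustration.

```r
# Minimal SBC sketch with an exact posterior sampler (no MCMC).
# Assumption (not from the package): p(theta) = theta, so the prior is
# Uniform(epsilon, 1 - epsilon) and the posterior is a truncated Beta.
sbc_ranks <- function(N, M = 511, L = 99, epsilon = 0.04) {
  ranks <- integer(M)
  for (m in seq_len(M)) {
    theta <- runif(1, epsilon, 1 - epsilon)  # draw the parameter from the prior
    H <- rbinom(1, size = N, prob = theta)   # simulate data given the parameter
    # Exact posterior: Beta(H + 1, N - H + 1) truncated to (epsilon, 1 - epsilon),
    # sampled by inverting the CDF on the truncated range.
    lo <- pbeta(epsilon, H + 1, N - H + 1)
    hi <- pbeta(1 - epsilon, H + 1, N - H + 1)
    draws <- qbeta(runif(L, lo, hi), H + 1, N - H + 1)
    ranks[m] <- sum(draws < theta)           # rank statistic in 0..L
  }
  ranks
}

# Chi-square goodness-of-fit p-value for uniformity of the binned ranks.
uniformity_p <- function(ranks, L = 99, bins = 10) {
  bin_id <- ranks %/% ((L + 1) / bins)
  counts <- table(factor(bin_id, levels = 0:(bins - 1)))
  chisq.test(as.vector(counts))$p.value
}

set.seed(1)
ranks_small <- sbc_ranks(N = 11)    # small data
ranks_large <- sbc_ranks(N = 1111)  # large data

print(c(small = uniformity_p(ranks_small), large = uniformity_p(ranks_large)))
hist(ranks_small, breaks = 10, main = "Exact sampler, small N")
hist(ranks_large, breaks = 10, main = "Exact sampler, large N")
```

In this exact-sampler setting, theory says the ranks are uniform for any N, so the two histograms can serve as a reference when looking at the MCMC results for the real model.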
library(BayesianFROC) # I am the author

stanModel <- stan_model_of_sbc()

# First call: large data (NL = NI = 11111)
Simulation_Based_Calibration_single_reader_single_modality_via_rstan_sbc(
  NL = 11111,           # Sample size
  NI = 11111,           # Sample size
  stanModel = stanModel, # Stan model for SBC
  ite = 323,            # MCMC iterations
  M = 511,              # Sample size of the rank statistic
  # The following arguments are redundant
  epsilon = 0.04, BBB = 1.1, AAA = 0.0091, sbc_from_rstan = TRUE)

# Second call: small data (NL = NI = 111)
Simulation_Based_Calibration_single_reader_single_modality_via_rstan_sbc(
  NL = 111,             # Sample size
  NI = 111,             # Sample size
  stanModel = stanModel, # Stan model for SBC
  ite = 323,            # MCMC iterations
  M = 511,              # Sample size of the rank statistic
  # The following arguments are redundant
  epsilon = 0.04, BBB = 1.1, AAA = 0.0091, sbc_from_rstan = TRUE)
In the former code, SBC runs with large data (NL = NI = 11111), and in the latter, SBC runs with small data (NL = NI = 111).
Because the former uses larger data, its histogram is flatter.
The result of the former code:
SBC run with large data, i.e., large N in the above notation.
The result of the latter code:
SBC run with small data, i.e., small N in the above notation.
Sorry for posting every day.
I would appreciate it if you could point out any mistakes or share your opinions.