Does SBC depend on the size of the data? (not only the MCMC sample size)

Up to now, my understanding has been that SBC is a statistical test of the null hypothesis that the MCMC sampler is correct with respect to a given prior. (If my interpretation is wrong, please let me know.)

Now I am trying to find appropriate priors in the sense of SBC.
When I run SBC with small data, the rank statistic is not uniformly distributed.
On the other hand, with large data, the result of SBC is good.

Thus, in my view, the result of SBC depends on the size of the data.
In the frequentist paradigm, the p-value decreases monotonically as the sample size grows. Does such a phenomenon also occur in SBC?
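
For concreteness, here is a quick simulation of the frequentist behavior I mean (only an illustration, assuming a fixed true effect and a one-sample t-test):

set.seed(1)
for (n in c(10, 100, 1000)) {
  x <- rnorm(n, mean = 0.2)  # true mean is 0.2; we test H0: mean = 0
  cat("n =", n, " p-value =", signif(t.test(x)$p.value, 2), "\n")
}

Under a fixed alternative, the p-value tends toward zero as n grows.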

In my case, roughly speaking, the data consist of a non-negative integer-valued random variable H. Its model, with parameter \theta \in \Theta, is defined by

H \sim \text{Binomial}(p(\theta),N),

where N is the number of Bernoulli trials, so N can be regarded as the sample size of the data, and p : \Theta \to [0,1] denotes a differentiable function.
The prior is defined so that it generates a model parameter \theta in the pre-image p^{-1}((\epsilon, 1-\epsilon)), i.e., \epsilon < p(\theta) < 1-\epsilon. Namely, \theta is distributed according to the uniform distribution whose support is the pre-image p^{-1}((\epsilon, 1-\epsilon)).
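
For reference, the following is a minimal self-contained sketch of SBC for this model in the simplest case p(\theta) = \theta (identity), so the prior is Uniform(\epsilon, 1 - \epsilon) and the posterior is a truncated Beta that can be sampled exactly; all names here are mine, nothing comes from any package:

set.seed(1)
N       <- 111   # number of Bernoulli trials (the data size)
M       <- 511   # number of simulated data sets, i.e. rank statistics
L       <- 100   # posterior draws per data set
epsilon <- 0.04

# inverse-CDF draws from Beta(a, b) truncated to (lo, hi)
rtruncbeta <- function(n, a, b, lo, hi) {
  u <- runif(n, pbeta(lo, a, b), pbeta(hi, a, b))
  qbeta(u, a, b)
}

ranks <- replicate(M, {
  p_true <- runif(1, epsilon, 1 - epsilon)   # draw theta from the prior
  H      <- rbinom(1, N, p_true)             # simulate data
  p_post <- rtruncbeta(L, 1 + H, 1 + N - H, epsilon, 1 - epsilon)
  sum(p_post < p_true)                       # rank of the true value
})
hist(ranks, breaks = seq(-0.5, L + 0.5, by = 1))

Because the posterior draws here are exact, the rank histogram comes out (approximately) uniform for any N; I mention this only as a baseline.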

In this model, with such a prior on \theta, the SBC rank histogram is flatter when N is larger.

The code is as follows.

library(BayesianFROC)  # I am the author
stanModel <- stan_model_of_sbc()

# SBC with large data
Simulation_Based_Calibration_single_reader_single_modality_via_rstan_sbc(
  NL        = 11111,      # sample size (large)
  NI        = 11111,      # sample size (large)
  stanModel = stanModel,  # Stan model for SBC
  ite       = 323,        # MCMC iterations
  M         = 511,        # number of rank statistics

  # The following arguments are redundant
  epsilon = 0.04, BBB = 1.1, AAA = 0.0091, sbc_from_rstan = TRUE)


# SBC with small data
Simulation_Based_Calibration_single_reader_single_modality_via_rstan_sbc(
  NL        = 111,        # sample size (small)
  NI        = 111,        # sample size (small)
  stanModel = stanModel,  # Stan model for SBC
  ite       = 323,        # MCMC iterations
  M         = 511,        # number of rank statistics

  # The following arguments are redundant
  epsilon = 0.04, BBB = 1.1, AAA = 0.0091, sbc_from_rstan = TRUE)

In the former code, SBC runs with large data, and in the latter it runs with small data.
Because the former uses larger data, its histogram is flatter.

The result of the former code (SBC with large data, i.e., large N in the above notation):

[Figure: SBC rank histogram for large N]

The result of the latter code (SBC with small data, i.e., small N in the above notation):

[Figure: SBC rank histogram for small N]

Sorry to post every day.
I would appreciate it if you could point out any mistakes or share your opinions.

We purposefully avoided framing it as a null hypothesis significance test, let alone computing an explicit summary statistic. As constructed in the paper, it's more of a visualization that highlights systematic inconsistencies due to computational problems. In other words, SBC is more powerful when looking not for uniformity but for systematic deviations away from uniformity. If that makes any sense!

It does not. The variation in the SBC histogram should be entirely due to the number of simulated data sets fit, and hence the number of ranks constructed.
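
For a rough sense of that variation: with M ranks spread over B bins, each bin count is Binomial(M, 1/B) under exact uniformity (the numbers below just reuse M and L from the sketch above):

M <- 511; L <- 100; B <- L + 1
qbinom(c(0.005, 0.5, 0.995), M, 1 / B)  # 99% band for a single bin count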

That said, the pathologies that can frustrate computational algorithms and manifest as SBC problems can depend on the data. When you have lots of data and the likelihood concentrates, it can cut off pathologies in the prior model that would otherwise manifest as skewed SBC ranks.

Unfortunately I don't have time to look at your code in any depth, but my suspicion would be that the constraining of the probability into [epsilon, 1 - epsilon] might not be consistent between the simulation and the fit. With enough data the likelihood, and hence the posterior density, would concentrate away from the boundaries, and any issues there wouldn't be seen in SBC; but for smaller data sets the likelihood and posterior density would be more diffuse, and any problems near the boundary would be more consequential.
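
As a rough numerical illustration of this point (again assuming the identity-link binomial model from the sketch above, with the true probability placed near the lower boundary):

epsilon <- 0.04
p_true  <- 0.06  # hypothetical value near the boundary
set.seed(1)
for (N in c(111, 11111)) {
  H <- rbinom(1, N, p_true)
  # untruncated posterior mass below epsilon, i.e. where the constraint bites
  cat("N =", N, " P(p < epsilon | H) =",
      signif(pbeta(epsilon, 1 + H, 1 + N - H), 2), "\n")
}

With small N a visible fraction of the posterior mass sits below epsilon, so the truncation matters; with large N that mass is essentially zero.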
