We (Daniel Schad, Bruno Nicenboim & Shravan Vasishth) have a manuscript on arXiv ([2203.02361] Data aggregation can lead to biased inferences in Bayesian linear mixed models and Bayesian ANOVA: A simulation study) using SBC for Bayes factors (Schad et al., 2022, PsychMethods) to show that data aggregation can lead to biased inferences in Bayesian LMMs/ANOVA. In the manuscript, we provide a brief intro to SBC (see below).
A reviewer commented on this: "I think that the statement on L248 is unclear (on why Equation 6 shows that the distributions of M and M’ are identical). In the first paper version, the explanation was similar to that in Talts, Betancourt, Simpson, Vehtari, and Gelman (2018). But also there it is not said why this works, only that it does. Actually, the explanation provided in Cook, Gelman, and Rubin (2006) is the clearest I could find. Could you just add a few more words that completely explain this (to people for whom the SBC is new, like me)?"
One issue here is that a recent manuscript (Modrak et al., 2022) suggested that the “justification of SBC by Talts et al. (2018)” was not correct and proposed a different basis for SBC (our adaptation of this to SBC for Bayes factors is given below). On the other hand, we understood Michael Betancourt (personal communication) as suggesting that the original formulation is not incorrect.
Moreover, we don’t fully understand why the math provides a foundation for SBC (i.e., why equation 6 below shows that the distributions of M and M’ are identical).
Thus, overall, we are unsure how to respond to this issue: how should we deal with the different suggested foundations of SBC (Talts et al., 2018 vs. Modrak et al., 2022), and, importantly, how can we explain why this works?
Any help with this would be highly welcome!
Thanks much
First, we write down the joint distribution of the data $y$, the model $\mathcal{M}$ drawn from the prior, and the model $\mathcal{M}'$ drawn from the posterior [see @modrak2022simulation for posterior parameter inference]:

$$
p(y, \mathcal{M}', \mathcal{M}) = p(\mathcal{M}' \mid y) \, p(y \mid \mathcal{M}) \, p(\mathcal{M}) \tag{5}
$$

Moreover, we have stated above that $p(\mathcal{M} \mid y) = \frac{p(y \mid \mathcal{M}) \times p(\mathcal{M})}{p(y)}$, which can be reformulated as $p(\mathcal{M} \mid y) \, p(y) = p(y \mid \mathcal{M}) \times p(\mathcal{M})$. This implies that

$$
p(y, \mathcal{M}', \mathcal{M}) = p(\mathcal{M}' \mid y) \, p(\mathcal{M} \mid y) \, p(y) \tag{6}
$$
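One possible way to spell out what Equation (6) implies is sketched here. This is our own tentative reading, not a derivation taken from Talts et al. (2018) or Modrak et al. (2022), and the symbol $m$ (a fixed candidate model) is introduced only for illustration. Dividing Equation (6) by $p(y)$ gives a conditional distribution that is a product of the same posterior function evaluated at $\mathcal{M}'$ and at $\mathcal{M}$:

$$
p(\mathcal{M}', \mathcal{M} \mid y) = p(\mathcal{M}' \mid y) \, p(\mathcal{M} \mid y)
$$

So, given $y$, $\mathcal{M}$ and $\mathcal{M}'$ would be independent draws from the same posterior. Summing Equation (6) over $\mathcal{M}$ and integrating over $y$ then gives, for any fixed model $m$,

$$
p(\mathcal{M}' = m) = \int p(\mathcal{M} = m \mid y) \, p(y) \, dy = \int p(y \mid \mathcal{M} = m) \, p(\mathcal{M} = m) \, dy = p(\mathcal{M} = m)
$$

Under this reading, the posterior model probabilities averaged over simulated data sets recover the prior model probabilities, provided that both the simulation and the posterior computation are exact.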
Equation 6 shows that given a specific data set $y$, the distributions of $\mathcal{M}$ and $\mathcal{M}'$ are identical. That is, in SBC, if the simulation of the data and the estimation of posterior model probabilities (i.e., of Bayes factors) are accurate, i.e., without bias, then the prior probabilities of the models should be the same as the posterior probabilities of the models averaged across the simulated data sets. If the average posterior model probabilities are too large/small, then this indicates that Bayes factors are estimated as being too large/small, exhibiting liberal/conservative bias.
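A small simulation may help make this check concrete. The sketch below is not from the manuscript: it assumes a hypothetical toy pair of models for a single observation (M0: y ~ Normal(0, 1); M1: mu ~ Normal(0, 1), y ~ Normal(mu, 1)), chosen only because the marginal likelihoods are available in closed form. The function names and the `bf_scale` parameter, which deliberately distorts the Bayes factor to mimic a biased estimator, are likewise our own illustrative assumptions.

```python
import math
import random


def normal_pdf(y, sd):
    """Density of Normal(0, sd) evaluated at y."""
    return math.exp(-0.5 * (y / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_prob_m1(y, prior_m1, bf_scale=1.0):
    """Posterior probability of M1 given one observation y.

    bf_scale is a hypothetical distortion: 1.0 gives the exact Bayes
    factor, values above 1.0 mimic an estimator that inflates it.
    """
    ml0 = normal_pdf(y, 1.0)             # M0: y ~ Normal(0, 1)
    ml1 = normal_pdf(y, math.sqrt(2.0))  # M1: marginally y ~ Normal(0, sqrt(2))
    bf10 = bf_scale * ml1 / ml0
    prior_odds = prior_m1 / (1.0 - prior_m1)
    post_odds = bf10 * prior_odds
    return post_odds / (1.0 + post_odds)


def sbc_average(prior_m1=0.5, n_sims=20000, seed=1, bf_scale=1.0):
    """Average posterior probability of M1 across simulated data sets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        # Draw the true model from the prior, then simulate data from it.
        m = 1 if rng.random() < prior_m1 else 0
        mu = rng.gauss(0.0, 1.0) if m == 1 else 0.0
        y = rng.gauss(mu, 1.0)
        total += posterior_prob_m1(y, prior_m1, bf_scale)
    return total / n_sims


if __name__ == "__main__":
    print("exact Bayes factor:   ", round(sbc_average(), 3))
    print("inflated Bayes factor:", round(sbc_average(bf_scale=3.0), 3))
```

With the exact Bayes factor, the average posterior probability of M1 should land close to the prior probability; with the inflated Bayes factor it should sit noticeably above it, which is the kind of liberal bias the check is meant to reveal.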
@bnicenboim, @vasishth , @martinmodrak, @paul.buerkner, @betanalpha, @seantalts