The default method for obtaining the marginal likelihood using the `bridgesampling` package already provides an error measure. It uses the formula for the approximate relative mean-squared error developed by Frühwirth-Schnatter (2004). So one could use this to check whether the marginal likelihood on which the Bayes factor is based is precise enough. However, we do not propagate this uncertainty to the calculation of the Bayes factor. I am not sure there is a straightforward way to do so, but this is also not super important.
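For what it is worth, a rough first-order propagation is possible under extra assumptions. The sketch below is written in Python for illustration only. The helper name, the assumption that the two estimates are independent and approximately unbiased, and the normal approximation for the interval are all our assumptions, not something `bridgesampling` does. Each relative mean-squared error is treated as the approximate variance of the corresponding log marginal likelihood estimate:

```python
import math


def log_bf_with_error(logml_1, re2_1, logml_2, re2_2, z=1.96):
    """Rough uncertainty for log BF_12 = logml_1 - logml_2 (hypothetical helper).

    Assumes the two marginal likelihood estimates are independent and
    approximately unbiased, with small approximate relative mean-squared
    errors re2_1 and re2_2. To first order (delta method),
    Var(log p_hat) ~= Var(p_hat) / p**2 ~= re2, so the two variances add.
    The interval additionally assumes a normal approximation.
    """
    log_bf = logml_1 - logml_2
    se = math.sqrt(re2_1 + re2_2)
    return log_bf, se, (log_bf - z * se, log_bf + z * se)


# Made-up numbers: log BF_12 = 3.0 with a standard error of about 0.03
log_bf, se, interval = log_bf_with_error(-100.0, 0.0004, -103.0, 0.0005)
print(log_bf, se, interval)
```

Because it relies on an approximation computed from a single set of posterior samples, this is only a rough guide and no substitute for repeating the whole procedure.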
The main issue is that bridge sampling, like other sampling approaches, is a numerical method for which diagnosing convergence problems is generally not trivial. The added difficulty in our case is that the input to the bridge sampler is itself a sample generated by MCMC. Thus, there are two levels of numerical uncertainty: uncertainty from the posterior samples and uncertainty from the bridge sampler.
So the only real way to check the stability of the estimate is to do the equivalent of running independent MCMC chains, as done in the initial code here. Run Stan multiple times to obtain multiple independent sets of samples from the posterior (each of which, of course, already consists of multiple independent chains). For each of those sets of posterior samples, obtain at least one estimate of the marginal likelihood. If the estimates of the marginal likelihood are all close enough to each other (e.g., relative to the magnitude of the differences in marginal likelihood between the models), the Bayes factor is kosher. If not, usually more samples from the posterior distribution are necessary.
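The check described above can be sketched as follows, in Python for illustration. The inputs would be the log marginal likelihood estimates (e.g., the `logml` values returned by `bridge_sampler`), one per independent Stan run and model. The function name and the `rel_tol` threshold are hypothetical; what counts as "close enough" remains a judgment call:

```python
import statistics


def check_logml_stability(logml_m1, logml_m2, rel_tol=0.1):
    """Hypothetical check: are repeated log marginal likelihood estimates
    stable relative to the difference between two models?

    logml_m1, logml_m2: log marginal likelihood estimates, one per
    independent set of posterior samples (i.e., per separate Stan run).
    rel_tol: hypothetical threshold; the spread of each model's estimates
    must be below rel_tol times the absolute difference in mean log ML.
    """
    if len(logml_m1) < 2 or len(logml_m2) < 2:
        raise ValueError("need at least two independent estimates per model")
    diff = abs(statistics.fmean(logml_m1) - statistics.fmean(logml_m2))
    # largest range (max - min) among the two models' estimates
    spread = max(max(logml_m1) - min(logml_m1), max(logml_m2) - min(logml_m2))
    # log Bayes factors obtained by pairing the k-th run of each model
    log_bfs = [a - b for a, b in zip(logml_m1, logml_m2)]
    same_sign = all(x > 0 for x in log_bfs) or all(x < 0 for x in log_bfs)
    stable = same_sign and spread < rel_tol * diff
    return stable, log_bfs


# Made-up estimates from three independent runs per model
ok, log_bfs = check_logml_stability([-100.1, -100.0, -99.9], [-105.0, -105.2, -104.9])
print(ok, log_bfs)
```

Requiring the per-run log Bayes factors to agree in sign is a deliberately conservative extra condition: if independent runs disagree about which model is favored, the estimates are clearly not precise enough.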
So, to us it is not immediately apparent what kind of check to add to our package. The `bridge_sampler` function already has the argument `repetitions`, which allows obtaining more than one marginal likelihood estimate from one fixed set of posterior samples. This allows estimating the uncertainty on the second level (the bridge sampler). However, to get a full picture of the uncertainty, a new set of posterior samples is necessary.
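If both levels are sampled, that is, K independent sets of posterior samples with R repetitions each, the two sources of uncertainty can be roughly separated. The sketch below uses a standard one-way random-effects (method-of-moments) decomposition. The function name, the balanced design, and applying this decomposition here are our assumptions, not functionality of `bridgesampling`:

```python
import statistics


def decompose_logml_variance(estimates):
    """Split the variability of log ML estimates into two levels (a sketch).

    estimates: list of lists; estimates[k] holds the R repeated estimates
    (e.g., obtained via `repetitions`) for the k-th independent set of
    posterior samples. Assumes a balanced design with K >= 2 and R >= 2.
    Returns (bridge_var, posterior_var): method-of-moments estimates of the
    variance due to the bridge sampler and due to the posterior samples.
    """
    k = len(estimates)
    r = len(estimates[0]) if k else 0
    if k < 2 or r < 2 or any(len(row) != r for row in estimates):
        raise ValueError("need a balanced design with K >= 2 and R >= 2")
    # second level: average variance among repetitions within one posterior set
    bridge_var = statistics.fmean([statistics.variance(row) for row in estimates])
    # variance of the set means estimates posterior_var + bridge_var / R
    means_var = statistics.variance([statistics.fmean(row) for row in estimates])
    posterior_var = max(0.0, means_var - bridge_var / r)
    return bridge_var, posterior_var


# Made-up estimates: two posterior sets, three repetitions each
print(decompose_logml_variance([[-100.0, -100.2, -99.8], [-98.0, -98.2, -97.8]]))
```

If `posterior_var` dominates, increasing `repetitions` will not help; more (or longer) runs from the posterior are needed, which matches the advice above.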
Perhaps most importantly, estimating the marginal likelihood usually requires at least one order of magnitude more posterior samples than parameter estimation does. We warn about this both in our paper and on the help page. For example (from `?bridgesampling::bridge_sampler`):

> Also note that for testing, the number of posterior samples usually needs to be substantially larger than for estimation.
Maybe the easiest solution would be to add a similar warning to the help page of `brms::bayes_factor` and to encourage users to obtain at least two independent sets of posterior samples and, from them, at least two independent estimates of the Bayes factor.