Given the recent blog post about the bridgesampling R package I wanted to try it on some rstanarm stanreg objects. However, at least for a poisson glm fit, not all information that bridgesampling needs is retained, i.e.:

> bridge_sampler(mvr_stan_poisson1$stanfit)
Error in object@.MISC$stan_fit_instance$unconstrain_pars(pars) :
variable gamma missing

Is there some way to get around this or can I only use this with “manually” written Stan models?

The full dataset used for this is a bit large but the stan_glm call is this:

> mvr_stan_poisson1 <- stan_glm(Fix ~ Experience + Ability + treatment + System + Lab,
data = mvr, family = poisson,
prior = normal(0,20.0), prior_intercept = normal(0,20.0),
chains = CHAINS, cores = CORES, seed = SEED, iter = ITER)
> bridge_sampler(mvr_stan_poisson1$stanfit)
Error in object@.MISC$stan_fit_instance$unconstrain_pars(pars) :
variable gamma missing

I’ll see if I can cut down the dataset to something smaller/minimal that has the same behavior.

Should I understand you as bridgesampling on rstanarm models are generally supported and my failure here depends on some specifics about my model or dataset?

Note
To be able to use a stanreg object for samples, the user crucially needs to have specified the diagnostic_file when fitting the model in rstanarm.

See the diagnostic_file argument to rstan::sampling that is passed through the ... by model fitting functions in rstanarm.

When I run your code I get a Bayes factor of ~ 10. While I know that the labels used to describe Bayes factors are arbitrary, I often see values of 10 referred to as “strong evidence”.

By contrast, I added a couple lines of code (below) to compare these these models using leave one out cross validation (LOO). I don’t find much difference between models (i.e., less than a 1 SE difference between models). Is it odd that these two methods paint such a different picture?

The ‘art’ of marginal likelihood or Bayes factor based model selection is in choosing appropriate parameter priors. Stated more dramatically, this type of model selection extremely strongly hinges on the parameter priors (e.g., Lindley’s paradox). This is probably one of the reasons why many statisticians (among them at least some of the more vocal people behind the development of Stan, example here) do not advocate the use of Bayes factors. This may also be one of the reasons why they have not been adopted very widely.

So how does this relate to the example? In this example, the parameter priors are not chosen for Bayes factor based model selection so it is not clear if the results are anyhow sensible. They are probably not.

In general, priors that are appropriate for estimation are often not appropriate for Bayes factors. There is quite a bit of literature on how to choose such priors of which I list some below. However, I think it is fair to say that the priors implemented in rstanarm are generally not appropriate for this task.

Rouder, J. N., Morey, R. D., Speckman, P. L., & Province, J. M. (2012). Default Bayes factors for ANOVA designs. Journal of Mathematical Psychology, 56(5), 356–374. https://doi.org/10.1016/j.jmp.2012.08.001

Bayarri, M. J., & García-Donato, G. (2007). Extending Conventional Priors for Testing General Hypotheses in Linear Models. Biometrika, 94(1), 135–152.

Ly, A., Verhagen, J., & Wagenmakers, E.-J. (2016). Harold Jeffreys’s default Bayes factor hypothesis tests: Explanation, extension, and application in psychology. Journal of Mathematical Psychology, 72(Supplement C), 19–32. https://doi.org/10.1016/j.jmp.2015.06.004

You say that priors appropriate for estimation are often not appropriate for Bayes factors. I was hoping that one of the papers you listed would provide a more full explanation of why this is the case (I didn’t see anything like this in my quick perusal of each paper). Do you happen to know of a paper that discusses this issue a bit more? While I’ve given thought to choosing priors for estimation, I’m a little unsure how or why these need to be adjusted for computing Bayes factors.

I mean, they all kind of talk about that. However, a paper that more specifically compares different priors and talks about estimation versus Bayes factors is:
Rouder, J. N., Haaf, J. M., & Vandekerckhove, J. (2018). Bayesian inference for psychology, part IV: parameter estimation and Bayes factors. Psychonomic Bulletin & Review, 25(1), 102–113. https://doi.org/10.3758/s13423-017-1420-7

I can confirm that we did not think at all about Bayes factors when choosing the default priors in rstanarm, but some form of hs() or product_normal() might work well.