When I run your code, I get a Bayes factor of about 10. While I know that the labels used to describe Bayes factors are arbitrary, I often see values of 10 referred to as “strong evidence”.
By contrast, I added a couple of lines of code (below) to compare these models using leave-one-out cross-validation (LOO). I don’t find much difference between the models (i.e., the difference is less than 1 SE). Is it odd that these two methods paint such a different picture?
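library(rstanarm)        # stan_glm(), loo()
library(bridgesampling)  # bridge_sampler()

# N.B.: remember to specify the diagnostic_file
fit_1 <- stan_glm(mpg ~ wt + qsec + am, data = mtcars,
                  chains = 2, cores = 2, iter = 5000,
                  diagnostic_file = file.path(tempdir(), "df.csv"))
bridge_1 <- bridge_sampler(fit_1)

# Same model with cyl added as a predictor
fit_2 <- update(fit_1, formula = . ~ . + cyl)
bridge_2 <- bridge_sampler(fit_2, method = "warp3")

# Added: leave-one-out cross-validation for both models
loo_1 <- loo(fit_1)
loo_2 <- loo(fit_2)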
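
The comparison step itself is not shown in the code above. A minimal sketch of how the two summaries might be obtained, assuming the objects bridge_1, bridge_2, loo_1, and loo_2 created above already exist in the session; bf() from bridgesampling and loo_compare() from loo are called with explicit namespaces so no extra library() calls are needed:

```r
# Assumes bridge_1, bridge_2, loo_1, and loo_2 exist from the code above.

# Bayes factor in favor of the first argument (model 1) over the second (model 2)
bridgesampling::bf(bridge_1, bridge_2)

# Difference in expected log predictive density (elpd_diff) and its
# standard error (se_diff); the best-ranked model appears in the first row
loo::loo_compare(loo_1, loo_2)
```

The "less than 1 SE" statement corresponds to the absolute value of elpd_diff being smaller than se_diff in the loo_compare() output.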