Trying to get a Bayes Factor from an rstanarm glm

Please share your Stan program and accompanying data if possible.


Evening all,

I’m trying to obtain a Bayes Factor to describe two models of my data from my thesis, one with one predictor and a second with two predictors. The “Discount” predictor has 3 levels, and the “TP” predictor has two. When I try to run the code however, I get the following error:

Error in bridge_sampler.stanreg(x, silent = TRUE) :
the ‘diagnostic_file’ option must be specified in the call to stan_glm to use the ‘bridge_sampler’

Anyone know how to specify “diagnostic_file” in the stan_glm call? I can’t find instructions anywhere, and including = NULL or =TRUE did not work. Also, I tried writing in =“testfile.csv” and =“testfile2.csv” and that didn’t work either; I get the same error.

Or, if there is any other way to get a Bayes Factor from my models using a different method, that would work just as well.

Discountmodel <- stan_glm(
  Accept_Reject ~ Discount, 
  data = pubdata, 
  family = binomial(link = "logit"), 
  prior_intercept = NULL,
  cores=4,
  QR = FALSE,
  chains = 3, iter = 10000)

Discount_TP_model<- update(Discountmodel, formula = Accept_Reject ~ Discount + TP)

ModelsBF=bayesfactor_models(Discountmodel,Discount_TP_model,denominator = 1,verbose = TRUE)
****

It is documented under ?bridgesampling::bridge_sampler.stanreg, which includes the following example

library(rstanarm)

# N.B.: remember to specify the diagnostic_file

fit_1 <- stan_glm(mpg ~ wt + qsec + am, data = mtcars,
                  chains = 2, cores = 2, iter = 5000,
                  diagnostic_file = file.path(tempdir(), "df.csv"))
bridge_1 <- bridge_sampler(fit_1)
fit_2 <- update(fit_1, formula = . ~ . + cyl)
bridge_2 <- bridge_sampler(fit_2, method = "warp3")
bf(bridge_1, bridge_2)

So, you could do something like diagnostic_file = file.path(tempdir(), "df.csv") as well when you call stan_glm.

1 Like

Thanks! The first model seemed to work fine; I copied the exact output you had for diagnostic_file and used the example’s bridge1<-bridge_sampler(fit_1) line, and that totally worked.

When I tried to duplicate this for the second model however I’m getting another error: ‘weights’ must be a numeric vector.

Try without using the update function; just write the second call to stan_glm out.

I did actually, because the update function was giving me the same error as before. The code below is what I just used…the first model works perfectly, but the second model gives me the ‘weights’ error. Oddly enough it works when I delete the second predictor. But when I put TP back in the model it breaks the code

Discountmodel <- stan_glm(
  Accept_Reject ~ Discount, 
  data = pubdata, 
  family = binomial(link = "logit"), 
  prior_intercept = NULL,
  cores=4,
  QR = FALSE,
  chains = 3, iter = 10000,
  diagnostic_file=file.path(tempdir(), "df.csv"))

bridge1<-bridge_sampler(Discountmodel)

Discount_TP_model <- stan_glm(
  Accept_Reject ~ Discount, TP,
  data = pubdata, 
  family = binomial(link = "logit"), 
  prior_intercept = NULL,
  cores=4,
  QR = FALSE,
  chains = 3, iter = 10000,
  diagnostic_file=file.path(tempdir(), "df.csv"))

bridge2<-bridge_sampler(Discount_TP_model)

That should be

Discount_TP_model <- stan_glm(
  Accept_Reject ~ Discount + TP, ...

However, Bayes factors are not well-defined with improper priors (on the intercept).

3 Likes

THAT WORKED!!! You sir, deserve some kind of medal. Never thought I would be able to figure this out. Thanks a million!

Will the default priors be a serious problem for interpretation as they are right now? I will meet with my research mentor this week and I can have her help specifying a prior, but I don’t have the knowledge myself to do that. Regardless…I’m just ecstatic I have a fully working script finally

They don’t give medals for fixing typos. Basically, the easiest way I know of to think about model probabilities, Bayes factors, etc. is that they are a contest to see which prior predictive distribution turned out to be most consistent with the observed data. For better or for worse, the prior predictive distribution is not well-defined when some of the priors are improper.

1 Like