Thanks so much for the detailed response!
Can you tell us more about your model and modeling task, so I can recommend easier-to-interpret utility or cost functions?
Sure. I am running a cognitive psychology experiment with a 3x2 between-subjects design. The goal is to find out which matters more for decisions to accept or reject a deal: the initial time period spent waiting for it (“Floor”) or the percent reduction in time from an initially longer period (“Discount”).
Since I have a theory I want to test (that it’s a combination of the two, not just the Floor value), I want to find the model that provides the best theoretical account of my data. I’m less concerned with overall predictive utility than with the model’s ability to explain my data. At the same time, I recognize that I need some way to critically evaluate the model’s performance to make sure it’s not garbage, which is why I’m now trying to learn about LOOIC.
My two models (so far) are below. I’m still trying to fix the code for a third one.
Main_EffectsModel <- stan_glm(
  Accept_Reject ~ Discount + Floor,
  family = binomial(link = "logit"),
  data = sonadata_clean,
  prior = student_t(df = 5, location = 0, scale = NULL, autoscale = TRUE),
  # prior_intercept = normal(),
  # prior_PD = TRUE,
  algorithm = "sampling",
  mean_PPD = TRUE,
  adapt_delta = 0.95,
  # QR = FALSE,
  # sparse = FALSE,
  chains = 3, iter = 50000, cores = 3,
  diagnostic_file = file.path(tempdir(), "df.csv"))
Interaction_Model <- stan_glm(
  Accept_Reject ~ Discount * Floor,  # expands to Discount + Floor + Discount:Floor
  family = binomial(link = "logit"),
  data = sonadata_clean,
  prior = student_t(df = 5, location = 0, scale = NULL, autoscale = TRUE),
  # prior_intercept = normal(),
  # prior_PD = TRUE,
  algorithm = "sampling",
  mean_PPD = TRUE,
  adapt_delta = 0.95,
  # QR = FALSE,
  # sparse = FALSE,
  chains = 3, iter = 50000, cores = 3,
  diagnostic_file = file.path(tempdir(), "df.csv"))
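If I understand the LOOIC workflow correctly, comparing the two fits would look something like this (a sketch using the loo package that rstanarm wraps; the object names are the models above):

```r
library(loo)

# PSIS-LOO for each fitted rstanarm model
loo_main <- loo(Main_EffectsModel)
loo_int  <- loo(Interaction_Model)

# elpd_diff and its SE indicate which model has better
# estimated out-of-sample predictive performance
loo_compare(loo_main, loo_int)
```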
From the loo output I can see that you are using 75,000 posterior draws, which is probably about 71,000 more than you need, given that you seem to have quite a simple model (p_loo around 4-6) and plenty of observations, so the posterior is likely to be very easy.
The reason I am using so many iterations is bayestestR::bayesfactor_models(). I was initially using 5,000 draws, but when I ran that command to compare my models I got this message in the console…
Bayes factors might not be precise.
For precise Bayes factors, it is recommended sampling at least 40,000 posterior samples.
Computation of Bayes factors: estimating marginal likelihood, please wait…
…so I added another zero and made it 50,000 iterations to be safe.
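For reference, the comparison call that produced that message was along these lines (a sketch; bayestestR estimates marginal likelihoods via bridge sampling, which is why it asks for so many draws):

```r
library(bayestestR)

# Bayes factor comparing the two rstanarm fits
# (bridge sampling needs far more posterior draws than loo does)
bayesfactor_models(Interaction_Model, denominator = Main_EffectsModel)
```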