I read on this page from bayestestR that when you’re working with a factor that has 3 or more levels, you can get biased Bayes factors when comparing models. To correct this bias, they suggest using the contr.bayes contrasts, which I tried. However, this completely screwed up all of my parameter estimates.
I’ve attached a picture of a table comparing the results of a standard frequentist glm() to a stan_glm *before* calling contr.bayes, and then one after. Is this code working properly? Or is there something I’m missing here? Not sure why it’s skewing things…
I am not a huge fan of Bayes factors and so I don’t know a lot about them, but I skimmed through the references and this seems to be expected: the contr.bayes function changes the design matrix (the way factors are coded) from the standard zero-or-one dummy coding to something else, so the coefficients should change, and their interpretation as well (I am actually not sure how to interpret them in this case). I also understand why you might need that correction to get something useful from Bayes factors with fixed effects. I would add this to the host of other ways in which Bayes factors are unintuitive :-). I think the problem should not apply to varying effects (e.g. 1|Discount), so those should be easier to use in this way.

If you are unsure how to work with Bayes factors, I would suggest you just interpret the 95% and 50% posterior intervals for your parameters (unless your reviewer/boss/… makes you compute Bayes factors). Posterior intervals require little special care and are relatively intuitive, and I think it is vital to work with a tool you understand.
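To see why changing contrasts shifts the coefficients without changing the model itself, here is a minimal base R sketch using the built-in contr.treatment vs. contr.sum coding (contr.bayes from bayestestR plugs into the same contrasts mechanism; the data here are made up for illustration):

```r
set.seed(1)
f <- factor(rep(c("a", "b", "c"), each = 10))   # a 3-level factor
y <- 2 * as.numeric(f) + rnorm(30)              # fake outcome

# Same model under two factor codings:
m_dummy <- lm(y ~ f, contrasts = list(f = "contr.treatment"))
m_sum   <- lm(y ~ f, contrasts = list(f = "contr.sum"))

coef(m_dummy)  # intercept = mean of level "a"; slopes = differences from "a"
coef(m_sum)    # intercept = grand mean; slopes = deviations from it

# The coefficients differ, but the fitted model is identical (a reparameterization):
all.equal(fitted(m_dummy), fitted(m_sum))
```

So after switching to contr.bayes the estimates *should* look different from the glm() defaults; it is the interpretation of each coefficient that changes, not the fit.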
I’ve also recently answered a similar inquiry where I go a bit more into possible approaches when you need something similar to hypothesis testing in a Bayesian context:
Thanks, this helps a ton! I actually agree with you 100%: I think using Bayes factors is not a good idea for NHST and I much prefer the ROPE method you outlined. Kruschke & Liddell (2018) made a pretty strong case against Bayes factors IMHO for this purpose (specifically, pages 164–167). In this final term project though, which is my first foray into Bayes, I used ROPEs to test the model parameters and then Bayes factors to compare the models, to see which model was the most likely to have generated my data. Then I used LOO and R2 to see how well the model with the best BF performed overall. Not sure how good an approach that is, but I had 3 weeks to learn how to do Bayes from scratch, so I went with it :)
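For readers following along, the ROPE idea boils down to asking what share of the posterior falls in a region you consider practically equivalent to zero. A hedged base R sketch, with fake normal draws standing in for real posterior draws (in practice they would come from the fitted model, e.g. a column of as.matrix(fit)), and ROPE bounds of ±0.1 chosen arbitrarily for illustration:

```r
set.seed(42)
draws <- rnorm(4000, mean = 0.3, sd = 0.15)  # stand-in for 4000 posterior draws

rope <- c(-0.1, 0.1)  # assumed region of practical equivalence
p_in_rope <- mean(draws >= rope[1] & draws <= rope[2])
p_in_rope  # share of the posterior that is practically equivalent to zero
```

A small p_in_rope suggests the effect is practically non-null; the bayestestR package automates this kind of check for fitted models.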
The reason I didn’t use LOO as the primary method of model evaluation is that I’m not entirely sure how relevant it is to what I’m doing. Specifically, I’m building explanatory models to test theory (I’m a psychology PhD student); I’m not necessarily trying to maximize out-of-sample prediction or build a predictive model that is highly accurate in terms of classification.
Then I would, however, also advise against Bayes factors (as far as I remember; I might be wrong in the details): using BFs chooses, among the candidate models, the one that minimizes the KL-divergence between the model (including priors) and the hypothetical true model. And in my experience KL-divergence is way weirder than out-of-sample prediction (i.e., I think I understand out-of-sample prediction, while I have tried to really grasp KL-divergence a few times and mostly failed). I personally think Danielle’s approach presented in the “Between the devil and the deep blue sea” paper is the most sensible, but IMHO it mostly applies to models more complex than linear regression. When you are working with regression you basically know 100% that your model is false, so focusing on prediction does not seem such a bad idea to me…
Well, this is just my rambling - I actually have very limited real experience with model selection and so I am mostly repeating things I’ve read and that make sense to me, so don’t put too much weight on that… :-)
That holds asymptotically, that is, you would also need to know whether you are in the asymptotic regime.
If your theory can’t predict the future, there is not much you can do to get more evidence to support that theory.
There was not enough information about your data and model to say whether loo or cross-validation would be helpful. If you are interested, it might be better to start a new thread and describe your data and model in a bit more detail.