Various approaches to model comparisons

torkar · October 25, 2019, 6:49pm

I didn’t know where to post this, but I guess here is probably best. In @paul.buerkner’s paper he writes the following footnote:

In a Bayesian framework, models may be compared by various means for instance Bayes factors (Kass & Raftery, 1995), (approximate) cross-validation methods (Vehtari et al., 2017), information criteria (Vehtari et al., 2017; Watanabe, 2010) or stacking of posterior-predictive distributions (Yao, Vehtari, Simpson, & Gelman, 2017). A discussion of the pros and cons of these various approaches is outside the scope of the present paper.

So my question is: can you point me to a paper that discusses the pros and cons of these approaches, preferably with some accompanying empirical evidence and not just opinions?

maxbiostat · October 25, 2019, 8:24pm

I’ll offer my (maybe not as humble as it should be) opinion: the idea of comparing Bayes factors and cross-validation, say, seems extremely weird to me. These approaches try to capture very different aspects of model fit. @avehtari and others have been quite vocal about the inadequacy of Bayes factors in the so-called M-open setting (see Section 2 here), where none of the models under consideration is taken as the “true” model. In that setting, as I understand it, they argue that cross-validation/stacking is the way to go. I find the critique of this view by Gronau & Wagenmakers particularly compelling, but Vehtari et al. were not amused.

I’m linking to papers by @avehtari, @yuling and @anon75146577 not because I want to explain their work better than them, but because I tend to side with a view somewhat opposite to theirs and closer to Gronau and Wagenmakers. I wanted to state this point of view without leaving out important references.

anon75146577 · October 25, 2019, 8:30pm

Read Dani Navarro’s work for an “on the ground” view. Blog: https://djnavarro.net/post/a-personal-essay-on-bayes-factors/, but there’s also a paper.

My view on the G&W critique was that if you’re going to criticize LOO, there are a lot stronger arguments to make (many of which we outlined in our comment).

Generally BF vs LOO is kinda a silly question. It depends on what you want to do. The answer is probably neither. Work out what questions these tools can and cannot answer and use them appropriately.

avehtari · October 26, 2019, 10:28am

Yao, Vehtari, Simpson, and Gelman (2017) has empirical evidence comparing BF, cross-validation and stacking for model averaging both in M-open and M-closed cases.

Piironen and Vehtari (2017) has empirical evidence for model selection comparing BF, cross-validation, information criteria and the projection predictive (projpred) approach.

torkar · October 26, 2019, 10:45am

The Piironen and Vehtari (2017) paper seems to be what I’m after - thanks :)

andrewgelman · October 27, 2019, 1:33am

I think the right way to think of these things is literally.

The Bayes factor is literally the ratio of marginal likelihoods, p(y|M1)/p(y|M2). As such, it has serious problems when p(y|M1) or p(y|M2) is not well defined, as when M1 or M2 has noninformative or weak priors that are assigned arbitrarily.
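A minimal sketch of that prior sensitivity, assuming a toy conjugate model that does not come from the thread: y_i ~ N(mu, sigma^2) with sigma known, M1 fixes mu = 0, and M2 puts a N(0, tau^2) prior on mu. Because the likelihood depends on mu only through the sample mean ybar, the Bayes factor reduces to a ratio of two normal densities evaluated at ybar. The function names and the data summary (ybar = 0.5, n = 50) are hypothetical, for illustration only.

```python
import math


def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_bf_21(ybar, n, sigma, tau):
    """Hypothetical helper: log p(y|M2) - log p(y|M1) for the toy model
    y_i ~ N(mu, sigma^2), sigma known,
    M1: mu = 0,  M2: mu ~ N(0, tau^2).
    Likelihood factors that do not involve mu cancel in the ratio,
    so only the sampling distribution of ybar matters."""
    se2 = sigma ** 2 / n
    log_ml_m2 = normal_logpdf(ybar, 0.0, tau ** 2 + se2)  # ybar with mu integrated over its prior
    log_ml_m1 = normal_logpdf(ybar, 0.0, se2)             # ybar with mu fixed at 0
    return log_ml_m2 - log_ml_m1


if __name__ == "__main__":
    # Same (made-up) data summary, increasingly "vague" prior scales for M2.
    for tau in (1.0, 10.0, 1000.0):
        print(f"tau={tau:>7}: log BF_21 = {log_bf_21(ybar=0.5, n=50, sigma=1.0, tau=tau):.2f}")
```

With the data held fixed, log BF_21 is positive at tau = 1 and negative at tau = 1000 in this toy example, so the "evidence" switches from M2 to M1 purely because of an arbitrary choice of prior scale.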

LOO is literally an estimate of out-of-sample prediction error. There is no literal reason to use it to choose between M1 and M2 unless your goal is to have lower out-of-sample prediction error, which you might want in some settings and not others.
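A minimal sketch of what that estimate is, assuming a toy conjugate model for illustration (not from the thread): y_i ~ N(mu, sigma^2) with sigma known; M1 fixes mu = 0, M2 gives mu a N(0, tau^2) prior. The quantity is elpd_loo = sum_i log p(y_i | y_{-i}, M), computed here exactly by refitting the conjugate posterior n times, rather than with the Pareto-smoothed importance sampling approximation used by the R loo package. Function names and data are hypothetical.

```python
import math


def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def elpd_loo_fixed_mean(y, sigma):
    """Hypothetical M1: mu fixed at 0. Nothing is learned from the data,
    so each leave-one-out predictive is simply N(0, sigma^2)."""
    return sum(normal_logpdf(yi, 0.0, sigma ** 2) for yi in y)


def elpd_loo_normal_prior(y, sigma, tau, m0=0.0):
    """Hypothetical M2: mu ~ N(m0, tau^2). Exact LOO: for each i, form the
    conjugate posterior of mu from the other n - 1 points and score y_i
    under the resulting posterior predictive."""
    n, total_sum = len(y), sum(y)
    post_prec = 1.0 / tau ** 2 + (n - 1) / sigma ** 2  # same for every left-out point
    total = 0.0
    for yi in y:
        post_mean = (m0 / tau ** 2 + (total_sum - yi) / sigma ** 2) / post_prec
        pred_var = sigma ** 2 + 1.0 / post_prec  # noise variance + posterior uncertainty
        total += normal_logpdf(yi, post_mean, pred_var)
    return total


if __name__ == "__main__":
    y = [0.3, 0.9, 0.4, 0.7, 0.6, 0.2, 0.8, 0.5]  # made-up data
    print("M1:", elpd_loo_fixed_mean(y, sigma=1.0))
    for tau in (1.0, 1000.0):
        print(f"M2 tau={tau}:", elpd_loo_normal_prior(y, sigma=1.0, tau=tau))
```

In this toy setting, moving tau from 1 to 1000 changes elpd_loo for M2 only slightly, because each leave-one-out posterior is dominated by the other n - 1 observations. The number answers a predictive question and nothing more.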

Stacking is literally an estimate of a weighted-average model that minimizes out-of-sample prediction error. Again, if that’s your goal, fine.
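Stacking (Yao, Vehtari, Simpson, and Gelman, 2017) chooses weights w on the simplex that maximize sum_i log sum_k w_k p(y_i | y_{-i}, M_k), i.e. the leave-one-out log score of the weighted mixture. A minimal sketch, assuming you already have a matrix of pointwise LOO log predictive densities (rows = observations, columns = models). It maximizes that concave objective with a simple EM-style fixed-point update for mixture weights, chosen here for brevity; this is not necessarily the optimizer behind `loo::loo_model_weights`. The function name and inputs are hypothetical.

```python
import math


def stacking_weights(lpd, iters=1000, tol=1e-10):
    """Hypothetical helper. lpd[i][k] = log p(y_i | y_{-i}, M_k).
    Returns weights on the simplex maximizing sum_i log sum_k w_k p_ik,
    using the EM update for mixture proportions with fixed components."""
    n, k = len(lpd), len(lpd[0])
    # Rescale each row by its max for numerical stability; the row constant
    # cancels in the ratio w_j p_ij / sum_l w_l p_il used below.
    dens = []
    for row in lpd:
        m = max(row)
        dens.append([math.exp(v - m) for v in row])
    w = [1.0 / k] * k
    for _ in range(iters):
        new = [0.0] * k
        for row in dens:
            mix = sum(wj * pj for wj, pj in zip(w, row))
            for j in range(k):
                new[j] += w[j] * row[j] / mix  # responsibility of model j for point i
        new = [v / n for v in new]
        if max(abs(a - b) for a, b in zip(new, w)) < tol:
            return new
        w = new
    return w


if __name__ == "__main__":
    # Made-up complementary models: each predicts half of the points better.
    print(stacking_weights([[-1.0, -2.0], [-2.0, -1.0]] * 5))
    # Made-up case where model 0 is better on every point.
    print(stacking_weights([[-1.0, -5.0]] * 10))
```

In the complementary case the weights split evenly, whereas picking the single model with the best total elpd_loo would have to discard one of them. When one model predicts every point better, its weight goes to essentially 1.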

This idea of treating statistical methods literally can be very helpful. The p-value is literally the probability bla bla bla . . . not a statement about whether a hypothesis is true. The confidence interval is literally a procedure which, at least 95% of the time bla bla bla . . . not a measure of uncertainty. Etc.
