Hi all, I am writing to ask a general question about Bayesian model comparison.
In general, we may use the log pseudo marginal likelihood (LPML) and WAIC as criteria to evaluate and compare goodness of fit. However, I have run into a problem in both simulation studies and real data analysis.
In simulation studies, when I checked the L_2 distance between the posterior predictive distributions and the true distributions, I found that Model A had a lower deviation, yet both LPML and WAIC favored the competitor, Model B. So which model is "better" on this simulated data set?
Similarly, in a real-data classification example, Model C achieves a higher AUC while both LPML and WAIC favor the competitor, Model D.
How should I judge which model is more suitable for this kind of data?
Note: all LPML and WAIC values are computed with the loo package.
@avehtari should be able to provide a qualitative assessment of the pros and cons in this case.
They estimate the same criterion (the expected log pointwise predictive density). The PSIS-LOO computation is in general more accurate and has better diagnostics than WAIC, so computing both of them doesn't provide any more information than computing just elpd_loo with PSIS-LOO.
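To see that WAIC and LOO target the same quantity, here is a minimal numpy sketch (not the loo package's implementation). The toy normal model and the log-likelihood matrix `ll` are invented for illustration, and the LOO estimate uses plain importance sampling with weights 1/p(y_i|theta_s) rather than the Pareto smoothing that PSIS-LOO adds:

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)

# Toy setup: n observations, S posterior draws of a normal mean.
n, S = 50, 4000
y = rng.normal(0.0, 1.0, size=n)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=S)

# Pointwise log-likelihood matrix, shape (S, n): ll[s, i] = log p(y_i | theta_s).
ll = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2

# In-sample log pointwise predictive density.
lpd = logsumexp(ll, axis=0) - np.log(S)

# WAIC: lpd minus the effective-parameter penalty (posterior variance of ll).
elpd_waic = np.sum(lpd - ll.var(axis=0, ddof=1))

# Naive importance-sampling LOO with weights 1 / p(y_i | theta_s);
# PSIS-LOO stabilizes these weights by Pareto smoothing.
elpd_loo = np.sum(-(logsumexp(-ll, axis=0) - np.log(S)))

# Both penalize relative to the in-sample lpd and estimate the same elpd.
print(elpd_waic, elpd_loo, lpd.sum())
```

Both estimates come out slightly below the in-sample lpd and close to each other; the practical difference is in the accuracy and the diagnostics of the weight smoothing, not in what is being estimated.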
In general, there is no theoretical reason that two different utilities/losses would provide the same ranking. You don't provide enough information to say much more, but 1) the L2 distance is less sensitive to differences in the tails (which is not good if you care about the tails), and 2) the differences can be so small that the difference in ranking doesn't matter.
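A small numeric illustration of point 1 (the densities and parameter values below are made up for the example): take a "true" N(0, 1) density and compare a mean-shifted approximation N(0.5, 1) against a thin-tailed approximation N(0, 0.7^2). The L2 distance ranks the thin-tailed one as closer, while the KL divergence (the expected log-score loss) ranks the shifted one as better, because the log score punishes bad tails:

```python
import numpy as np

# Fine uniform grid for simple numerical integration.
x = np.linspace(-12.0, 12.0, 200001)
dx = x[1] - x[0]

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

p = normal_pdf(x, 0.0, 1.0)        # "true" density
q_shift = normal_pdf(x, 0.5, 1.0)  # shifted mean, correct tails
q_thin = normal_pdf(x, 0.0, 0.7)   # correct mean, too-thin tails

def l2(p, q):
    # L2 distance between two densities.
    return np.sqrt(np.sum((p - q) ** 2) * dx)

def kl(p, q):
    # KL(p || q) = E_p[log p - log q]: the expected log-score loss of q.
    return np.sum(p * (np.log(p) - np.log(q))) * dx

print(l2(p, q_shift), l2(p, q_thin))  # L2 ranks the thin-tailed q as closer
print(kl(p, q_shift), kl(p, q_thin))  # log score ranks the shifted q as better
```

The two criteria disagree on the ranking here by construction: the thin-tailed density is close to the true one in squared-error terms near the mode, but its log density is badly wrong far out in the tails, and KL weights that heavily.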
AUC considers only the ranking of the predicted probabilities and doesn't care at all whether those probabilities themselves are well calibrated, while the log score penalizes heavily if the predicted probabilities are overconfident.
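A toy sketch of this point (the labels and probability vectors are invented for illustration): two models whose predicted probabilities have the identical ranking, so their AUCs are exactly equal, but the overconfident one is punished much harder by the log score on the one misordered observation:

```python
import math
from itertools import product

# Six observations; the label at index 3 breaks a perfect ordering.
y = [0, 0, 1, 0, 1, 1]
p_calibrated = [0.10, 0.30, 0.55, 0.60, 0.80, 0.90]
p_overconfident = [0.001, 0.010, 0.950, 0.970, 0.990, 0.999]

def auc(y, p):
    # Fraction of (positive, negative) pairs ranked correctly.
    pos = [p[i] for i in range(len(y)) if y[i] == 1]
    neg = [p[i] for i in range(len(y)) if y[i] == 0]
    wins = sum((pi > ni) + 0.5 * (pi == ni) for pi, ni in product(pos, neg))
    return wins / (len(pos) * len(neg))

def mean_log_score(y, p):
    # Average log predictive density of the observed labels (higher is better).
    return sum(math.log(pi if yi == 1 else 1 - pi)
               for yi, pi in zip(y, p)) / len(y)

print(auc(y, p_calibrated), auc(y, p_overconfident))  # identical AUC
print(mean_log_score(y, p_calibrated),
      mean_log_score(y, p_overconfident))             # overconfident is worse
```

The overconfident model assigns probability 0.97 to the negative observation at index 3, so its log score takes a roughly -log(0.03) hit there, while AUC cannot distinguish the two models at all.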
If you care about the whole distribution, then the log score is the right choice (see, e.g., Bernardo & Smith, 1994). If, in addition to modeling the (conditional) distribution of the data, you also have some decision task or need to act based on the model, then you can consider other utilities/losses. This can be useful when your model is misspecified (so that the log score says it's bad), but that misspecification doesn't influence your task-specific utility/loss. AUC is a strange one, as it corresponds to an average over different ratios of losses for right and wrong decisions, which doesn't correspond to any real-life binary prediction/action task. If you tell me more about your application and what kind of decisions will be made using the models, I can comment more on useful utilities/losses.