I’m modeling some data using a joint model of survival and longitudinal data. The longitudinal data have a hierarchical structure with two levels of random effects: each patient has at least one metastasis, and each metastasis is observed at least once.
I have several models that I would like to compare.
I’ve used the loo package with leave-one-patient-out and leave-one-measurement-out, but both scenarios lead to high Pareto k values.
From my understanding, this does not necessarily mean that the model is wrong, but rather that the importance sampling part can’t be trusted.
The WAIC is also failing.
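For reference, here is roughly how I computed both variants with the loo package (the names `fit` and `stan_data$patient`, and the 4 chains × 1000 post-warmup draws, are placeholders for my actual setup):

```r
library(loo)
library(rstan)

# pointwise log-likelihood (draws x observations) from a generated quantities block
log_lik_obs <- extract_log_lik(fit, parameter_name = "log_lik")
chain_id <- rep(1:4, each = 1000)   # placeholder: 4 chains, 1000 post-warmup draws each

# leave-one-measurement-out
loo_obs <- loo(log_lik_obs, r_eff = relative_eff(exp(log_lik_obs), chain_id))

# leave-one-patient-out: sum the log-likelihoods of all observations belonging
# to the same patient, giving a draws x patients matrix
patient_id <- stan_data$patient     # placeholder: integer patient id per observation
log_lik_pat <- t(apply(log_lik_obs, 1, function(ll) tapply(ll, patient_id, sum)))
loo_pat <- loo(log_lik_pat, r_eff = relative_eff(exp(log_lik_pat), chain_id))
```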
So I wanted to use the kfold function, but I have the impression that it can only be used with rstanarm models and not with rstan models. Am I wrong? Or can the kfold function be used with rstan (is there a workaround)?
Otherwise, is there another way to compare the models, please?
Based on this, it is likely that your model is flexible and the posterior changes a lot when removing one observation or one patient, which can explain the high Pareto k values. If you tell us more about your model and post the model code and the loo output, I may be able to comment more.
I’ve seen in the paper “Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC” (s11222-016-9696-4.pdf) that it’s recommended to perform cross-validation with 10 folds.
However, my models are quite computationally expensive on their own (up to 2 days using within-chain parallelization), so I was considering performing 5-fold cross-validation.
Do you think that’s still good enough? Or do you think I should stick with 10 folds even though the methodology is heavy?
5-fold is better than not doing it at all. How much difference there is between 5-fold and 10-fold depends on the data and the model. If there is a lot of data compared to the number of parameters, then the results are less sensitive to the number of folds. You can also get some speedup by initializing the cross-validation fits with the full-data posterior draws and using a shorter warmup. It is also possible that you don’t need as many iterations to get a good estimate of the cross-validation elpd as you need for posterior inference otherwise. In our paper Bayesian cross-validation by parallel Markov chain Monte Carlo | Statistics and Computing, we demonstrate the shorter warmup and fewer iterations in the case of GPU parallel computation, but the same idea can be used even without GPUs. This way you might be able to at least halve the computation time for each fold.
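To make this concrete with rstan (so you don’t need rstanarm’s kfold), here is a rough sketch of a manual K-fold loop. It assumes your Stan program takes a holdout indicator that removes held-out observations from the likelihood while still computing log_lik for all of them, and the object names (`stan_model_obj`, `stan_data`, `fit_full`) and parameter names in `make_init` are placeholders:

```r
library(rstan)
library(loo)

K <- 5
# assign whole patients to folds so a held-out patient never appears in the training data
fold <- kfold_split_grouped(K = K, x = stan_data$patient)

# initial values drawn from the full-data posterior, allowing a much shorter warmup
full_draws <- rstan::extract(fit_full)
make_init <- function() {
  i <- sample(length(full_draws$lp__), 1)
  list(beta  = full_draws$beta[i, ],   # repeat for every parameter block in your model
       sigma = full_draws$sigma[i])
}

n_obs <- length(stan_data$patient)
log_lik_heldout <- matrix(NA_real_, nrow = 4 * 500, ncol = n_obs)  # chains * (iter - warmup)

for (k in 1:K) {
  holdout <- fold == k
  data_k <- stan_data
  data_k$holdout <- as.integer(holdout)    # likelihood contribution switched off in Stan
  fit_k <- sampling(stan_model_obj, data = data_k, chains = 4,
                    init = make_init, warmup = 250, iter = 750)
  ll_k <- extract_log_lik(fit_k, parameter_name = "log_lik")
  log_lik_heldout[, holdout] <- ll_k[, holdout]
}

# pointwise elpd of the held-out observations
elpd_kfold <- elpd(log_lik_heldout)
print(elpd_kfold)
```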
Hi
Thank you for the feedback!
For my richest dataset I have 608 survival outcomes associated with 6919 observations, which are nested in 1401 lesions, nested in 1026 organs, which are themselves nested in 608 patients.
The model has 40 parameters if I only count the population parameters. If I also count the individual and lesion random effects, then I have 6065 parameters.
Am I right? Is there something else I could try to see which model I should choose? I feel like the se_elpd values are quite high, but I’m not sure whether I can do anything about it.
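For completeness, this is roughly how I am comparing the fits at the moment (the model names are placeholders, and I’m using the loo objects even though the Pareto k diagnostics flag them):

```r
library(loo)

comp <- loo_compare(loo_model1, loo_model2, loo_model3)
print(comp, simplify = FALSE)
# elpd_diff is reported relative to the best model; when |elpd_diff| is small
# compared to se_diff, the data don't clearly favour one model over the others
```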
(I’ve tried using PSIS-LOO as leave-one-patient-out and leave-one-measurement-out, but my Pareto k values are too high (about 40% and 20% of them, respectively), and using moment matching did not resolve the issue.)
Hi, I’ve been on vacation and am now going through the messages.
Your model is flexible, and it is likely you will get high Pareto k values even if the model is good.
I agree. Sometimes the difference in predictive performance can be small even if the data have enough information to inform the posterior. See, e.g., Sections 13 and 15 in the Nabiximols treatment efficiency case study.