I use LOO-CV as implemented in the loo package (falling back to exact LOO-CV for data points with a Pareto-k estimate above 0.5) for a statistical analysis of experimental data. Recently I noticed that the Pareto-k estimates are quite stochastic across fitting runs, both in the number of data points with Pareto-k > 0.5 (in my case ~4-10 out of ~600 data points) and in the estimated values themselves. It is clear to me that some stochasticity is expected, since the estimates depend on the posterior draws and probably also on the fit of the generalised Pareto distribution to the tail of the importance ratios. But I am curious how much variability across fitting runs I should expect, especially since the Pareto-k estimates themselves may be used to assess the model. There is most probably no single answer for all kinds of models and data, but perhaps there are some general recommendations for how to approach this question.
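To get a feel for how much run-to-run variability comes from the tail fit alone, one can simulate importance ratios with a known tail index and refit repeatedly. The sketch below uses a deliberately simplified scipy-based fit; `pareto_k` is a hypothetical helper, not loo's actual estimator (loo uses the regularised Zhang-Stephens fit with a different tail-size rule), so the numbers are only indicative of the order of magnitude.

```python
import numpy as np
from scipy.stats import genpareto

def pareto_k(log_ratios, tail_frac=0.2):
    """Simplified Pareto-k estimate: fit a generalised Pareto distribution
    to the excesses of the largest importance ratios over a tail cutoff.
    Hypothetical helper -- loo's estimator and tail-size rule differ."""
    ratios = np.exp(log_ratios - log_ratios.max())   # stabilise the exp
    n_tail = max(5, int(tail_frac * len(ratios)))
    sorted_r = np.sort(ratios)
    cutoff = sorted_r[-n_tail - 1]                   # threshold just below the tail
    excesses = sorted_r[-n_tail:] - cutoff
    k, _, _ = genpareto.fit(excesses, floc=0.0)      # MLE with location fixed at 0
    return k

rng = np.random.default_rng(0)
# Exponential log-ratios with scale 0.7 give exactly Pareto-tailed
# importance ratios with true k = 0.7, i.e. a point near the warning zone.
ks = np.array([pareto_k(rng.exponential(scale=0.7, size=4000))
               for _ in range(20)])
print(f"k-hat over 20 independent runs: mean={ks.mean():.2f}, "
      f"sd={ks.std():.2f}, range=({ks.min():.2f}, {ks.max():.2f})")
```

For points whose true k sits close to the threshold, even this idealised setting scatters k-hat on both sides of 0.5 across runs, which matches the flickering set of flagged points described above.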

To determine the required posterior sample size, I mainly rely on no warnings appearing when running the Stan sampler, and on whether both the posterior and the distribution of simulated data in the posterior predictive check look 'complete' (smooth where appropriate, not too noisy, no gaps). Is such a sample size sufficient for the Pareto-k estimates to be reliable?
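One practical check for a given model is whether the Pareto-k estimates stabilise as the posterior sample grows, e.g. by rerunning with more draws and watching the run-to-run spread of k-hat shrink. Here is a toy version of that check on simulated importance ratios, using a simplified scipy-based tail fit (`pareto_k` is a hypothetical stand-in for loo's estimator, so only the qualitative trend carries over):

```python
import numpy as np
from scipy.stats import genpareto

def pareto_k(log_ratios, tail_frac=0.2):
    # Hypothetical simplified tail fit; loo's actual estimator differs.
    ratios = np.exp(log_ratios - log_ratios.max())
    n_tail = max(5, int(tail_frac * len(ratios)))
    sorted_r = np.sort(ratios)
    excesses = sorted_r[-n_tail:] - sorted_r[-n_tail - 1]
    k, _, _ = genpareto.fit(excesses, floc=0.0)
    return k

rng = np.random.default_rng(42)
# Simulated log importance ratios with true tail index k = 0.7; vary the
# number of posterior draws S and record the run-to-run spread of k-hat.
spread = {}
for s in (500, 1000, 2000, 4000):
    ks = [pareto_k(rng.exponential(scale=0.7, size=s)) for _ in range(30)]
    spread[s] = float(np.std(ks))
    print(f"S={s:5d}: sd(k-hat) over 30 runs = {spread[s]:.3f}")
```

If the set of flagged points in the real analysis keeps changing as the number of draws grows, that suggests the sample size chosen for smooth-looking posteriors is not yet large enough for stable Pareto-k diagnostics.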