I use LOO-CV as implemented in the loo package (running exact LOO-CV for data points with a Pareto-k estimate above 0.5) for a statistical analysis of experimental data. Recently, I noticed that the Pareto-k estimates were quite stochastic across different fitting runs, both in the number of data points with Pareto-k > 0.5 (in my case ~4-10 out of ~600 data points) and in the values themselves. It is clear to me that this is stochastic, because it depends on the posterior draws and probably also on the fit of the generalised Pareto distribution. But I am curious how much variability across fitting runs I should expect, especially since the Pareto-k estimates themselves may be used to assess the model. There is most probably no answer that covers all possible kinds of models and data, but perhaps there are some general recommendations on how to address this question.

To determine the needed size of the posterior sample, I mainly rely on no warnings appearing when running the Stan sampler, and on whether both the posterior and the distribution of simulated data in the posterior predictive check look 'complete' (smooth where appropriate, not too noisy, no gaps). Will such a sample size be sufficient to rely on the Pareto-k estimates?

Hi, sorry, your question seems to have been left behind. You are right that, as with all the other diagnostics we have, there is stochasticity, but I can't comment much beyond that.

The main authority on LOO is @avehtari, so I hope he's not too busy to clarify.

This is quite a good result. You can run longer chains to reduce the variability. You could also look at the marginal posterior of k for each data point, given the posterior draws you have. That marginal posterior is computed as part of the estimation of k, but it is not currently returned by the gpdfit function.

If you have the time and interest to experiment, I would be interested in your experience of whether those marginal posteriors are useful for understanding the variability. I don't have time right now to add a new function that returns the marginal posterior of k, but if you have the skills, look at the code in https://github.com/stan-dev/loo/blob/master/R/gpdfit.R, lines 46 and 47. theta is the parameter of the profile distribution, and w_theta contains the posterior weight for each theta value; line 46 computes theta_hat, the posterior mean of theta, and line 47 computes the corresponding khat. You could instead compute a vector khats so that each element is computed using the equation on line 47, but with a different value from the vector theta. The khats vector together with w_theta then gives you weighted quadrature points representing the posterior of k, which you can plot. If this explanation is confusing, I may have time this week to create a new function that provides this marginal posterior, plus an example of how to use it.
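To make the idea concrete, here is a minimal sketch of that computation. The inputs below are synthetic stand-ins, since gpdfit does not currently return theta and w_theta; in practice they would be taken from inside a modified gpdfit.

```r
# Minimal sketch of a marginal posterior for k, following the pattern of
# lines 46-47 of gpdfit.R. NOTE: x, theta, and w_theta below are SYNTHETIC
# stand-ins; in a real application they would come from inside gpdfit.
set.seed(123)
x <- sort(rexp(100))                              # stand-in for the tail sample
theta <- seq(-2, -0.01, length.out = 30)          # stand-in quadrature grid
w_theta <- rep(1 / length(theta), length(theta))  # stand-in posterior weights

# gpdfit computes khat from the single value theta_hat (the weighted mean
# of theta); here the same formula is applied to every theta[i] instead:
khats <- vapply(theta, function(th) mean(log1p(-th * x)), numeric(1))

# khats together with the weights w_theta form weighted quadrature points
# representing the marginal posterior of k; a weighted histogram would
# visualise it, e.g.:
# plotrix::weighted.hist(khats, w = w_theta)
```

This keeps one khat value per quadrature point rather than collapsing the grid to its posterior mean, so the spread of khats (weighted by w_theta) shows the uncertainty in k.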

Thank you very much for your response and the suggestion. I am actually under a bit of time pressure, as I am in the last 6 months of my PhD and have to focus on my more applied questions, so I am not sure if I can work on this in the near future.
Still, if I understand correctly, you suggest computing a vector khat_vector with khat_vector[i] <- mean.default(log1p(-theta[i] * x)) for i = 1, ..., length(theta), and then plotting a weighted histogram with the weights w_theta (I would use either weights::wtd.hist() or plotrix::weighted.hist() for the weighted histogram, or is there a better plotting option?). Then, for a number of Stan runs, I would plot these weighted histograms/distributions for each data point and see how much, and in what way, they vary across runs. Maybe these distributions would be more stable than the khat values themselves, or maybe one would find something interesting. Did I understand correctly?