Pointwise loo likelihood for binary classification

Hi all,
I am currently using Stan for a binary classification problem. To check the discriminatory power of my model I want to use measures such as recall, precision, and the receiver operating characteristic (ROC) curve. Because of the size of the data set, I do not want to split it into a training and a validation set. Instead, I thought about using an approach similar to the one in the paper by Vehtari, Gelman, and Gabry, in which Pareto smoothed importance sampling is used to calculate the cross-validation likelihood by:

$$ p(y_i \mid y_{-i}) \approx \frac{\sum_{s=1}^{S} w_i^s \, p(y_i \mid \theta^s)}{\sum_{s=1}^{S} w_i^s} $$

where w_i^s is the weight obtained by Pareto smoothing the raw importance sampling ratios and \theta^s is the s-th posterior draw.
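For concreteness, here is a minimal sketch of how I understand the computation for a single observation i: a self-normalized weighted average over the posterior draws. The function name and the numbers are made up for illustration, and the log-weights are assumed to have been Pareto smoothed already (for example by the loo package); the sketch does not do the smoothing itself.

```python
import math

def loo_predictive_prob(log_weights, probs):
    """Self-normalized weighted average for one observation.

    log_weights: log w_i^s for each posterior draw s (assumed already
                 Pareto smoothed elsewhere; hypothetical input)
    probs:       p(y_i = 1 | theta^s) for each posterior draw s
    """
    # Subtract the maximum before exponentiating for numerical stability;
    # the constant cancels in the ratio below.
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    return sum(wi * pi for wi, pi in zip(w, probs)) / sum(w)

# Made-up example with two draws: the second draw gets three times the weight.
p_loo_i = loo_predictive_prob([math.log(1.0), math.log(3.0)], [0.2, 0.4])
```

With equal weights this reduces to the plain posterior mean of the per-draw probabilities.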
I want to use these cross-validation likelihoods as input for recall, precision, and so on. However, I am not sure whether this is a good approach, given the variance and bias in the approximation. Does anyone know if it is okay to use these approximations for these types of measures?
Kind regards,

Maybe this example helps?

Yes, thank you, I wanted to use something similar to subsection 4.3. I also like the addition of the qplot. Would you expect a high k-value when a data point is not on the diagonal line?

If by a data point you mean the predictive probability vs. the LOO predictive probability, then yes: when their difference is large, it is more likely that the corresponding khat is large.
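A minimal sketch of that comparison, together with precision and recall computed from the LOO predictive probabilities. All numbers and function names are made up for illustration, and the 0.5 threshold is an assumption. A large difference only flags observations worth checking; it does not by itself establish that khat is large.

```python
def precision_recall(y_true, p_loo, threshold=0.5):
    """Precision and recall when classifying with LOO predictive probabilities."""
    tp = fp = fn = 0
    for y, p in zip(y_true, p_loo):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall

def largest_shifts(p_post, p_loo, n=3):
    """Indices of the n observations farthest from the diagonal,
    i.e. with the largest |posterior probability - LOO probability|."""
    diffs = [abs(a - b) for a, b in zip(p_post, p_loo)]
    return sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)[:n]

# Hypothetical data: observed labels, full-posterior and LOO probabilities of y = 1.
y = [1, 0, 1, 1, 0]
p_post = [0.90, 0.20, 0.60, 0.80, 0.40]
p_loo = [0.85, 0.25, 0.45, 0.80, 0.55]

precision, recall = precision_recall(y, p_loo)
suspects = largest_shifts(p_post, p_loo, n=2)  # candidates for a khat check
```

Sweeping `threshold` over a grid and recording the resulting rates gives the points of an ROC or precision-recall curve in the same way.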