A bit of background. In my field, a very important question is whether our samples (of languages) are representative or biased. One thing I am trying to explore is how well different types of models (hierarchical, phylogenetic, GPs for spatial correlations, etc.) cope with samples that are unbiased and samples that are known to be biased (because I biased them).

One thing I am observing is that if I fit a model on an unbiased sample and calculate the Pareto k diagnostics using loo, all of them are good (k < 0.7). However, if I fit the same model to a dataset that I biased, the new model has several problematic observations.
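As a rough illustration of how "problematic" observations are flagged, here is a small Python sketch. It assumes you already have the vector of Pareto k estimates (for example exported from loo). The sample-size-dependent cutoff min(1 - 1/log10(S), 0.7), with S the number of posterior draws, is my reading of the rule in the LOO glossary for recent loo versions, so treat it as an assumption and check it against your installed version; `khat_threshold` and `flag_high_khat` are hypothetical helper names.

```python
import math

import numpy as np


def khat_threshold(n_draws: int) -> float:
    """Cutoff above which a Pareto k estimate is treated as problematic.

    Assumption: min(1 - 1/log10(S), 0.7), where S is the number of posterior
    draws; for roughly S > 2200 this is simply 0.7.
    """
    if n_draws <= 10:
        raise ValueError("need more than 10 posterior draws")
    return min(1.0 - 1.0 / math.log10(n_draws), 0.7)


def flag_high_khat(khat, n_draws: int = 4000) -> np.ndarray:
    """Return the indices of observations whose khat exceeds the cutoff."""
    khat = np.asarray(khat, dtype=float)
    return np.flatnonzero(khat > khat_threshold(n_draws))


khat = [0.10, 0.45, 0.72, 1.30, 0.69]      # made-up values for illustration
print(flag_high_khat(khat))                # cutoff 0.7 with 4000 draws -> [2 3]
print(flag_high_khat(khat, n_draws=1000))  # cutoff about 0.667 -> [2 3 4]
```

Note that the cutoff tightens when fewer draws are used, so counts of flagged observations are only comparable between fits with the same number of draws.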
I bias my samples simply by duplicating some observations according to different criteria.
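A minimal sketch of this kind of biasing, assuming the data sit in NumPy arrays and each criterion can be expressed as a boolean mask over the rows. The function name `bias_by_duplication`, the `mask` and `n_copies` parameters, and the "family" example are hypothetical, not part of the original setup. Returning the source index of each row makes it easy to map khat or pointwise elpd values back to the original observations.

```python
import numpy as np


def bias_by_duplication(X, y, mask, n_copies=2):
    """Hypothetical helper: repeat the rows selected by `mask` n_copies times.

    Returns the biased X and y plus, for each new row, the index of the
    original row it came from.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    mask = np.asarray(mask, dtype=bool)
    if n_copies < 1:
        raise ValueError("n_copies must be at least 1")
    # Selected rows appear n_copies times; all other rows appear once.
    reps = np.where(mask, n_copies, 1)
    source = np.repeat(np.arange(len(y)), reps)
    return X[source], y[source], source


# Toy example: over-sample languages from family "A" (made-up data).
family = np.array(["A", "B", "A", "C"])
X = np.arange(8).reshape(4, 2)
y = np.array([0, 1, 0, 1])
Xb, yb, src = bias_by_duplication(X, y, family == "A", n_copies=3)
print(src)  # [0 0 0 1 2 2 2 3]
```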

My questions are: How should I interpret this result? Is this expected? Can I argue that if Model 1 produces fewer high Pareto k diagnostics than Model 2, Model 1 copes better with the bias than Model 2?

CV FAQ answer 16 refers to the LOO Glossary, which has a related discussion on khat values. In your case, it seems that the higher khat values are due to model 2 being misspecified for the biased data.

Thanks for your answer Aki. Maybe a different way of asking my questions:

Can I use high khat values to identify biased datasets if I ‘know’ I have a model that is correctly specified for unbiased datasets?

Can I interpret the difference in the number of high khat values? For example, model 1 has 20 while model 2 has 60. Does this mean model 2 is worse specified than model 1 for the data in question?

High khat is telling that the corresponding observations are highly influential for some parameters, but it’s not directly or always the same as model misspecification. You can also have model misspecification when none of the khats is very high. I would also use the individual (pointwise) elpd values to identify observations that are difficult to predict, and posterior predictive checking to see the difference between the modelled and observed distributions.
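To show what "individual elpd values" means computationally, here is a sketch that starts from an S x N log-likelihood matrix (S posterior draws, N observations). In practice you would take the pointwise elpd_loo values from loo itself, which uses Pareto-smoothed importance sampling (PSIS). This sketch deliberately swaps in plain importance sampling without smoothing, which is exactly the estimator whose instability khat diagnoses, so it only illustrates the idea. The helper names are hypothetical.

```python
import numpy as np


def log_mean_exp(a, axis=0):
    """Numerically stable log(mean(exp(a))) along `axis`."""
    a = np.asarray(a, dtype=float)
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.mean(np.exp(a - m), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def pointwise_elpd_is_loo(log_lik):
    """Raw (unsmoothed) importance-sampling LOO estimate of elpd_i.

    log_lik has shape (S draws, N observations). With importance weights
    1 / p(y_i | theta_s), p(y_i | y_-i) is approximated by
    1 / mean_s(1 / p(y_i | theta_s)).
    """
    return -log_mean_exp(-np.asarray(log_lik, dtype=float), axis=0)


def hardest_to_predict(elpd_i, k=5):
    """Indices of the k observations with the lowest pointwise elpd."""
    return np.argsort(elpd_i)[:k]


# Tiny made-up example: 2 draws, 3 observations (likelihood values).
lik = np.array([[0.5, 0.2, 0.9],
                [0.5, 0.8, 0.1]])
elpd_i = pointwise_elpd_is_loo(np.log(lik))
print(hardest_to_predict(elpd_i, k=2))  # [2 1]
```

Low pointwise elpd marks observations the model predicts poorly, which is a different signal from high khat (high influence); looking at both, together with posterior predictive checks, gives a fuller picture.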

No, because “High khat is telling that the corresponding observations are highly influential for some parameters, but it’s not directly or always the same as model misspecification.” You can use elpd for the numerical comparison.
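To make the numerical comparison concrete: given pointwise elpd values for two models evaluated on the same observations, the elpd difference and its standard error can be computed as below. As far as I know this mirrors how loo_compare() reports elpd_diff and se_diff (sqrt(N) times the sample SD of the pointwise differences), but treat it as a sketch; `elpd_difference` is a hypothetical helper name. Because the comparison is pointwise, it does not apply directly to a model fit on the unbiased data versus one fit on the duplicated data, since those are different observations.

```python
import numpy as np


def elpd_difference(elpd_a, elpd_b):
    """Sum of pointwise elpd differences (a - b) and its standard error.

    Both arrays must hold pointwise elpd values for the *same* observations.
    """
    diff = np.asarray(elpd_a, dtype=float) - np.asarray(elpd_b, dtype=float)
    n = diff.size
    se = np.sqrt(n * np.var(diff, ddof=1))
    return diff.sum(), se


# Made-up pointwise values for three observations.
d, se = elpd_difference([-1.0, -2.0, -3.0], [-1.5, -2.5, -2.0])
print(d, se)  # 0.0 1.5
```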