I understand this is problematic for a variety of reasons, such as making the LOOIC estimate inaccurate or over-optimistic.
However, I am not sure what percentage of bad Pareto k values is required for this to be a problem. For example, with 99.3% of the Pareto k values being good, is that acceptable, or is the model too badly misspecified to draw any conclusions? I'm assuming that simply removing the data points responsible would be inappropriate for any modelling conclusions unless I can justify it?
For context, the number of fitted parameters is around 30, and with the same model but different data points the result is:
If all you’re interested in is comparing this model to another one, and the difference in elpd between the two models is large, I would not worry too much about it.
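As a quick sketch of what that comparison looks like in code (assuming Python with ArviZ; the built-in eight-schools fits stand in for your own two models, which would each need pointwise log-likelihood stored):

```python
import arviz as az

# Two fitted models as InferenceData objects with pointwise log-likelihood;
# the built-in eight-schools fits are placeholders for your own models.
idata_a = az.load_arviz_data("centered_eight")
idata_b = az.load_arviz_data("non_centered_eight")

comp = az.compare({"centered": idata_a, "non_centered": idata_b}, ic="loo")
print(comp)  # look at elpd_diff and its standard error (dse)
```

If `elpd_diff` is large relative to its standard error, the ranking is unlikely to hinge on a handful of problematic Pareto k values.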
But it might make sense to use these results as an opportunity to further improve your model: a high Pareto k value indicates that the leave-one-out posterior is quite different from the posterior obtained by fitting all data points, i.e. the left-out data point is influential in some way.
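A minimal way to pull out those influential points, again assuming ArviZ and an InferenceData object with pointwise log-likelihood (the built-in dataset here is just a placeholder for your own fit):

```python
import numpy as np
import arviz as az

# Any InferenceData with pointwise log-likelihood works here.
idata = az.load_arviz_data("centered_eight")

loo_res = az.loo(idata, pointwise=True)
print(loo_res)  # includes a table of Pareto k counts per threshold

k = np.asarray(loo_res.pareto_k)  # one k estimate per observation
print("influential observations:", np.where(k > 0.7)[0])
```

(0.7 is the usual threshold above which the PSIS approximation is considered unreliable for that observation.)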
I would suggest taking a closer look at the points with high Pareto k values. Are they somehow unusual? Imagine fitting a normal distribution to a data set with a couple of outliers: the points that cannot be explained by the normal likelihood show up with large Pareto k values. A more appropriate error model (e.g. a Student-t distribution) would improve the LOO diagnostics.
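To make that concrete, here is a sketch of the normal-vs-Student-t comparison. The thread doesn't name a modelling framework, so this uses PyMC with ArviZ, and the synthetic data, priors, and the Gamma prior on the degrees of freedom are all illustrative assumptions:

```python
import numpy as np
import pymc as pm
import arviz as az

# Synthetic data: mostly Gaussian, plus two outliers that a normal
# likelihood cannot explain.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, size=98), [7.0, -8.0]])

def fit(robust: bool):
    """Fit y with either a normal or a Student-t observation model."""
    with pm.Model():
        mu = pm.Normal("mu", 0.0, 5.0)
        sigma = pm.HalfNormal("sigma", 2.0)
        if robust:
            # Illustrative prior on the degrees of freedom.
            nu = pm.Gamma("nu", alpha=2.0, beta=0.1)
            pm.StudentT("y", nu=nu, mu=mu, sigma=sigma, observed=y)
        else:
            pm.Normal("y", mu=mu, sigma=sigma, observed=y)
        # Store pointwise log-likelihood so az.loo can be computed.
        return pm.sample(idata_kwargs={"log_likelihood": True},
                         progressbar=False, random_seed=1)

for name, robust in [("normal", False), ("student-t", True)]:
    print(name, az.loo(fit(robust), pointwise=True), sep="\n")
```

On data like this, the normal fit typically flags the two outliers with high Pareto k values, while the heavier-tailed Student-t model accommodates them and the diagnostics clean up.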