Some off pareto values, 99% ok - issue?

negregory14 · May 22, 2022, 2:27pm

So I’m fitting some models, and some of them are producing problematic Pareto k values in their LOO output:

e.g.:

I understand this is problematic for a variety of reasons, such as making the LOOIC estimate inaccurate or over-optimistic.
However, I am not sure what percentage of high Pareto k values it takes for this to be a problem. With 99.3% of the Pareto k values in the good range, is that acceptable? Or is the model too badly misspecified to draw any conclusions? I assume that just removing the data points responsible would be inappropriate for any modelling conclusions unless I can justify it?
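To make the "what percentage" question concrete, here is a small sketch that tabulates Pareto k values into the bands the loo R package reported at the time (good ≤ 0.5, ok ≤ 0.7, bad ≤ 1, very bad > 1). The k values are made up so that 99.3% fall in the good band; in practice they come from your own loo output. Note that a high overall percentage does not rescue the flagged points: each observation with k above 0.7 has an unreliable importance-sampling estimate, so its contribution to elpd_loo is unreliable too. Recent loo releases may tie the 0.7 threshold to the number of posterior draws, so check the documentation of the version you use.

```python
import numpy as np

# Diagnostic bands as (label, lower, upper); each band is the interval (lower, upper].
BANDS = [
    ("good", -np.inf, 0.5),
    ("ok", 0.5, 0.7),
    ("bad", 0.7, 1.0),
    ("very bad", 1.0, np.inf),
]


def pareto_k_table(k):
    """Return {band: (count, percent)} for an array of Pareto k values."""
    k = np.asarray(k, dtype=float)
    table = {}
    for label, lower, upper in BANDS:
        count = int(np.sum((k > lower) & (k <= upper)))
        table[label] = (count, 100.0 * count / k.size)
    return table


# Hypothetical k values for 1000 observations: 993 good, 5 bad, 2 very bad.
rng = np.random.default_rng(1)
k_hat = np.concatenate([
    rng.uniform(-0.2, 0.5, size=993),
    [0.75, 0.8, 0.85, 0.9, 0.95],
    [1.2, 1.5],
])
table = pareto_k_table(k_hat)
for label, (count, pct) in table.items():
    print(f"{label:>8}: {count:4d} ({pct:.1f}%)")
```

In the loo R package, pareto_k_ids() returns the indices of the flagged observations, which is the natural place to start looking.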

For context, the number of fitted parameters is around 30, and with the same model but different data points the result is:

I understand this should give a reliable LOOIC estimate.

daniel_h · May 22, 2022, 6:23pm

If all you’re interested in is comparing this model to another one, and the difference in elpd between the two models is large, I would not worry too much about it.
But it might make sense to use these results as an opportunity to further improve your model: high Pareto k values indicate that your leave-one-out posterior is quite different from the posterior obtained from fitting all data points, i.e. the left-out data point is in some way influential.
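"Large" is usually judged relative to the standard error of the difference. A minimal sketch of that computation from pointwise elpd_loo values, following the same idea as loo_compare (sum the pointwise differences, and take sqrt(n) times their standard deviation as the SE). The two pointwise arrays are hypothetical placeholders for the values from two models fitted to the same data.

```python
import numpy as np


def elpd_diff(pointwise_a, pointwise_b):
    """Return (elpd_a - elpd_b, standard error of the difference)."""
    d = np.asarray(pointwise_a, dtype=float) - np.asarray(pointwise_b, dtype=float)
    n = d.size
    diff = d.sum()
    # The difference is a sum of n pointwise terms, so SE = sqrt(n) * sd(terms).
    se = np.sqrt(n) * d.std(ddof=1)
    return diff, se


# Hypothetical pointwise elpd_loo values for two models on the same 200 observations.
rng = np.random.default_rng(3)
elpd_a = rng.normal(-1.0, 0.5, size=200)
elpd_b = elpd_a - rng.normal(0.1, 0.3, size=200)
diff, se = elpd_diff(elpd_a, elpd_b)
print(f"elpd_diff = {diff:.1f} (SE {se:.1f}), ratio = {diff / se:.1f}")
```

Because the differences are computed per observation, the SE reflects how consistently one model beats the other, not just the two totals.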
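One way to see what "influential" means here: LOO via importance sampling reweights draws from the full-data posterior by r_s ∝ 1 / p(y_i | θ_s) to approximate the leave-one-out posterior. The sketch below uses plain (unsmoothed) importance sampling on a toy normal model whose posterior is known exactly, and reports the relative effective sample size (ESS) of those weights for each observation. The data, the model and the ESS summary are assumptions for illustration; this is not the Pareto k estimator, but a point whose removal moves the posterior a lot shows up with a collapsed ESS.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 19 well-behaved points plus one outlier at 8.
y = np.append(rng.normal(0.0, 1.0, size=19), 8.0)
n, S = y.size, 4000

# Assumed model: y_i ~ Normal(mu, 1) with a flat prior on mu,
# so the full-data posterior is exactly mu ~ Normal(mean(y), 1/n).
mu = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=S)

# Pointwise log-likelihood matrix, shape (S draws, n observations).
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu[:, None]) ** 2


def is_loo_rel_ess(log_lik):
    """Relative ESS of raw leave-one-out importance weights, one value per observation."""
    log_r = -log_lik                                # log r_s = -log p(y_i | theta_s)
    log_r -= log_r.max(axis=0, keepdims=True)       # stabilise before exponentiating
    w = np.exp(log_r)
    w /= w.sum(axis=0, keepdims=True)               # normalise weights per observation
    return 1.0 / np.sum(w ** 2, axis=0) / log_lik.shape[0]


rel_ess = is_loo_rel_ess(log_lik)
print(f"relative ESS of the outlier: {rel_ess[-1]:.3f}")
print(f"median relative ESS of the others: {np.median(rel_ess[:-1]):.3f}")
```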

I would suggest taking a closer look at the points with high Pareto k values. Are they somehow unusual? Imagine fitting a normal distribution to a data set with a couple of outliers. In this case, the points that cannot be explained with the normal likelihood show up as having large Pareto k values. A more appropriate error model (e.g. a Student-t distribution) would improve the LOO diagnostics.
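The outlier example can be checked on toy data. This sketch approximates the posterior of a location parameter on a grid under a flat prior, once with normal errors (scale 1) and once with Student-t errors (ν = 3, scale 1, both values chosen arbitrarily), and compares the relative ESS of the outlier's leave-one-out importance weights. The data and settings are hypothetical, and raw importance sampling stands in for PSIS.

```python
import math

import numpy as np

rng = np.random.default_rng(0)
y = np.append(rng.normal(0.0, 1.0, size=19), 8.0)   # hypothetical data, last point is an outlier
S = 4000


def normal_logpdf(x, scale=1.0):
    return -0.5 * np.log(2 * np.pi * scale ** 2) - 0.5 * (x / scale) ** 2


def student_t_logpdf(x, nu=3.0, scale=1.0):
    const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi) - math.log(scale))
    return const - (nu + 1) / 2 * np.log1p((x / scale) ** 2 / nu)


def posterior_draws(logpdf, y, size, grid=np.linspace(-5.0, 10.0, 3001)):
    """Grid approximation to p(mu | y) under a flat prior; returns `size` draws of mu."""
    log_post = logpdf(y[None, :] - grid[:, None]).sum(axis=1)
    p = np.exp(log_post - log_post.max())
    p /= p.sum()
    return rng.choice(grid, size=size, p=p)


def outlier_rel_ess(logpdf, y, size):
    """Relative ESS of raw LOO importance weights for the last observation."""
    mu = posterior_draws(logpdf, y, size)
    log_r = -logpdf(y[-1] - mu)
    w = np.exp(log_r - log_r.max())
    w /= w.sum()
    return 1.0 / np.sum(w ** 2) / size


ess_normal = outlier_rel_ess(normal_logpdf, y, S)
ess_t = outlier_rel_ess(student_t_logpdf, y, S)
print(f"outlier relative ESS, normal errors:    {ess_normal:.3f}")
print(f"outlier relative ESS, Student-t errors: {ess_t:.3f}")
```

With heavy-tailed errors the outlier barely moves the posterior of mu, so leaving it out changes little and the importance weights stay well behaved.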

avehtari · May 30, 2022, 7:38pm

Hi, I was on vacation, but I’m now back answering questions.

See CV-FAQ “What to do if I have many high Pareto k̂’s?”. As you have p_loo >> p, your model is badly misspecified (some would say that you have some serious outliers).

Even in the second result, p_loo > p, which is suspicious.
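For reference, loo computes p_loo pointwise as the log pointwise predictive density (lppd) minus elpd_loo, so the larger p_loo, the bigger the gap between in-sample and leave-one-out predictive performance. This sketch shows that computation on a one-parameter toy model (p = 1), with elpd_loo obtained by raw importance sampling instead of PSIS; the data and the model are assumptions for illustration.

```python
import numpy as np


def log_mean_exp(a, axis=0):
    """Numerically stable log(mean(exp(a))) along an axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(a - m), axis=axis))


def p_loo_raw_is(log_lik):
    """Sum over observations of lppd_i - elpd_loo_i for an (S x n) log-likelihood matrix."""
    lppd = log_mean_exp(log_lik)            # log E_post[p(y_i | theta)]
    elpd_loo = -log_mean_exp(-log_lik)      # log p(y_i | y_-i) = -log E_post[1 / p(y_i | theta)]
    return float(np.sum(lppd - elpd_loo))


def normal_model_log_lik(y, draws=4000, seed=0):
    """Assumed toy model: y_i ~ Normal(mu, 1), flat prior, exact posterior draws of mu."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(y.mean(), 1.0 / np.sqrt(y.size), size=draws)
    return -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu[:, None]) ** 2


rng = np.random.default_rng(1)
clean = rng.normal(0.0, 1.0, size=20)
with_outlier = np.append(clean[:19], 8.0)   # replace one point by an outlier
p = 1                                        # the toy model has a single parameter, mu

results = {label: p_loo_raw_is(normal_model_log_lik(y))
           for label, y in [("clean", clean), ("with outlier", with_outlier)]}
for label, p_loo in results.items():
    print(f"{label:>12}: p_loo = {p_loo:.2f} (p = {p})")
```

A single badly predicted point is enough to push p_loo well above the number of parameters in this toy setting.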

I recommend looking at posterior predictive checking plots, and more generally trying to understand why some of the observations are difficult to predict.
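A posterior predictive check can also be done numerically: draw replicated data sets from the posterior predictive distribution and compare a test statistic with its observed value. In this sketch (toy normal model, hypothetical data with one outlier, statistics chosen for illustration), a posterior predictive p-value near 0 or 1 flags a feature of the data that the model does not reproduce. In R, the bayesplot package provides the corresponding plots, e.g. ppc_dens_overlay and ppc_stat.

```python
import numpy as np

rng = np.random.default_rng(2)
y = np.append(rng.normal(0.0, 1.0, size=19), 8.0)   # hypothetical data, one outlier
n, S = y.size, 4000

# Assumed model: y_i ~ Normal(mu, 1) with a flat prior, so mu | y ~ Normal(mean(y), 1/n).
mu = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=S)

# One replicated data set per posterior draw: y_rep has shape (S, n).
y_rep = rng.normal(mu[:, None], 1.0, size=(S, n))


def max_abs(a):
    return np.max(np.abs(a), axis=-1)


def sd(a):
    return np.std(a, axis=-1)


def ppc_pvalue(stat, y, y_rep):
    """Share of replicated data sets whose statistic is at least the observed one."""
    return float(np.mean(stat(y_rep) >= stat(y)))


for name, stat in [("max |y|", max_abs), ("sd(y)", sd)]:
    print(f"{name:>8}: posterior predictive p = {ppc_pvalue(stat, y, y_rep):.3f}")
```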

You were showing just elpd_loo (which I like), but you mention LOOIC a couple of times. See CV-FAQ “How are LOOIC and elpd_loo related? Why LOOIC is -2*elpd_loo?”
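In short, LOOIC is elpd_loo multiplied by -2 (the deviance scale), so it carries no extra information, and its standard error is twice that of elpd_loo. A minimal sketch from pointwise elpd_loo values (hypothetical numbers), using sqrt(n) times the standard deviation of the pointwise values as the SE, as loo does:

```python
import numpy as np


def loo_summary(elpd_pointwise):
    """elpd_loo, LOOIC and their standard errors from pointwise elpd_loo values."""
    x = np.asarray(elpd_pointwise, dtype=float)
    elpd = x.sum()
    se_elpd = np.sqrt(x.size) * x.std(ddof=1)
    # LOOIC is elpd_loo on the deviance scale: multiply by -2, the SE scales by 2.
    return {
        "elpd_loo": elpd,
        "se_elpd_loo": se_elpd,
        "looic": -2.0 * elpd,
        "se_looic": 2.0 * se_elpd,
    }


summary = loo_summary([-1.0, -2.0, -3.0])   # hypothetical pointwise values
print(f"elpd_loo = {summary['elpd_loo']:.1f}, looic = {summary['looic']:.1f}")
# prints "elpd_loo = -6.0, looic = 12.0"
```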
