How to remove observations with a pareto_k > 0.7?

Pareto.pdf (5.6 KB)

Short summary of the problem: How to remove observations with a pareto_k > 0.7

  • Operating System: macOS
  • brms Version: 2.13.0

I used an approximate leave-one-out cross-validation to validate a model and got this warning message.

“Found 3 observations with a pareto_k > 0.7 in model ‘SEM_brms’. It is recommended to set ‘reloo = TRUE’ to calculate the ELPD without the assumption that these observations are negligible. This will refit the model 3 times to compute the ELPDs for the problematic observations directly.”

I then tried to find out which observations have pareto_k > 0.7. I plotted the loo object with label_points = TRUE, using plot(Criteria_pop$criteria$loo, label_points = TRUE), which produced the figure attached above (Pareto.pdf). From that figure, it is clear that observations 23, 34, and 46 are “influential” data points.

My question is the following.

Is there an automatic way to delete these data points? By removing these “influential” data points, I expect to improve the model. I would then estimate the new posterior and see whether it differs from the first one, which I got with the “influential” data points included.
Thanks in advance

I would consider this mindset dangerous. If your model does not fit those data points well, the goal should probably be to make a better model to fit the data, rather than making the data fit the model. Of course, as a sensitivity analysis you may want to check what happens if you exclude those data points, but you have to do it manually.
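For the manual sensitivity check, something along these lines might work. It is only a rough sketch, not an official brms feature: `split_by_pareto_k()` and `refit_without_high_k()` are hypothetical helper names I made up, and the code assumes that the loo object has exactly one Pareto k value per row of `fit$data`, in the same order. The calls `loo::pareto_k_values()` and `update(fit, newdata = ...)` are the real pieces doing the work.

```r
# Hypothetical helper: split a data frame by Pareto k value.
# `k` must hold one Pareto k estimate per row of `data`, in row order.
split_by_pareto_k <- function(data, k, threshold = 0.7) {
  stopifnot(length(k) == nrow(data))
  flagged <- which(k > threshold)            # row indices with k above threshold
  keep    <- setdiff(seq_len(nrow(data)), flagged)
  list(
    flagged = flagged,
    kept    = data[keep, , drop = FALSE]     # data without the flagged rows
  )
}

# Hypothetical wrapper for the sensitivity analysis; needs brms and loo.
# Assumption: one pointwise loo entry per row of fit$data.
refit_without_high_k <- function(fit, threshold = 0.7) {
  loo_obj <- brms::loo(fit)
  k       <- loo::pareto_k_values(loo_obj)   # Pareto k estimate per observation
  parts   <- split_by_pareto_k(fit$data, k, threshold)
  if (length(parts$flagged) == 0) {
    return(list(flagged = integer(0), fit = fit))  # nothing to drop
  }
  # Refit the same model on the reduced data (this reruns the sampler).
  list(flagged = parts$flagged, fit = update(fit, newdata = parts$kept))
}
```

With the model from the warning message you would call `res <- refit_without_high_k(SEM_brms)`, look at `res$flagged` to see which rows were dropped, and compare `summary(res$fit)` with `summary(SEM_brms)`. Treat the refit only as a check on how much those rows drive the posterior, not as the improved model. Separately, to get an ELPD that does not rely on the approximation for those observations, `loo(SEM_brms, reloo = TRUE)` does what the warning suggests.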
