How to remove observations with a pareto_k > 0.7?

Moutouama · July 19, 2020, 2:59am

Pareto.pdf (5.6 KB)

Operating System: macOS
brms Version: 2.13.0

I used approximate leave-one-out cross-validation to validate a model and got this warning message:

“Found 3 observations with a pareto_k > 0.7 in model ‘SEM_brms’. It is recommended to set ‘reloo = TRUE’ to calculate the ELPD without the assumption that these observations are negligible. This will refit the model 3 times to compute the ELPDs for the problematic observations directly.”

I then tried to identify the observations that have pareto_k > 0.7. I drew the loo plot with label_points = TRUE: plot(Criteria_pop$criteria$loo, label_points = TRUE), which produced the figure attached above (Pareto.pdf). From that figure, it is clear that observations 23, 34, and 46 are “influential” data points.
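The same indices can be extracted programmatically instead of being read off the plot. The following is a minimal sketch, assuming Criteria_pop$criteria$loo holds a loo object (as the plot call above suggests) and using pareto_k_values() and pareto_k_ids() from the loo package. The helper name flag_high_k is made up for illustration.

```r
library(loo)

# Hypothetical helper: list the observations whose Pareto k exceeds a threshold.
flag_high_k <- function(loo_obj, threshold = 0.7) {
  k <- pareto_k_values(loo_obj)                        # one k estimate per observation
  ids <- pareto_k_ids(loo_obj, threshold = threshold)  # indices with k > threshold
  data.frame(obs = ids, pareto_k = k[ids])
}

# Usage with the object from this post (assumed to hold a loo object):
# flag_high_k(Criteria_pop$criteria$loo)
```

The returned indices refer to the rows actually used in the fit, so they match the original data frame only if no rows were dropped (for example because of missing values).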

My question is the following.

Is there an automatic way to delete these data points? By removing these “influential” data points, I expect to improve the model. I would then re-estimate the posterior and see whether it differs from the first one, which I obtained with the “influential” data points included.
Thanks in advance.

paul.buerkner · July 19, 2020, 8:48pm

I would consider this mindset dangerous. If your model does not fit those data points well, the goal should probably be to make a better model to fit the data, rather than making the data fit the model. Of course, as a sensitivity analysis you may want to check what happens if you exclude those data points, but you have to do it manually.
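A manual sensitivity check of this kind could look roughly like the sketch below. It assumes that the fitted brmsfit object is called Criteria_pop, that the original data frame is available as my_data (a placeholder name), and that its row order matches the observation indices reported by loo. The helper names drop_observations and refit_without are hypothetical; update() with a newdata argument and fixef() are brms functions.

```r
library(brms)

# Hypothetical helper: drop the flagged rows from a data frame.
drop_observations <- function(data, drop_rows) {
  keep <- setdiff(seq_len(nrow(data)), drop_rows)
  data[keep, , drop = FALSE]
}

# Hypothetical helper: refit an existing brmsfit on the reduced data.
refit_without <- function(fit, data, drop_rows) {
  update(fit, newdata = drop_observations(data, drop_rows))
}

# Usage (object and data names are assumptions):
# fit_reduced <- refit_without(Criteria_pop, my_data, c(23, 34, 46))
# fixef(Criteria_pop)   # population-level estimates, all observations
# fixef(fit_reduced)    # same estimates without the flagged observations
#
# The option named in the warning keeps all the data and refits only to
# compute the ELPD of the flagged observations directly:
# loo(Criteria_pop, reloo = TRUE)
```

Large differences between the two sets of estimates would indicate that the conclusions hinge on those three observations, which is an argument for revising the model rather than for discarding the data.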

Moutouama · July 19, 2020, 9:33pm

Thanks
