How to identify which datapoints have the most influence on the posterior?

I’m just working with my first Stan model and I have a question I have not been able to find an answer to. I want to know which of my datapoints have had the most influence/leverage on the posterior. Whenever I find some discussion that touches on the topic of identifying points with high leverage, the focus always seems to be on comparative model evaluation. But I am not interested in evaluating the out-of-sample predictive performance of my model, as I am not using it for making predictions. I just want to introspect the model to understand it.

I guess a naive strategy would be to use full-blown LOO-CV to see which datapoints, when omitted, cause the mean of the posterior to change the most. But I was hoping for a less computationally expensive approach. I have tried using the loo function from ArviZ, but the results are not what I hoped for. All of the points have a very low k value (< 0.05), and the ones with the highest values are unstable and change from Stan run to run. I’m not necessarily looking for true outliers that would make this an unreliable model; I just want to know which parts of the data have the most leverage on the posterior.

My current tactic is to compute which of the observed datapoints are most improbable according to the full posterior. That is, I take the absolute value of subtracting the observed value for each datapoint from its mean predicted value. This procedure seems to produce sensible results, but I have no idea if it is mathematically justified. My intuitive reason for it is that the most improbable datapoints are the ones that the model will have tried the hardest to accommodate, even while inevitably failing to fit them well. Does that make sense?

Before I go ahead and use LOO-CV to re-fit my model 1,000 times, can anyone suggest a more efficient and mathematically justified procedure?



There are different influence measures. Leverage, for example, has a specific definition (see Leverage (statistics) - Wikipedia). If you want to compute leverage, then compute it as defined there.
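As a rough illustration of computing leverage "as defined", here is a minimal sketch for the linear-regression case, where the leverage of observation i is the i-th diagonal element of the hat matrix H = X (XᵀX)⁻¹ Xᵀ. The design matrix, the planted extreme point, and the helper name `leverage` are all made up for the example; it assumes your model has a linear predictor with a known design matrix, which may not match your Stan model.

```python
import numpy as np

def leverage(X):
    """Return the diagonal of the hat matrix H = X (X^T X)^{-1} X^T."""
    # With the reduced QR decomposition X = Q R, H = Q Q^T, so the diagonal
    # is the row-wise sum of squares of Q (no n x n matrix is formed).
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

# Hypothetical data: 50 predictor values, one placed far from the rest
rng = np.random.default_rng(0)
x = rng.normal(size=50)
x[0] = 8.0

# Design matrix with an intercept column and one slope column
X = np.column_stack([np.ones_like(x), x])
h = leverage(X)

# Indices of the five observations with the highest leverage
print(np.argsort(h)[::-1][:5])
```

Note that leverage defined this way depends only on the predictors, not on the observed outcomes or on the posterior, so it answers a narrower question than "which points moved my posterior the most".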

The definition of leverage (Leverage (statistics) - Wikipedia) involves only one model.

Even if you are not using it for making predictions, it can be helpful to know the predictive performance. What would you expect to learn from introspecting a model that is not predicting better than a random guess?

Pareto-k measures how much the whole posterior changes when one observation is left out, so you can use it even if you don’t care about the predictive performance. You can improve its stability by running more chains or longer chains (or by combining the results from the many runs you already did).

That is the absolute error of the point prediction (its average over datapoints is the mean absolute error, MAE), and it is also a measure of predictive performance. It makes sense if that is the relevant part of the model you want to investigate.

You can use the importance weights computed by PSIS-LOO to investigate any derived quantity that depends on the posterior and the data, and how that quantity changes if one observation is removed. That way, you can focus on the influence of one observation on the specific quantity of interest.
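A minimal sketch of that idea, under loud assumptions: the toy model (y_i ~ Normal(mu, 1) with a flat prior on mu), the data, and all variable names are invented for illustration, and the posterior draws are sampled directly rather than taken from Stan. It also uses raw importance weights proportional to 1 / p(y_i | draw); PSIS-LOO would additionally Pareto-smooth these weights, which this sketch skips for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: y_i ~ Normal(mu, 1), with one deliberately extreme observation
n = 30
y = rng.normal(size=n)
y[0] = 5.0

# Stand-in for posterior draws from Stan: with a flat prior on mu the
# posterior is Normal(mean(y), 1/n), so it can be sampled directly
S = 20000
mu = rng.normal(loc=y.mean(), scale=1 / np.sqrt(n), size=S)

# Pointwise log-likelihood matrix of shape (S, n), the same layout a
# log_lik array from generated quantities would have
log_lik = -0.5 * (y[None, :] - mu[:, None]) ** 2 - 0.5 * np.log(2 * np.pi)

# Raw leave-one-out importance weights (no Pareto smoothing here)
log_w = -log_lik
log_w -= log_w.max(axis=0, keepdims=True)  # stabilise before exponentiating
w = np.exp(log_w)
w /= w.sum(axis=0, keepdims=True)          # normalise per observation

# Quantity of interest: posterior mean of mu with and without observation i
full_mean = mu.mean()
loo_mean = w.T @ mu                        # shape (n,)
influence = loo_mean - full_mean

# Exact leave-one-out means, available only because the toy model is conjugate
exact_loo_mean = (y.sum() - y) / (n - 1)

# Observations ranked by how much removing them shifts the posterior mean
most_influential = np.argsort(np.abs(influence))[::-1][:5]
print(most_influential)
```

The same weights can be reused for any other derived quantity (a posterior quantile, a contrast between parameters, and so on) by replacing `mu` with the per-draw values of that quantity. With a real model, checking the Pareto-k values first is a good idea, since the weighted estimates are only trustworthy when the weights are well behaved.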
