A quick note what I infer from p_loo and Pareto k values

avehtari · March 3, 2018, 8:26pm

There are infinitely many different hierarchical models, so I don’t have a generic answer that would always work.

For most hierarchical models, and for the high Pareto k cases except the last point about flexible models, you can start by counting all the parameters in the parameters block, and then refine if necessary.
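
As a rough sketch of that first step, the parameter count p can be tallied from the declared shapes. The model below (names `mu`, `tau`, `theta`, `sigma` and the 8 groups) is a hypothetical example, not a model from this thread, and the helper `count_parameters` is just an illustration.

```python
from math import prod


def count_parameters(shapes):
    """Total number of scalar parameters, given a mapping name -> shape tuple.

    A scalar has shape (), and prod(()) == 1, so it counts as one parameter.
    """
    return sum(prod(shape) for shape in shapes.values())


# Hypothetical parameters block of a simple hierarchical model with 8 groups.
shapes = {
    "mu": (),       # population mean
    "tau": (),      # population scale
    "theta": (8,),  # one local parameter per group
    "sigma": (),    # observation noise scale
}

p = count_parameters(shapes)
print(f"p = {p}")
```

This p is only the starting point: it is the number you would compare p_loo against before refining the count.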

For the more refined flexibility or weak prior case, different parameters can be influenced by different sets of observations, either directly through the likelihood or indirectly through the joint prior. This would really require a longer explanation, but here are two short examples (quickly written in sloppy language):

Simple hierarchical model with individuals, groups, and a prior for groups. If the prior for the group parameters is weak, and some group has only one or a few individuals per local parameter, then each individual is highly influential, and the posterior with and without one observation can be so different that Pareto k will be high.
In Gaussian latent variable models (e.g. GP), each observation has its own latent variable, so the total number of parameters (latent parameters plus prior parameters) is p>n. If the prior is weak (e.g. a GP with exp_quad covariance function and a short lengthscale) and a specific observation is far away from the others, then the corresponding latent value is influenced mostly by that observation. On the other hand, a group of latent values close to each other correlate more and get information from other observations, too. Thus in some parts of the covariate space loo may work well, and in other parts it may fail and we observe high k values.
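
The GP example can be illustrated with a small numerical sketch. It makes simplifying assumptions that are not in the post: a Gaussian likelihood, fixed hyperparameters, and the diagonal of the smoother matrix H = K (K + σ²I)⁻¹ used as a stand-in for "how much a latent value is determined by its own observation". This is not Pareto k itself, only a proxy for influence. The covariate values, lengthscale, and noise scale are made up.

```python
import numpy as np


def exp_quad(x, lengthscale, magnitude=1.0):
    """Exponentiated quadratic covariance matrix for 1-D inputs x."""
    d = x[:, None] - x[None, :]
    return magnitude**2 * np.exp(-0.5 * (d / lengthscale) ** 2)


# Four observations close together and one far away from the others.
x = np.array([0.0, 0.1, 0.2, 0.3, 5.0])
lengthscale = 0.5  # short relative to the gap to the isolated point
sigma = 0.5        # assumed Gaussian observation noise scale

K = exp_quad(x, lengthscale)
A = K + sigma**2 * np.eye(len(x))

# Posterior mean of the latent values is H @ y with H = K A^{-1}.
# K and A are symmetric and commute, so H = A^{-1} K as well.
H = np.linalg.solve(A, K)
self_influence = np.diag(H)

for xi, hi in zip(x, self_influence):
    print(f"x = {xi:4.1f}  self-influence = {hi:.3f}")
```

The isolated point at x = 5.0 has a self-influence of about 1 / (1 + σ²) = 0.8, the largest value possible here, because it gets essentially no information from the other observations. The clustered points share information and have smaller values.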

If the above makes any sense, I’ll write it later with less sloppy language, and if it doesn’t make any sense I’ll try with more examples.

I was thinking of three different models. Which model would you like to see? If you have a model with high Pareto k values, I can analyse it and add it to the case study.
