A quick note what I infer from p_loo and Pareto k values

There are infinitely many different hierarchical models, so I don’t have a generic answer that would always work.

For most hierarchical models, and for the high Pareto k cases other than the last point about flexible models, you can start by counting all parameters in the parameters block, and then refine if necessary.

For the more refined flexibility or weak prior case, different parameters can be influenced by different sets of observations, either directly through the likelihood or indirectly through the joint prior. This would really require a longer explanation, but here are two short examples (quickly written in sloppy language):

  1. A simple hierarchical model with individuals, groups, and a prior for the groups. If the prior for the group parameters is weak and some group has only one or a few individuals per local parameter, then each individual is highly influential, and the posterior with and without one observation can be so different that Pareto k will be high.
  2. In Gaussian latent variable models (e.g. GP), each observation has its own latent variable, so the total number of parameters (latent parameters plus prior parameters) is p > n. If the prior is weak (e.g. a GP with exp_quad covariance function and a short lengthscale) and a specific observation is far away from the others, then the corresponding latent value is influenced mostly by that observation. On the other hand, a group of latent values close to each other correlate more and get information from other observations, too. Thus loo may work well in some parts of the covariate space and fail in others, where we observe high k values.
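
To make example 1 a bit more concrete, here is a minimal sketch, not taken from any real case study. It assumes a normal model y ~ normal(theta_j, sigma) with group prior theta_j ~ normal(mu, tau), and treats mu, sigma and tau as known, which is a strong simplification of a real hierarchical model. Under those assumptions the posterior of theta_j is Gaussian with variance s^2 = 1 / (1/tau^2 + n_j/sigma^2), and the raw importance ratios 1 / p(y_i | theta_j) have a tail shape of roughly s^2 / sigma^2, which is the quantity the Pareto k diagnostic estimates. The function names are made up for illustration.

```python
import math


def posterior_sd(n_obs, sigma, tau):
    """Posterior sd of a group mean theta_j given n_obs observations.

    Assumes y ~ normal(theta_j, sigma) and theta_j ~ normal(mu, tau),
    with sigma and tau treated as known (illustrative simplification).
    With n_obs = 0 this reduces to the prior sd tau.
    """
    precision = 1.0 / tau**2 + n_obs / sigma**2
    return math.sqrt(1.0 / precision)


def ratio_tail_shape(n_obs, sigma, tau):
    """Approximate tail shape of the raw ratios 1 / p(y_i | theta_j).

    For a Gaussian posterior with variance s^2 and a normal likelihood
    with variance sigma^2, the tail shape is s^2 / sigma^2.
    """
    return posterior_sd(n_obs, sigma, tau) ** 2 / sigma**2


def loo_sd_inflation(n_obs, sigma, tau):
    """How much wider the posterior gets when one observation is removed."""
    return posterior_sd(n_obs - 1, sigma, tau) / posterior_sd(n_obs, sigma, tau)


# (label, observations in the group, prior sd tau); sigma fixed to 1
cases = [
    ("weak prior, 1 obs in group", 1, 10.0),
    ("weak prior, 20 obs in group", 20, 10.0),
    ("strong prior, 1 obs in group", 1, 0.5),
]
for label, n_obs, tau in cases:
    k = ratio_tail_shape(n_obs, 1.0, tau)
    infl = loo_sd_inflation(n_obs, 1.0, tau)
    print(f"{label}: tail shape ~ {k:.2f}, sd inflation without obs ~ {infl:.2f}")
```

With the weak prior, the single-observation group gives a tail shape close to 1 (well above the usual 0.7 threshold), while the group with 20 observations gives roughly 0.05. Making the prior stronger brings the single-observation group down to 0.2, which matches the point that it is the combination of a weak prior and few observations per local parameter that makes each observation so influential.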

If the above makes any sense, I’ll write it later with less sloppy language, and if it doesn’t make any sense I’ll try with more examples.

I was thinking of three different models. Which model would you like to see? If you have a model with high Pareto k values, I can analyse it and add it to the case study.
