I’ve been learning about hierarchical models with partial pooling. In my experiments so far, the “global” or group-level parameters have almost always had a smaller effective sample size than the “local” parameters (subject-level, or otherwise lower in the hierarchy). (I'm using “global” and “local” in the same loose way that Betancourt and Girolami do here.)
For example, in the biased inference case study, n_eff for the global parameters (tau) is always less than or equal to n_eff for the local parameters (theta), regardless of the parameterization.
I’m wondering: is this a general rule? Why is it happening here (and perhaps in general)? My understanding is that the sample autocorrelation must be higher for the parameters that have smaller ESS, but I’m curious as to why.
As an initial thought, I’d offer that autocorrelation in the lower-level parameters implies autocorrelation in their hyperparameters (tau and mu) as well, but the converse is not true. In other words, exchangeability allows autocorrelation to propagate from parameter to hyperparameter but not vice versa.
I don’t think that there’s too much insight to be gained here, but in general the more a parameter is coupled to other parameters, the more slowly its relevant values will be explored by a sampler, the larger the autocorrelations will be, and the smaller the effective sample size will be. The population (“global”) parameters in a hierarchical model are directly coupled to each other and to all of the individual (“local”) parameters, but each local parameter is directly coupled only to the population parameters and not to other individual parameters. This is all an oversimplified cartoon (indirect couplings matter, as does the actual strength of each coupling), but it provides some intuition for that common behavior.
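To make the link between autocorrelation and effective sample size concrete, here is a minimal sketch in Python. It is not the estimator behind the n_eff that Stan reports; it is a simplified single-chain version that uses ESS ≈ N / (1 + 2 Σ ρ_k) and truncates the sum at the first non-positive autocorrelation. The two AR(1) chains are purely hypothetical stand-ins for a slowly explored ("global"-like) and a quickly explored ("local"-like) parameter, and the names `autocorr`, `ess`, and `ar1_chain` are made up for this illustration.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelation of a 1-D chain at lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = np.dot(x, x) / n
    return np.array([np.dot(x[:n - k], x[k:]) / (n * var)
                     for k in range(max_lag + 1)])

def ess(x, max_lag=200):
    """Crude ESS: N / (1 + 2 * sum of leading positive autocorrelations)."""
    rho = autocorr(x, max_lag)
    total = 0.0
    for r in rho[1:]:
        if r <= 0:  # stop once the estimate is dominated by noise
            break
        total += r
    return len(x) / (1.0 + 2.0 * total)

def ar1_chain(phi, n, rng):
    """Stationary AR(1) chain with unit variance and lag-1 correlation phi."""
    x = np.empty(n)
    x[0] = rng.normal()
    noise = rng.normal(size=n) * np.sqrt(1.0 - phi ** 2)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x

rng = np.random.default_rng(0)
n = 5000
slow = ar1_chain(0.9, n, rng)  # strongly autocorrelated draws
fast = ar1_chain(0.3, n, rng)  # weakly autocorrelated draws

ess_slow = ess(slow)
ess_fast = ess(fast)
print(f"ESS (phi=0.9): {ess_slow:.0f} of {n} draws")
print(f"ESS (phi=0.3): {ess_fast:.0f} of {n} draws")
```

Both chains have the same number of draws and the same marginal variance; only the autocorrelation differs, and that alone is enough to separate their effective sample sizes. For real fits, a multi-chain estimator such as the one built into Stan is the better choice.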