Hierarchical models... asymptotically unbiased?

General summary:

  • Proper model specification (i.e. controlling for confounds, not using a “causal salad”) is good because it (hopefully) leads to unbiased parameter estimates (which is good for causal interpretations).
  • Hierarchical models are good because they bias the parameter estimates to prevent overfitting (shrinkage), but large sample sizes reduce this shrinkage
  • Hierarchical models are therefore asymptotically unbiased as the sample size increases (as far as I understand) assuming the model is a reasonable depiction of reality.
  • If the goal is to interpret the parameters as causal then adding bias is bad…right?


  1. Are hierarchical models good for data exploration in small-medium datasets? It seems like small effect sizes and small sample sizes would lead to large bias in the estimates.

  2. Are there any general rules for using hierarchical models for confirmatory studies and non-hierarchical models for exploratory studies?

  3. If there’s a large amount of shrinkage is that just an indicator that we’re basing our conclusions on dangerously low information?

Thanks very much.


Hi @Brendan_Alexander, and welcome! In general, stats questions like this one, with no stan-specific component, are probably better suited for different forum, such as stats.stackexchange.com.

Very briefly, there are plenty of scenarios where some regularization of parameter estimates is desirable, especially in low-information settings, and regularization via hierarchical priors that include normally distributed random effects is often a very useful way to achieve this.

If the true generative process is well approximated by this random effect construction, then the resulting parameter estimates will be unbiased. In settings where the generative model is not well approximated by a model with a normally distributed random effect, then the random effects model is at least somewhat misspecified, and this could result in biased parameter estimates. Note however, that even in the nonhierarchical “fixed effects” case, you will need to make a choice about how strongly to regularize the coefficient estimates via the prior. In low-information settings, it is essentially–maybe literally–always the case that some degree of regularization is sensible.

1 Like

Probably an important point is that random effects models bias low level parameters at least partly to a) improve their accuracy and b) reduce bias in higher level parameters.

1 Like

If you are interested in the average error for several effect estimates (e.g. for subgroups in a population) rhis error will actually be smaller with a hierarchical model. See James-Stein estimator.

1 Like

Thanks for the replies everyone, and apologies, next time I’ll post over a stack exchange since this question was off topic.

My motivation for the question was thinking about multiple comparisons for RCBD trials in agriculture. When comparing treatments it is the norm (in my experience) to do all pairwise contrasts and then correct for multiple comparisons using something like Tukey’s HSD. on the backend.

This led me down a rabbit hole of asking the question “How would I do such a thing in Bayesian statistics?” and this led me to hierarchical modeling as suggested by Andrew Gelman and Richard McElreath.
Richard McElreath makes the argument on page 419 (statistical rethinking 2nd ed) that this technique can apply to effects that would otherwise have been considered “fixed” (assuming I understood it correctly).

While I like the idea of hierarchical modeling, my guess is that there will be a reviewer somewhere down the line that says the estimates are biased towards to grand mean and therefore frequentist methods a la proc mixed, or at least modeling the treatments as fixed (no pooling), would be better since the estimates would not be biased.

Additionally, we often have low-power in agricultural fields studies: lots of environmental noise, low to moderate effect sizes, low replication (plots are expensive to run, we usually have 3-6 replicates).
Obviously we’d like to report interesting findings, but being fooled by random noise is definitely a real problem here, as 1 outlier for a treatment can have a large effect on the estimate for that treatment in no-pooling scenarios (as @jsocolar mentioned).
Actually, I think that power analyses aren’t even done in general because it’s so expensive to increase replication even though it would be incredibly helpful to have thought about biologically relevant effect sizes before initiating the experiment.

So far in my simulations the shrinkage doesn’t seem to be that bad…
As @Guido_Biele mentioned it does seem like the average error for the parameter estimates is lower partially pooled estimates than for non-pooled estimates. Richard McElreath mentioned this too, I guess I just didn’t really appreciate the lesson on the first read.

Thanks again for the answers everyone.