In this talk, Bob mentions that “a lot of models, like hierarchical models, don’t have posterior modes” making MAP/Penalized MLE a poor choice for inference.

Does anyone happen to have a citation for this claim that hierarchical models don’t have modes (or perhaps are multimodal, whatever Bob meant by this claim)?


It’s not so much that the posterior mode doesn’t exist but rather that it exists on a *boundary* of the model configuration space. This behavior invalidates the usual asymptotic guarantees (Bernstein-von Mises and related results) for maximum a posteriori estimates, and hence undermines a common motivation for using them.

This is straightforward to see if you look at a normal hierarchical model with “flat” prior density functions for the population location and scale,

\begin{align*}
\pi(\theta_{k}, \mu, \tau)
&= \pi(\theta_{k} \mid \mu, \tau) \, \pi(\mu, \tau)
\\
&\propto
\text{normal}( \theta_{k} \mid \mu, \tau).
\end{align*}

In this case \pi(\theta_{k}, \mu, \tau) has a singular maximum as \tau \rightarrow 0 and \theta_{k} - \mu \rightarrow 0. Geometrically it’s the very bottom of the infamous funnel.
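To see that singularity numerically, here is a minimal pure-Python sketch (not from the talk; the group count K = 5 and location mu = 0 are arbitrary illustrative choices) that evaluates the flat-prior joint log density along the path \theta_{k} = \mu as \tau shrinks:

```python
import math

def log_normal(x, mu, tau):
    # log of the normal density normal(x | mu, tau)
    return -0.5 * ((x - mu) / tau) ** 2 - math.log(tau) - 0.5 * math.log(2 * math.pi)

# Joint log density log pi(theta, mu, tau) under flat priors on (mu, tau),
# evaluated along the singular path theta_k = mu for all k.
# K = 5 groups and mu = 0 are arbitrary choices for illustration.
mu, K = 0.0, 5
for tau in [1.0, 0.1, 0.01, 0.001]:
    log_pi = sum(log_normal(mu, mu, tau) for _ in range(K))
    print(f"tau = {tau:6.3f}   log pi = {log_pi:8.2f}")
```

The log density grows without bound as \tau \rightarrow 0, so there is no interior maximum for an optimizer to find.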

The full posterior, however, also has to take into account the likelihood functions \pi(\tilde{y}_{n} \mid \theta_{k(n)}). The problem is that unless there are a lot of data *and* a reasonable number of groups, the likelihood functions won’t be able to exclude that singular mode of the hierarchical model, which will then propagate to the posterior density function.
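As a sketch of that failure mode (assumed setup, not from the talk: K = 2 groups, one observation per group, known unit observation noise, flat priors), adding the likelihood terms does not remove the divergence along the boundary path \theta_{k} = \mu:

```python
import math

def log_normal(x, mu, tau):
    # log of the normal density normal(x | mu, tau)
    return -0.5 * ((x - mu) / tau) ** 2 - math.log(tau) - 0.5 * math.log(2 * math.pi)

# Assumed toy data: two groups, one observation each, unit observation noise.
y = [1.3, -0.7]

def log_posterior(theta, mu, tau):
    # log pi(theta, mu, tau | y) up to a constant, with flat priors on (mu, tau)
    prior = sum(log_normal(t, mu, tau) for t in theta)
    likelihood = sum(log_normal(yk, tk, 1.0) for yk, tk in zip(y, theta))
    return prior + likelihood

# Along theta_1 = theta_2 = mu the likelihood contribution is bounded,
# but the hierarchical prior term still diverges as tau -> 0.
mu = sum(y) / len(y)
for tau in [1.0, 0.1, 0.01, 0.001]:
    print(f"tau = {tau:6.3f}   log posterior = {log_posterior([mu, mu], mu, tau):8.2f}")
```

The likelihood terms are finite along this path, so the singular mode of the hierarchical model survives into the posterior density function.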

A prior model for \tau that explicitly excludes \tau = 0 will yield a better behaved maximum, but that’s feasible only when one actually has domain expertise that excludes homogeneous behavior amongst the groups. Even then, the asymptotic regime is often so far away that the performance of the maximum a posteriori estimator will not be great, even though the mode has been moved away from the boundary.
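As a sketch of how a zero-avoiding prior changes the picture (the lognormal prior on \tau here is one assumed choice among many), the same boundary path now has a bounded log density, because the -(\log \tau)^{2} penalty eventually dominates the -K \log \tau divergence:

```python
import math

def log_normal(x, mu, tau):
    # log of the normal density normal(x | mu, tau)
    return -0.5 * ((x - mu) / tau) ** 2 - math.log(tau) - 0.5 * math.log(2 * math.pi)

def log_lognormal(tau, m=0.0, s=1.0):
    # log of the lognormal density; the density vanishes as tau -> 0
    return log_normal(math.log(tau), m, s) - math.log(tau)

# The flat-prior term along theta_k = mu diverges like -K log tau, but the
# lognormal prior contributes -(log tau)^2 / (2 s^2), which wins as tau -> 0.
# K = 5 and mu = 0 are the same arbitrary illustrative choices as before.
K, mu = 5, 0.0
for tau in [1.0, 0.1, 0.01, 0.001, 1e-6]:
    log_pi = sum(log_normal(mu, mu, tau) for _ in range(K)) + log_lognormal(tau)
    print(f"tau = {tau:8.6f}   log pi = {log_pi:9.2f}")
```

The log density now turns over and falls toward -\infty as \tau \rightarrow 0, so the maximum is interior; whether that interior mode is a useful summary is a separate question, per the asymptotics caveat above.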

Ideally one would confirm this by running an optimization in Stan or a similar tool and observing the boundary behavior directly.
