I hadn’t quite understood the density of the funnel example (as described in the user’s guide chapter on reparameterization), or the full implications of using a lognormal prior, as Radford Neal recommends, as opposed to the half-normal that Andrew Gelman recommends.
To keep things simple, I’ll use a slightly modified version without worrying about constant choices—the geometry’s the same. There are two models in play. I’ll start with Radford’s version then move on to Andrew’s.
Neal’s funnel
Radford’s original funnel (with simplified constants) is

p(\tau, \beta) = \textrm{normal}(\tau \mid 0, 1) \cdot \textrm{normal}(\beta \mid 0, \exp(\tau)).
The reason we care about this is that it’s a hierarchical prior for a regression model, where there would be a group of coefficients \beta_1, \ldots, \beta_K; we’ve just simplified here to a single coefficient, taking K = 1, as we only care about the marginal prior of \tau and one of the \beta_k.
An alternative way of writing this without an explicit transform is

\sigma \sim \textrm{lognormal}(0, 1), \quad \beta \sim \textrm{normal}(0, \sigma),

where \tau = \log \sigma.
The lognormal avoids zero in that

\lim_{\sigma \rightarrow 0} \textrm{lognormal}(\sigma \mid 0, 1) = 0.
The convergence is fast, as we’ll see in the plots below.
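As a quick sanity check on how fast that convergence is, here’s a sketch in plain Python (standard library only; this is my addition, not part of the original post), evaluating the lognormal(0, 1) density at values of \sigma approaching zero:

```python
import math

def lognormal_pdf(sigma, mu=0.0, s=1.0):
    """Density of lognormal(mu, s) evaluated at sigma > 0."""
    return math.exp(-0.5 * ((math.log(sigma) - mu) / s) ** 2) \
        / (sigma * s * math.sqrt(2 * math.pi))

# The density drops to zero super fast as sigma -> 0,
# because the (log sigma)^2 term in the exponent dominates 1/sigma.
for sigma in [0.5, 0.1, 0.01, 0.001]:
    print(f"lognormal({sigma} | 0, 1) = {lognormal_pdf(sigma):.3e}")
```

By \sigma = 0.001 the density is already below 10^{-7}, which is the zero-avoiding behavior in action.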
Gelman’s funnel
If we follow Andrew’s latest recommendations for hierarchical modeling, we stay away from zero-avoiding priors and code something like this,

sigma ~ normal(0, 1);  // sigma constrained to be >= 0
beta ~ normal(0, sigma);
where I’ve used Stan-like notation to indicate the half-normal prior on \sigma that is consistent with zero, in that

\lim_{\sigma \rightarrow 0} \textrm{normal}(\sigma \mid 0, 1) = \textrm{normal}(0 \mid 0, 1) > 0.
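The contrast with the lognormal is easy to see numerically. A minimal sketch in plain Python (my addition, assuming the unit scale used throughout the post): the half-normal density is not just positive at zero, it’s maximal there.

```python
import math

def half_normal_pdf(sigma, s=1.0):
    """Density of a half-normal(0, s) prior, supported on sigma >= 0."""
    assert sigma >= 0
    # Twice the normal(0, s) density, to renormalize over [0, infinity).
    return 2.0 * math.exp(-0.5 * (sigma / s) ** 2) / (s * math.sqrt(2 * math.pi))

# The density at zero is sqrt(2 / pi), roughly 0.798, and it only
# decreases from there, so zero is fully consistent with this prior.
print(half_normal_pdf(0.0))
```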
Unconstrained funnels
Neal’s funnel is naturally on the unconstrained scale p(\tau, \beta), with support \tau, \beta \in \mathbb{R}. That’s because \tau = \log \sigma.
Gelman’s funnel, on the other hand, needs to be transformed, with \tau = \log \sigma and inversely \sigma = \exp(\tau), so that the Jacobian adjustment is

\left| \frac{d}{d\tau} \exp(\tau) \right| = \exp(\tau),
resulting in the unconstrained density

p(\tau, \beta) \propto \textrm{normal}(\exp(\tau) \mid 0, 1) \cdot \textrm{normal}(\beta \mid 0, \exp(\tau)) \cdot \exp(\tau).
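This unconstrained density is easy to evaluate directly. Here’s a sketch in plain Python (my addition; unit-scale constants assumed, with the half-normal’s factor of 2 included) showing what it does as \tau \rightarrow -\infty along \beta = 0:

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2 * math.pi)

def gelman_unconstrained_lp(tau, beta=0.0):
    """log p(tau, beta) for sigma = exp(tau) ~ half-normal(0, 1),
    beta ~ normal(0, sigma), including the log-Jacobian term tau."""
    sigma = math.exp(tau)
    lp_sigma = math.log(2.0) - 0.5 * sigma ** 2 - LOG_SQRT_2PI  # half-normal
    lp_beta = -0.5 * (beta / sigma) ** 2 - math.log(sigma) - LOG_SQRT_2PI
    return lp_sigma + lp_beta + tau  # "+ tau" is the Jacobian adjustment

# The log density flattens out (to -log(pi) with these constants)
# rather than growing without bound or dropping to -infinity.
for tau in [0.0, -2.0, -5.0, -10.0, -20.0]:
    print(tau, round(gelman_unconstrained_lp(tau), 6))
```

The -\log \sigma = -\tau term from \textrm{normal}(\beta \mid 0, \sigma) at \beta = 0 is exactly cancelled by the log-Jacobian term \tau, which is where the asymptote comes from.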
Modes?
I had naively thought that neither version of the funnel had a mode. It turns out that Neal’s funnel, because of the zero-avoiding behavior, really does have a mode.
If we have a mode (\tau^*, \beta^*), it must have \beta^* = 0, because \beta \sim \mbox{normal}(0, \exp(\tau)). So let’s look at the functions p(\tau, 0) for Gelman’s and Neal’s funnels.
Neal’s funnel mode
Neal’s funnel actually has a mode. Here’s a plot with \beta = 0 fixed,
Plot of Neal’s funnel p(\log \sigma, \beta = 0). There is a clear mode around -1.
Hmm. I thought it grew without bound, but that lognormal prior on \sigma is really strong!
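The mode location even works out analytically with unit constants: along \beta = 0, \log p(\tau, 0) = -\tau^2/2 - \tau + \textrm{const}, whose derivative -\tau - 1 vanishes at \tau^* = -1, matching the plot. A quick numerical confirmation in plain Python (my addition, not from the original post):

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2 * math.pi)

def neal_lp(tau, beta=0.0):
    """log p(tau, beta) for the simplified Neal funnel:
    tau ~ normal(0, 1), beta ~ normal(0, exp(tau))."""
    lp_tau = -0.5 * tau ** 2 - LOG_SQRT_2PI
    lp_beta = -0.5 * (beta / math.exp(tau)) ** 2 - tau - LOG_SQRT_2PI
    return lp_tau + lp_beta

# Grid search along beta = 0; analytically the maximum of
# -tau^2/2 - tau is at tau* = -1.
grid = [t / 100.0 for t in range(-500, 501)]
tau_star = max(grid, key=neal_lp)
print(tau_star)  # -1.0
```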
Gelman funnel mode
Surely without the zero-avoiding prior, there’s no limit to the log density of the hierarchical prior, right? I may even be recorded saying so.
I take it all back. Here’s Gelman’s, where \log \sigma = \tau.
Plot of p(\log \sigma, \beta = 0) for Gelman’s funnel. It asymptotes a little beyond \log \sigma = -5, or around \sigma < 0.005.
The plot looks almost identical on the log scale, the dropoff is so sharp. The Jacobian \exp(\tau) keeps things balanced (you can work out the limits analytically pretty easily now that I’ve written out the density with the Jacobian adjustment).
So the unbounded growth is only a problem for penalized maximum likelihood estimates. When we take a maximum a posteriori (MAP) estimate, we pick up the Jacobian term, which cancels the unbounded growth in \textrm{normal}(\beta \mid 0, \sigma) as \beta \rightarrow 0, \sigma \rightarrow 0.
We get a proper MAP estimate for Neal’s funnel, but we only get an asymptote with Gelman’s funnel.
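The penalized-ML versus MAP contrast can be checked side by side. A sketch in plain Python (my addition; unit constants assumed), evaluating Gelman’s log density at \beta = 0 with and without the Jacobian term:

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2 * math.pi)

def gelman_lp(sigma, beta=0.0, jacobian=True):
    """log density at (sigma, beta) for sigma ~ half-normal(0, 1),
    beta ~ normal(0, sigma); optionally add the log-Jacobian
    log(sigma) that arises when working on tau = log(sigma)."""
    lp = (math.log(2.0) - 0.5 * sigma ** 2 - LOG_SQRT_2PI   # half-normal
          - 0.5 * (beta / sigma) ** 2 - math.log(sigma) - LOG_SQRT_2PI)
    if jacobian:
        lp += math.log(sigma)
    return lp

# Without the Jacobian (penalized ML), the log density grows without
# bound as sigma -> 0; with it (MAP on the unconstrained scale), it
# levels off at a finite asymptote.
for sigma in [1e-1, 1e-3, 1e-6]:
    print(sigma,
          round(gelman_lp(sigma, jacobian=False), 4),
          round(gelman_lp(sigma, jacobian=True), 4))
```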
Methodology going forward
Should any of this make us rethink the priors we recommend? Look how much more nicely behaved that density is in the Neal formulation. Wouldn’t that be sooo much easier to sample from using NUTS?
Maybe we can recommend these zero-avoiding priors and then recommend that users look for pileups of probability mass and/or divergence issues as \sigma \rightarrow 0 (that is, \log \sigma \rightarrow -\infty).
Among others, I’d love to hear from @betanalpha, @andrewgelman, @anon75146577, @avehtari, @bgoodri, and @yuling.