I often find that MAP estimation (e.g., based on the blme package in R) plus posterior simulation via arm::sim() produces posteriors very similar to those from full Stan sampling, at least for simple linear models. For example:
library(brms)  # provides the epilepsy data
library(blme)
fit.blme <- bglmer(
  count ~ zBase * Trt + (1 | patient),
  data = epilepsy, family = poisson()
)
fit.sampling <- brm(
count ~ zBase * Trt + (1|patient),
data = epilepsy, family = poisson(),
backend = "cmdstanr"
)
blme.samples <- arm::sim(fit.blme)
library(ggplot2)
library(posterior)  # for as_draws_df()
ggplot() +
  geom_density(data = as_draws_df(fit.sampling), aes(x = `b_zBase:Trt1`, color = "sampling")) +
  geom_density(data = as_draws_df(blme.samples@fixef), aes(x = `zBase:Trt1`, color = "blme + sim"))
In comparison, other approximate algorithms like fullrank and pathfinder produce much odder results (though it’s also possible I’m not using them properly).
It would be great if this approach could be added to the algorithms for approximate inference in Stan (and, indirectly, in brms).
I would already use blme + sim() as a drop-in replacement for brms when exploring model specifications, but blme is much more limited in its modelling and prior choices.
This would be trivial if you really did just have a simple linear model and were just applying a standard MAP estimate. That’s all directly supported by Stan.
As is, your example is a hierarchical model, not a simple GLM. Also, blme doesn’t provide a MAP estimate; it provides a penalized maximum likelihood estimate.

The problem is that the MAP estimate and the MLE don’t exist for hierarchical models, because there’s no maximum of the overall target density: it goes to infinity as the hierarchical variance goes to zero and the coefficients approach the hierarchical mean. So instead, lme4, on which blme is based, uses a technique known as max marginal likelihood: it marginalizes out the group-level coefficients and optimizes the remaining parameters marginally, then plugs in those estimates to estimate the coefficients. This is often called “empirical Bayes” in the frequentist literature (it’s not a Bayesian estimator, as it doesn’t average over uncertainty).

More generally, blme isn’t providing a MAP estimate because it doesn’t take into account the Jacobians for constrained parameters like, say, the error scale in a linear regression. Stan just recently introduced a proper MAP estimator, but it won’t work for hierarchical models, for which MAP estimates don’t exist.
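To see concretely why there’s no maximum, look at the hierarchical prior term alone: normal(theta | mu, tau) evaluated at theta = mu grows without bound as tau -> 0. In R:

# log normal(theta | mu, tau) at theta = mu is -log(tau) - 0.5 * log(2 * pi),
# which increases without bound as tau -> 0
sapply(c(1, 0.1, 0.001), function(tau) dnorm(0, mean = 0, sd = tau, log = TRUE))
#> [1] -0.92  1.38  5.99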
Typically, the variational inference algorithms will produce underdispersed approximations due to the direction of the KL-divergence being optimized (KL[q || p], where q is the approximation and p is the target). We developed Pathfinder primarily as an initialization scheme for sampling.
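A standard worked example of that effect: for a bivariate normal target with unit variances and correlation rho, the mean-field normal approximation minimizing KL[q || p] matches the target’s conditional precisions rather than its marginal variances, so each marginal of q ends up with variance 1 - rho^2 instead of the true 1. The stronger the correlation, the more underdispersed the approximation.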
I don’t know that either rstanarm or brms is being actively developed any more. Both are R-only packages, where it would be possible to introduce a dependency on blme. I’m pretty sure that @andrewgelman gave up maintaining blme because lme4 kept changing.
Everything else in Stan is cross-platform, so it wouldn’t be able to depend on an R-only package like blme (and rewriting it would be a huge endeavor that would only help with a limited class of hierarchical GLMs).
Hi, in answer to Bob, yes, rstanarm and brms continue to be actively developed. Also, blme is maintained by Vince Dorie.
One hack that has sometimes worked for me is to set the group-level variance parameters to fixed values (perhaps based on a long Stan run or from experience with previous models); then you can run Stan or rstanarm in optimization mode with these variances fixed. Once the variance parameters are fixed, there’s no funnel and the optimization should be fine.
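In brms terms, a minimal sketch of this might use brms’s constant() prior to pin the group-level sd, reusing the epilepsy model from above (the fixed value 0.5 is just a placeholder, not a recommendation):

library(brms)
library(cmdstanr)
# fix the group-level sd at a constant; 0.5 is an arbitrary placeholder
fixed_sd <- prior(constant(0.5), class = "sd", group = "patient")
scode <- make_stancode(count ~ zBase * Trt + (1 | patient),
                       data = epilepsy, family = poisson(), prior = fixed_sd)
sdata <- make_standata(count ~ zBase * Trt + (1 | patient),
                       data = epilepsy, family = poisson(), prior = fixed_sd)
# with the sd held constant there is no funnel, so the optimizer behaves
mod <- cmdstan_model(write_stan_file(scode))
fit.opt <- mod$optimize(data = sdata)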
If you want, you can then go further and run this optimizer in parallel using different fixed choices of the hierarchical variance parameters. It should be easy enough to do this 16 times or whatever, and then combine the inferences using stacking. I’ve never tried this approach but I think it could work.
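Sketching that idea with loo’s stacking_weights(); here fit_with_fixed_sd() is a hypothetical wrapper around the optimization step above that returns a vector of pointwise log predictive densities for its fit:

library(loo)
sd_grid <- exp(seq(log(0.1), log(2), length.out = 16))  # candidate fixed sds
lpd_list <- lapply(sd_grid, fit_with_fixed_sd)  # hypothetical helper, see above
lpd_point <- do.call(cbind, lpd_list)  # N observations by 16 candidate fits
wts <- stacking_weights(lpd_point)     # simplex of stacking weights

The combined inference would then mix draws from the candidate approximations in proportion to wts.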
Thank you, Bob, for taking the time to provide the technical details.
Just to be clear, I wasn’t proposing implementing such a technique (empirical Bayes + posterior simulation, as done with sim()) in brms or rstanarm, but directly in Stan, precisely for cross-language compatibility.
The point, I guess, reduces to whether such a technique provides a good enough posterior approximation, at least compared to the variational techniques already implemented in Stan. Also, blme has a very limited choice of priors and model structures compared to Stan (and consequently brms).
Do you think it could provide added value?