PSA: a case where Stan blows lme4::glmer away

I’ve long been a Bayes booster for a variety of reasons, so it’s been a while since I bothered looking at sheer “speed to achieve results” comparisons. I remembered that lme4::glmer was a particular headache for me in my pre-Bayes days, so I coded up a comparison and WOW!

Here’s a gist containing decently commented code to simulate and fit data using both lme4 and a highly optimized Stan model. The code lets you vary the characteristics of the data, but it generally produces a hierarchical within-subjects design with binomial outcomes, in which a specified number of two-level predictor variables are completely crossed. In the Stan code, I use both my reduced-redundant-computation trick and the sufficient statistics trick.
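For anyone unfamiliar with the sufficient statistics trick, here is a minimal sketch of the idea in plain Python rather than Stan. It assumes the trick means collapsing the 0/1 trials within each subject-by-condition cell into a success count and a trial count. The function names, cell labels and probabilities are all made up for illustration and are not taken from the gist.

```python
import math
from collections import defaultdict

def bernoulli_loglik(trials, p_by_cell):
    """Sum of Bernoulli log-likelihoods, one term per trial."""
    return sum(
        math.log(p_by_cell[cell]) if y == 1 else math.log1p(-p_by_cell[cell])
        for cell, y in trials
    )

def aggregate(trials):
    """Collapse trials to (successes, total) per cell: the sufficient statistics."""
    counts = defaultdict(lambda: [0, 0])
    for cell, y in trials:
        counts[cell][0] += y
        counts[cell][1] += 1
    return {cell: (k, n) for cell, (k, n) in counts.items()}

def binomial_kernel_loglik(counts, p_by_cell):
    """Binomial log-likelihood, omitting the log-choose term (constant in p)."""
    return sum(
        k * math.log(p_by_cell[cell]) + (n - k) * math.log1p(-p_by_cell[cell])
        for cell, (k, n) in counts.items()
    )

# Hypothetical data: (cell, outcome) pairs; a cell is one subject-by-condition combination.
trials = [("s1_a", 1), ("s1_a", 0), ("s1_a", 1), ("s1_b", 0), ("s1_b", 0), ("s2_a", 1)]
# Hypothetical per-cell success probabilities, standing in for the model's predictions.
p_by_cell = {"s1_a": 0.7, "s1_b": 0.2, "s2_a": 0.9}

counts = aggregate(trials)
full = bernoulli_loglik(trials, p_by_cell)           # one term per trial
reduced = binomial_kernel_loglik(counts, p_by_cell)  # one term per cell
```

The two log-likelihoods agree, but the second needs one term per cell instead of one per trial, which is where the saving comes from when each cell holds many observations.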

For a model with 3 predictor variables, 100 subjects and 100 observations per subject, Stan achieves 1000 decent-quality samples in UNDER A MINUTE. Meanwhile, lme4::glmer is failing to converge even after 30 minutes of running. (I’m going to try the newfangled “allFit()” approach, which refits the model with all the available optimizers.)

Possibly this is unexpected to some with deeper intuitions/experience, but amid various claims online that “Bayes is slow” I thought I’d try to publicize this case as a strong counter-example.


This is great, but it also tingles my “apples vs. oranges” sense. The optimizer (IRLS or some variant of it, IIRC) for glmer has a totally different convergence condition, as it’s essentially Newton-Raphson, which we know can get stuck. Does it make sense to compare it against Stan’s optimization method?

The point is to compare two commonly used inference tools, using them the way folks actually do. Yes, it’s apples-to-oranges in a variety of ways; that’s what makes them different tools. As I say at the end, this is intended to counter the claim that Bayes is slow.


FYI to anyone coming here in the future:

  1. I’ve subsequently realized that, ironically enough, the reduced-redundant-computation trick is itself redundant when the sufficient stats trick is employed.

  2. There’s an even more performant way to compute when rows_dot_product() is used properly.
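For readers who haven’t met rows_dot_product(): in Stan it takes two matrices of the same shape and returns the dot product of each pair of corresponding rows. Below is a rough plain-Python sketch of that operation only, not of the more performant model itself. The design and coefficient values are hypothetical, and pairing each design row with that row’s subject-specific coefficients is an assumption about how it might be used.

```python
def rows_dot_product(x, y):
    """Row-wise dot product of two equally shaped matrices (lists of rows)."""
    if len(x) != len(y) or any(len(a) != len(b) for a, b in zip(x, y)):
        raise ValueError("x and y must have the same shape")
    return [
        sum(a * b for a, b in zip(row_x, row_y))
        for row_x, row_y in zip(x, y)
    ]

# Hypothetical example: one row per subject-by-condition cell.
design = [[1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]  # intercept and one +/-1 contrast
coefs = [[0.5, 0.2], [0.5, 0.2], [-0.3, 0.4]]    # that row's subject-specific coefficients
eta = rows_dot_product(design, coefs)            # one linear-predictor value per row
```

Each element of eta depends only on its own row, so the whole vector of linear predictors comes out of a single call rather than a loop over observations.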


@mike-lawrence this is beside your point, but that’s not to mention the mountain of effort saved compared with bootstrapping and other post-processing workarounds for those lme4 models.