PSA: a case where Stan blows lme4::glmer away

mike-lawrence · January 5, 2021, 6:33pm

I’ve long been a Bayes booster for a variety of reasons, so it’s been a while since I bothered looking at sheer “speed to achieve results” comparisons. I remembered that lme4::glmer was a particular headache for me in my pre-Bayes days, so I coded up a comparison and WOW!

Here’s a gist containing decently commented code to simulate data and fit it using both lme4 and a highly optimized Stan model. The code lets you vary the characteristics of the data, but it generally produces a hierarchical within-subjects design with binomial outcomes that completely crosses a specified number of two-level predictor variables. In the Stan code, I use both my reduced-redundant-computation trick and the sufficient statistics trick.
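For readers who haven't met the sufficient statistics trick, here is a minimal sketch of the idea in plain Python rather than the gist's R/Stan. Every name in it (`simulate`, `aggregate`, `n_obs_per_cell`, the ±0.5 predictor coding) is hypothetical and not taken from the gist. It simulates binary trials in a fully crossed design, collapses them to one success count per subject-by-condition cell, and checks that the trial-level Bernoulli log-likelihood matches the count-level binomial log-likelihood once the binomial coefficient, which does not depend on the parameters, is left out.

```python
import math
import random
from collections import defaultdict
from itertools import product


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n_subj=5, n_pred=2, n_obs_per_cell=8, seed=1):
    """Simulate binary trials; return (rows, prob).

    rows: one (subject, condition, y) tuple per trial.
    prob: true success probability for each (subject, condition) cell.
    """
    rng = random.Random(seed)
    # Fully cross n_pred two-level predictors (hypothetical +/-0.5 coding).
    conditions = list(product((-0.5, 0.5), repeat=n_pred))
    beta = [rng.gauss(0, 1) for _ in range(n_pred)]
    rows, prob = [], {}
    for subj in range(n_subj):
        intercept = rng.gauss(0, 1)  # subject-level random intercept
        for cond in conditions:
            eta = intercept + sum(b * x for b, x in zip(beta, cond))
            prob[(subj, cond)] = inv_logit(eta)
            for _ in range(n_obs_per_cell):
                rows.append((subj, cond, int(rng.random() < prob[(subj, cond)])))
    return rows, prob


def bernoulli_loglik(rows, prob):
    """Log-likelihood evaluated one trial at a time."""
    return sum(
        math.log(prob[(s, c)]) if y else math.log1p(-prob[(s, c)])
        for s, c, y in rows
    )


def aggregate(rows):
    """Collapse trials to [successes, trials] per (subject, condition) cell."""
    counts = defaultdict(lambda: [0, 0])
    for s, c, y in rows:
        counts[(s, c)][0] += y
        counts[(s, c)][1] += 1
    return counts


def binomial_loglik(counts, prob):
    """Log-likelihood from the counts, omitting the constant binomial coefficient."""
    return sum(
        k * math.log(prob[cell]) + (n - k) * math.log1p(-prob[cell])
        for cell, (k, n) in counts.items()
    )


rows, prob = simulate()
counts = aggregate(rows)
print(len(rows), "trial rows collapse to", len(counts), "cells")
print("log-likelihoods agree:",
      math.isclose(bernoulli_loglik(rows, prob), binomial_loglik(counts, prob)))
```

The saving comes from evaluating the likelihood once per cell instead of once per trial; in a Stan model this would presumably correspond to a binomial likelihood on the per-cell counts in place of a Bernoulli likelihood on every trial.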

For a model with 3 predictor variables, 100 subjects and 100 observations per subject, Stan achieves 1000 decent-quality samples in UNDER A MINUTE. Meanwhile, lme4::glmer is failing to converge even after 30 minutes of running. (I’m going to try the newfangled “allFit()” approach, which blanket-tries all the available optimizers.)

Possibly this is unexpected to some with deeper intuitions/experience, but amid various claims online that “Bayes is slow” I thought I’d try to publicize this case as a strong counter-example.

yizhang · January 5, 2021, 7:31pm

This is great, but it also tingles my “apples vs oranges” sense. The optimizer for glmer (IRLS or some variant of it, IIRC) has a totally different condition for convergence, as it’s essentially Newton-Raphson, which we know can get stuck. Would it make sense to compare against Stan’s optimization method instead?

mike-lawrence · January 5, 2021, 7:35pm

The point is to compare two commonly-used tools that folks use for inference, using them in the way folks do. Yes it’s apples-to-oranges in a variety of ways; that’s what makes them different tools. As I say at the end, this is intended to counter the claim that Bayes is slow.

mike-lawrence · June 19, 2021, 10:18pm

FYI to anyone coming here in the future:

I’ve subsequently realized that, ironically enough, the reduced-redundant-computation trick is itself redundant when the sufficient stats trick is employed.
There’s an even more performant way to compute when rows_dot_product() is used properly.
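One way to read the first point, sketched in Python with hypothetical names and sizes (this is not code from the gist, and it assumes the redundant-computation trick works by evaluating the linear predictor once per unique subject-by-condition row and indexing back out to trials): at the trial level most design rows are repeats, so deduplicating pays off, but once the data are collapsed to one row of counts per cell there are no repeats left to remove.

```python
from itertools import product


def design_rows(n_subj, n_pred, n_obs_per_cell):
    """One (subject, condition) row per trial in a fully crossed design."""
    conditions = list(product((0, 1), repeat=n_pred))
    return [
        (subj, cond)
        for subj in range(n_subj)
        for cond in conditions
        for _ in range(n_obs_per_cell)
    ]


# Hypothetical sizes, kept small for illustration.
trial_rows = design_rows(n_subj=4, n_pred=3, n_obs_per_cell=10)

# Redundant-computation idea: find the unique rows, compute on those only,
# then map every trial back to its unique row.
unique_cells = sorted(set(trial_rows))
cell_index = {cell: i for i, cell in enumerate(unique_cells)}
trial_to_cell = [cell_index[row] for row in trial_rows]

# Sufficient-statistics idea: the aggregated data set has one row per cell,
# so its rows are exactly the unique cells already.
aggregated_rows = unique_cells

print("trial-level rows:", len(trial_rows), "unique:", len(unique_cells))
print("aggregated rows:", len(aggregated_rows),
      "unique:", len(set(aggregated_rows)))
```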

AWoodward · June 20, 2021, 12:57pm

@mike-lawrence this is beside your point, but that’s not to mention the mountain of effort saved compared with bootstrapping and other post-processing workarounds for those lme4 models.
