Speed up fitting using y_mean | se(y_se) on mean and SE?

lindeloev · August 13, 2019, 10:24am

I have a big hierarchical model with crossed random effects which takes 3½ days to run on a dataset with ~ 80.000 observations. I need to do it for three variants of the model, and even worse, I will likely need to do it again for double the number of observations. So I’m looking for ways to increase speed.

Since family = gaussian, I tried computing the mean and SEM (the likelihood parameters) for each cell and replacing brm(y ~ ...) with brm(y_mean | se(y_sem) ~ ...). In a toy example below with 100 data points per cell, this perfectly recovered all parameters and increased total sampling speed from ~600 secs to ~ 20 secs (around 20-fold speedup when factoring in effective sample size). Even for small samples of 8 data points per cell, sampling is speeded up from ~ 20 secs to ~ 7 secs (around 2-fold speedup when factoring in effective sample size).

As far as I can see, the second model only differs from the first in that there is no prior on sigma. I have a fairly weak prior on sigma which is totally overwhelmed by data, so the effects of that “omission” should be negligible. Am I missing other differences here or is this a safe approach?

Toy example to study speedup

First make some data:

library(tidyverse)
library(brms)

# Generate data.
# The predictors: 100 trials for each of 5 conditions within each participant
D_raw = expand.grid(trial = 1:100,
                condition = 1:5,
                id = 1:10) %>%
  # Now the response variable with different intercepts per id and condition
  mutate(
    y = rnorm(n(), mean = condition + id, sd = 1),
    condition = factor(condition),
    id = factor(id)
  )

Fit the model on the raw data:

fit_raw = brm(y ~ condition + (1 + condition | id), D_raw)

… and now fit it using the the faster solution:

# Compute likelihoods per cell
D_lik = D_raw %>%
  group_by(id, condition) %>%
  summarise(y_mean = mean(y),
            y_sem = sd(y) / sqrt(n()))

# Fit model on likelihoods
fit_lik = brm(y_mean | se(y_sem) ~ condition + (1 + condition | id), D_lik)

Compare fits

Here is fit_raw:

Population-Level Effects: 
           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept      6.47      1.13     4.26     8.74        713 1.01
condition2     0.93      0.18     0.59     1.29       4074 1.00
condition3     2.28      0.21     1.88     2.69       3026 1.00
condition4     3.02      0.18     2.66     3.37       3549 1.00
condition5     4.06      0.19     3.69     4.43       3870 1.00

… and here is fit_lik:

Population-Level Effects: 
           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept      6.53      1.24     4.15     9.07        611 1.00
condition2     0.92      0.16     0.62     1.25       2749 1.00
condition3     2.27      0.22     1.83     2.69       2364 1.00
condition4     3.05      0.16     2.74     3.39       2412 1.00
condition5     4.07      0.18     3.73     4.44       2495 1.00

paul.buerkner · August 13, 2019, 10:39am

Just one thing after quickly looking at your code. When you just use se() you will not estimate sigma anymore. To make sure, sigma is estimated still, use se(..., sigma = TRUE). Basically, what you are doing is meta-analysis and if its assumptions are justified it is uses all available information for estimating the summary effect size (the effect of condition in your case).

lindeloev · August 13, 2019, 11:01am

Thank you! This is comforting to hear.

Does the meta-regression approach entail more assumptions than what is already “embodied” in the “raw” hierarchical model? For example, if I use non-normal data, add unmodelled correlations, heterogeneity, etc. I would still expect them to yield the same posteriors (I could be wrong).

Just asking to learn how generic this approach is because I’m considering writing up a blog :-)

paul.buerkner · August 13, 2019, 11:21am

As soon as you go beyond a very simple meta-analytic model, things are no longer as easy and I can’t give any general answer to that. I would hope that standard meta-analysis literature/books talks about such issues.

Topic		Replies	Views
Speeding up predictions in brms brms	2	823	March 28, 2020
Gaussian hierarchical factor spline model: increasing efficiency Modeling brms	11	452	May 3, 2024
How to speed up `brms::loo_subsample()` for large models brms loo , hierarchical-model , model-comparison	16	1530	October 25, 2022
Brms sampling in multi-level multinomial models Modeling specification , brms , meta-analysis	8	2064	March 16, 2021
Indicator variables very slow brms	8	825	September 8, 2021

Speed up fitting using y_mean | se(y_se) on mean and SE?

Toy example to study speedup

Compare fits

Related topics