I have a big hierarchical model with crossed random effects which takes 3½ days to run on a dataset with ~ 80.000 observations. I need to do it for three variants of the model, and even worse, I will likely need to do it again for double the number of observations. So I’m looking for ways to increase speed.
family = gaussian, I tried computing the mean and SEM (the likelihood parameters) for each cell and replacing
brm(y ~ ...) with
brm(y_mean | se(y_sem) ~ ...). In a toy example below with 100 data points per cell, this perfectly recovered all parameters and increased total sampling speed from ~600 secs to ~ 20 secs (around 20-fold speedup when factoring in effective sample size). Even for small samples of 8 data points per cell, sampling is speeded up from ~ 20 secs to ~ 7 secs (around 2-fold speedup when factoring in effective sample size).
As far as I can see, the second model only differs from the first in that there is no prior on
sigma. I have a fairly weak prior on
sigma which is totally overwhelmed by data, so the effects of that “omission” should be negligible. Am I missing other differences here or is this a safe approach?
Toy example to study speedup
First make some data:
library(tidyverse) library(brms) # Generate data. # The predictors: 100 trials for each of 5 conditions within each participant D_raw = expand.grid(trial = 1:100, condition = 1:5, id = 1:10) %>% # Now the response variable with different intercepts per id and condition mutate( y = rnorm(n(), mean = condition + id, sd = 1), condition = factor(condition), id = factor(id) )
Fit the model on the raw data:
fit_raw = brm(y ~ condition + (1 + condition | id), D_raw)
… and now fit it using the the faster solution:
# Compute likelihoods per cell D_lik = D_raw %>% group_by(id, condition) %>% summarise(y_mean = mean(y), y_sem = sd(y) / sqrt(n())) # Fit model on likelihoods fit_lik = brm(y_mean | se(y_sem) ~ condition + (1 + condition | id), D_lik)
Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 6.47 1.13 4.26 8.74 713 1.01 condition2 0.93 0.18 0.59 1.29 4074 1.00 condition3 2.28 0.21 1.88 2.69 3026 1.00 condition4 3.02 0.18 2.66 3.37 3549 1.00 condition5 4.06 0.19 3.69 4.43 3870 1.00
… and here is
Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 6.53 1.24 4.15 9.07 611 1.00 condition2 0.92 0.16 0.62 1.25 2749 1.00 condition3 2.27 0.22 1.83 2.69 2364 1.00 condition4 3.05 0.16 2.74 3.39 2412 1.00 condition5 4.07 0.18 3.73 4.44 2495 1.00