I have a big hierarchical model with crossed random effects which takes 3½ days to run on a dataset with ~ 80.000 observations. I need to do it for three variants of the model, and even worse, I will likely need to do it again for double the number of observations. So I’m looking for ways to increase speed.

Since `family = gaussian`

, I tried computing the mean and SEM (the likelihood parameters) for each cell and replacing `brm(y ~ ...)`

with `brm(y_mean | se(y_sem) ~ ...)`

. In a toy example below with 100 data points per cell, this perfectly recovered all parameters and increased total sampling speed from ~600 secs to ~ 20 secs (around 20-fold speedup when factoring in effective sample size). Even for small samples of 8 data points per cell, sampling is speeded up from ~ 20 secs to ~ 7 secs (around 2-fold speedup when factoring in effective sample size).

As far as I can see, the second model only differs from the first in that there is no prior on `sigma`

. I have a fairly weak prior on `sigma`

which is totally overwhelmed by data, so the effects of that “omission” should be negligible. Am I missing other differences here or is this a safe approach?

# Toy example to study speedup

First make some data:

```
library(tidyverse)
library(brms)
# Generate data.
# The predictors: 100 trials for each of 5 conditions within each participant
D_raw = expand.grid(trial = 1:100,
condition = 1:5,
id = 1:10) %>%
# Now the response variable with different intercepts per id and condition
mutate(
y = rnorm(n(), mean = condition + id, sd = 1),
condition = factor(condition),
id = factor(id)
)
```

Fit the model on the raw data:

```
fit_raw = brm(y ~ condition + (1 + condition | id), D_raw)
```

… and now fit it using the the faster solution:

```
# Compute likelihoods per cell
D_lik = D_raw %>%
group_by(id, condition) %>%
summarise(y_mean = mean(y),
y_sem = sd(y) / sqrt(n()))
# Fit model on likelihoods
fit_lik = brm(y_mean | se(y_sem) ~ condition + (1 + condition | id), D_lik)
```

# Compare fits

Here is `fit_raw`

:

```
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept 6.47 1.13 4.26 8.74 713 1.01
condition2 0.93 0.18 0.59 1.29 4074 1.00
condition3 2.28 0.21 1.88 2.69 3026 1.00
condition4 3.02 0.18 2.66 3.37 3549 1.00
condition5 4.06 0.19 3.69 4.43 3870 1.00
```

… and here is `fit_lik`

:

```
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept 6.53 1.24 4.15 9.07 611 1.00
condition2 0.92 0.16 0.62 1.25 2749 1.00
condition3 2.27 0.22 1.83 2.69 2364 1.00
condition4 3.05 0.16 2.74 3.39 2412 1.00
condition5 4.07 0.18 3.73 4.44 2495 1.00
```