Brms hacking: linear predictors for random effect standard deviations

Hi all,
I have a new blog post showcasing the immense hackability of brms by extending a random intercept model with linear predictors on the standard deviation of the random intercept. Should you do it? Most likely not, but if you really, really want to, there is a way. The techniques shown are also general and let you do a lot of other crazy stuff with brms.

https://www.martinmodrak.cz/2024/02/17/brms-hacking-linear-predictors-for-random-effect-standard-deviations/

Happy for any feedback!


I believe you linked to an older post. Is this the post you meant to link?

I recently looked into this and there’s an easier way that doesn’t require overwriting the likelihood. It relies on the fact that brms implements random effects with a non-centered parametrization. E.g., if this is our model:

y ~ 1 + (1|ID)

the underlying model is

y = b_0 + eps_resid
b_0 = a_0 + sigma_id * z,   z ~ Normal(0, 1)
eps_resid ~ Normal(0, sigma)
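
In R terms, the non-centered construction is just scaling and shifting standard-normal draws. A quick simulation sketch (the values of a_0 and sigma_id are made up for illustration):

a_0 <- 0.6                  # made-up population intercept
sigma_id <- 0.3             # made-up random-intercept sd
n_id <- 17
z <- rnorm(n_id)            # what is actually sampled: z ~ Normal(0, 1)
b_0 <- a_0 + sigma_id * z   # group intercepts on the outcome scale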

Then you can use non-linear formulas to:

  • fix sigma_id to 1
  • multiply by a new parameter mysigma, which can then be predicted by any linear formula.

For example:

Setup

library(cmdstanr)
library(brms)
library(tidyverse)
library(knitr)
library(bayesplot)
options(mc.cores = parallel::detectCores())

dat <- rtdists::speed_acc %>% 
  filter(block %in% c(1,2,3,4,5))

Basic model

f1 <- bf(rt ~ stim_cat + (1|id))
fit1 <- brm(f1, data = dat, chains = 4)

Custom model to predict random effect sigma

# non-linear formula
f2 <- bf(rt ~ murt + idintercept * exp(logmysigma),  # specify explicit terms in a non-linear formula
         murt ~ stim_cat,                            # predict rt mean via murt
         idintercept ~ 1 + (1|id),                   # specify random effects (will fix via prior)
         logmysigma ~ response,                      # model random effects variance (response is a categorical predictor here, but can be anything)
         nl = TRUE)

# set priors to fix parameters, and to avoid flat priors on nlp parameters
prior2 <- prior(student_t(3, 0.6, 2.5), coef = "Intercept", nlpar = "murt") + 
  prior(constant(0), class = "b", nlpar = "idintercept") +               # center random effects at 0
  prior(constant(1), class = "sd", nlpar = "idintercept") +              # fix random effects sd to 1
  prior(student_t(3, 0, 2.5), class = "b", nlpar = "logmysigma")         # priors on our sigma


fit2 <- brm(f2, data = dat, chains = 4, iter = 4000, prior = prior2)

You get the following output:

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: rt ~ murt + idintercept * exp(logmysigma) 
         murt ~ stim_cat
         idintercept ~ 1 + (1 | id)
         logmysigma ~ response
   Data: dat (Number of observations: 8160) 
  Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
         total post-warmup draws = 8000

Multilevel Hyperparameters:
~id (Number of levels: 17) 
                          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(idintercept_Intercept)     1.00      0.00     1.00     1.00   NA       NA       NA

Regression Coefficients:
                           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
murt_Intercept                 0.61      0.02     0.57     0.65 1.00      791     1831
murt_stim_catnonword           0.03      0.01     0.01     0.05 1.00     1742     3484
logmysigma_Intercept          -2.52      0.21    -2.90    -2.09 1.00      950     1847
logmysigma_responsenonword     0.29      0.08     0.12     0.45 1.00     3830     5070
logmysigma_responseerror      -1.73      2.88    -9.56     1.37 1.00     2638     1289
idintercept_Intercept          0.00      0.00     0.00     0.00   NA       NA       NA

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.30      0.00     0.30     0.31 1.00     4650     4822

In this model, the idintercept term is a random intercept with mean 0 and sd 1. We multiply it by exp(logmysigma), which makes exp(logmysigma) the standard deviation of the random intercept. From the results we see that the random-effect variance indeed differs between conditions. You can specify any formula on logmysigma.
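
If useful, the implied per-category sd of the random intercept can be read off the posterior draws directly, assuming the usual b_<nlpar>_<coefficient> naming of the draws in this model:

draws <- as_draws_df(fit2)
# sd of the random intercept for word vs. nonword responses
sd_word <- exp(draws$b_logmysigma_Intercept)
sd_nonword <- exp(draws$b_logmysigma_Intercept + draws$b_logmysigma_responsenonword)
quantile(sd_word, c(0.025, 0.5, 0.975))
quantile(sd_nonword, c(0.025, 0.5, 0.975))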

If I understood your post correctly, this achieves the same result. Still hacky, but it does not require custom likelihoods and works for any model.


Thanks for pointing out the bad link - fixed.

I evaluated this option earlier (as discussed in the blog post), but managed to convince myself that the non-linear formula approach is a different model. It appears I was wrong and you are right: I am able to replicate the example model from the blog with a pure brms approach:
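
(base_data and trt_data come from the simulation in the blog post; if you want to run this standalone, a rough stand-in with made-up parameters, assuming the tidyverse is loaded, could look like this:)

set.seed(5678)
n_pat <- 40
# one row per treatment; trt_id gets attached in the join below
trt_data <- data.frame(trt_x = c(0, 1))
patients <- data.frame(patient_id = 1:n_pat,
                       trt_id = rep(1:2, each = n_pat / 2))
# random intercepts whose sd depends on the treatment
patients$re <- rnorm(n_pat, sd = exp(-1 + 0.8 * (patients$trt_id == 2)))
base_data <- expand.grid(patient_id = 1:n_pat, obs = 1:5) %>%
  left_join(patients, by = "patient_id") %>%
  mutate(x = rnorm(n()),
         y = 1 + 0.5 * x + re + rnorm(n(), sd = 0.3)) %>%
  select(y, x, patient_id, trt_id)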

data_joined <- base_data %>% 
  inner_join(trt_data %>% mutate(trt_id = 1:n()), by = "trt_id")

# Fix the sd to 1
prior_joined <- prior(constant(1), class='sd', nlpar='patientintercept')


fit_joined <- brm(
  bf(y ~ muy + patientintercept * exp(logmysigma),  # specify explicit terms in a non-linear formula
     muy ~ x,                                       # predict y with x
     patientintercept ~ 0 + (1|patient_id),         # specify the random intercept
     logmysigma ~ trt_x,                            # model random effects variance
     nl = TRUE),
  prior = prior_joined,
  data = data_joined
)

summary(fit_joined)

EDIT: I also think your approach does not depend on brms using the non-centered parametrization. Fixing the sd to 1 would also work with a centered parametrization.


Yes, it’s not dependent on it; the non-centered parametrization is just what gave me this idea in the first place, a few months ago, when I was trying to figure out how to do this.


Anyway, huge thanks for pointing this out!


No worries! I was planning at some point to also write a post about this, but never found the time 😅

I also agree with what you wrote in the post about brms being extremely hackable, which is super useful. We are currently building a package for cognitive modeling that uses brms as an interface, and such hacks have turned out to be essential. For example, brms requires a mixture model to have the same number of mixture components in every condition, but with some non-linear formulas you can turn mixture components on and off based on variables in the data (sketched below), etc.
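
To sketch the idea (with hypothetical data and column names — dat_mix, cond, one_comp — not our package's actual code): in a two-component Gaussian mixture, the mixing proportion theta1 can be given its own formula, and fixing a large coefficient on a 0/1 indicator column pushes the first component's weight to ~1 wherever the second component should be switched off:

# `one_comp` is a hypothetical 0/1 column flagging conditions that
# should use only the first mixture component
f_mix <- bf(y ~ x,
            theta1 ~ cond + one_comp,
            family = mixture(gaussian, gaussian))
# theta1 is modelled on a logit-like scale, so a large fixed
# coefficient drives the first component's weight to ~1 (i.e., the
# second component off) whenever one_comp == 1
prior_mix <- prior(constant(15), class = "b", coef = "one_comp", dpar = "theta1")
fit_mix <- brm(f_mix, data = dat_mix, prior = prior_mix, chains = 4)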

So it’s always nice to see other ways to do things like that, for inspiration 🙂 Nowadays I rarely write models in Stan myself; I just use custom families and some hacks in brms to take advantage of the formula syntax.
