High-dimensional {brms} models - formula stack overflow

Hi Stan forum,

I’m trying to build high-dimensional brms models and am running into a stack overflow error once the number of predictors exceeds roughly 4,100 (simple reprex below). The error seems to arise while the model formula's terms are being processed (traceback below).

It’s worth noting that neither raising options(expressions = ...) nor starting R with a larger pointer protection stack (--max-ppsize) solves the issue. An obvious solution is to code the model in Stan directly; however, this makes compatibility with downstream packages more troublesome (e.g. projpred, whose default get_refmodel methods only work for stanreg or brmsfit objects). rstanarm::stan_glm doesn’t hit the same error on this toy example, but rstanarm supports fewer families. Are there any potential workarounds within brms, @paul.buerkner?
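For concreteness, the two settings referred to above can be set as in the sketch below. The values are illustrative assumptions (500,000 is the documented upper limit for both), not necessarily the ones used when testing this.

```r
# Illustrative value only: 5e5 is the largest value R accepts here.
# Raises the limit on nested expression evaluation for this session.
old <- options(expressions = 5e5)
getOption("expressions")

# The pointer protection stack cannot be changed from inside a running
# session; it has to be set when R is started from the shell, e.g.:
#   R --max-ppsize=500000
```

Neither setting changes how the formula itself is parsed, which may be why they make no difference to this particular error.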

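library(brms)

n <- 100   # observations
p <- 4200  # predictors; the error appears above ~4,100

# Simulate independent Gaussian predictors and a random binary outcome
df_sim <- as.data.frame(matrix(rnorm(n * p), ncol = p))
colnames(df_sim) <- paste0("x.", 1:p)
df_sim$y <- rbinom(n, 1, 0.5)

# Build the formula "y ~ x.1+x.2+...+x.p" as a string
form <- paste0("y ~ ", paste0("x.", 1:p, collapse = "+"))

model <- brm(
  formula = form
  ,data = df_sim
  ,family = bernoulli()
)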

Error: protect(): protection stack overflow
5: terms.formula(formula, ...)
4: stats::terms(formula, ...)
3: terms(all_vars_formula)
2: validate_data(data, bterms = bterms, data2 = data2, knots = knots,
       drop_unused_levels = drop_unused_levels, data_name = substitute_name(data))
1: brm(formula = form, data = df_sim, family = bernoulli())

# For reference, this runs OK
model <- rstanarm::stan_glm(
  formula = form
  ,data = df_sim
  ,family = binomial(link = "logit")
)
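
One idea that might sidestep the long formula is to pass all predictors as a single matrix column, so the formula has one term regardless of p. The sketch below only verifies this in base R with model.matrix; that brms treats a matrix column the same way is an untested assumption, and the names X, df_mat, form_mat and mm are made up for illustration.

```r
set.seed(1)
n <- 100
p <- 4200

# Same kind of simulated predictors as in the reprex, kept as one matrix
X <- matrix(rnorm(n * p), ncol = p)
colnames(X) <- paste0("x.", 1:p)

# A data frame column can itself be a matrix
df_mat <- data.frame(y = rbinom(n, 1, 0.5))
df_mat$X <- X

# The formula now contains a single term, whatever the value of p
form_mat <- y ~ X

# Base R expands the matrix term into p columns plus an intercept
mm <- model.matrix(form_mat, data = df_mat)
dim(mm)

# Untested assumption: brms accepts the matrix column in the same way
# model <- brms::brm(form_mat, data = df_mat, family = brms::bernoulli())
```

If this does work in brms, the coefficient names would presumably differ from those of the per-column formula, which might matter for downstream tools such as projpred.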

  • Operating System: Ubuntu 24.04.2 LTS (running in WSL2)
  • brms Version: 2.22.0

Thanks in advance!