Hi Stan forum,
I’m trying to build high-dimensional brms models and am hitting a stack overflow error when the number of predictors exceeds roughly 4,100 (simple reprex below). Judging by the traceback (also below), the error is raised while the terms of the formula are being processed.
Neither raising options(expressions) nor starting R with a larger protection stack solves the issue. An obvious solution is to code the model in Stan directly; however, that makes compatibility with downstream packages more troublesome (e.g. projpred, whose default get_refmodel methods work only for stanreg or brmsfit objects). rstanarm::stan_glm doesn’t have the same issue on this toy example, but rstanarm doesn’t support as many families. Are there any potential workarounds within brms, @paul.buerkner?
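For anyone wanting to reproduce the two attempts mentioned above, this is a rough sketch of the settings involved. The values are illustrative assumptions (they are the documented maxima in base R), not necessarily the exact values used here.

```r
# Illustrative values only (assumption): the upper limits documented in base R.

# Raise the limit on nested expression evaluation (default 5000, max 500000).
options(expressions = 500000)

# The pointer protection stack cannot be changed from inside a session;
# it has to be set when R is started, e.g. from a shell:
#   R --max-ppsize=500000
```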
library(brms)

n <- 100
p <- 4200
df_sim <- as.data.frame(matrix(rnorm(n * p), ncol = p))
colnames(df_sim) <- paste0("x.", 1:p)
df_sim$y <- rbinom(n, 1, 0.5)
form <- paste0("y ~ ", paste0("x.", 1:p, collapse = "+"))
model <- brm(
  formula = form
  ,data = df_sim
  ,family = bernoulli()
)
Error: protect(): protection stack overflow
5: terms.formula(formula, ...)
4: stats::terms(formula, ...)
3: terms(all_vars_formula)
2: validate_data(data, bterms = bterms, data2 = data2, knots = knots,
       drop_unused_levels = drop_unused_levels, data_name = substitute_name(data))
1: brm(formula = form, data = df_sim, family = bernoulli())
# For reference, this runs OK
model <- rstanarm::stan_glm(
  formula = form
  ,data = df_sim
  ,family = binomial(link = "logit")
)
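A minimal sketch for narrowing this down, assuming it is useful to know whether base R's stats::terms() copes with the user-supplied formula on its own. Note that the traceback shows terms() being called on all_vars_formula, which brms builds internally and which may differ from form, so a clean result here would not rule out the overflow inside brms.

```r
# Check whether base R alone can process the formula, without brms.
p <- 4200
form <- paste0("y ~ ", paste0("x.", 1:p, collapse = "+"))

# tryCatch() returns "ok" on success, or the error message otherwise.
res <- tryCatch(
  {
    stats::terms(stats::as.formula(form))
    "ok"
  },
  error = function(e) conditionMessage(e)
)
print(res)
```

If this reports "ok", the overflow is more likely tied to the internally constructed formula than to the one passed to brm().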
- Operating System: Ubuntu 24.04.2 LTS (running in WSL2)
- brms Version: 2.22.0
Thanks in advance!