I am trying to fit a brms model with 15M data points in the training set and another 6M in the test set. When I run the model, it throws the following error:
“Error in collapse_object(objnames, tmp, indent) : R character strings are limited to 2^31-1 bytes”.
I have also tried running the brms model on 9M training data points, and that works fine.
Some stats regarding the model:
- Features: 26. The priors for 17 features are beta distributions; the rest are normal distributions.
- Total data: 21M data points

    library(arrow)  # provides read_parquet()
    library(rstan)
    library(brms)

    data <- read_parquet('/path/to/file')
    # some transformations

    # 70/30 train/test split
    train_size <- floor(0.7 * 21 * 10^6)
    train <- data[1:train_size, ]
    test <- data[(train_size + 1):nrow(data), ]  # start after the last training row

    # Beta priors are bounded to [0, 1], the support of the beta distribution
    my_prior <- c(
      prior(normal(0, 1), class = 'b', nlpar = 'intercept') +
        prior(beta(16.9, 152.21), class = 'b', nlpar = 'x1', lb = 0, ub = 1) +
        prior(beta(16.9, 152.21), class = 'b', nlpar = 'x2', lb = 0, ub = 1) +
        prior(beta(16.9, 152.21), class = 'b', nlpar = 'x3', lb = 0, ub = 1)
        # + .......... similarly for 23 more features
    )

    model <- brm_multiple(
      # 'intercept' must match the name used in lf(intercept ~ 1)
      bf(y ~ intercept + x1 + x2 + x3,  # + .... remaining features
         nl = TRUE) +
        lf(intercept ~ 1) +
        lf(x1 ~ 0 + x_1) +
        lf(x2 ~ 0 + x_2) +
        lf(x3 ~ 0 + x_3),  # + ...... for 23 more features
      data = df_split,
      family = bernoulli("logit"),
      backend = "cmdstanr",
      threads = threading(15, grainsize = 625),
      prior = my_prior,
      warmup = 1000,
      chains = 4,
      cores = 12,
      seed = 12345,
      iter = 2000,
      silent = FALSE,
      thin = 1
    )

    plot(model)

Can someone please help me with this issue?