Hello,
I am trying to fit a brms model with 15M data points in the training set and another 6M in the test set. When I run the model, it throws the following error:
“Error in collapse_object(objnames, tmp, indent) : R character strings are limited to 2^31-1 bytes”.
I have also tried running the brms model on 9M training data points, and that works fine.
Some stats regarding the model:
- Features: 26. Priors for 17 features are beta distributions; the rest are normal distributions.
- Total data: 21M rows
Code Snippet:
library(rstan)
library(brms)
library(arrow)  # assumed source of read_parquet(); the snippet did not load a parquet reader
data <- read_parquet('/path/to/file')
# some transformations
train_size <- floor(0.7 * 21 * 10^6)
train <- data[1:train_size, ]
# start one row after the training set so train and test do not overlap
test <- data[(train_size + 1):nrow(data), ]
my_prior <- c(
  prior(normal(0, 1), class = 'b', nlpar = 'intercept') +
  # bounds lb = 0, ub = 1 match the support of the beta prior
  prior(beta(16.9, 152.21), class = 'b', nlpar = 'x1', lb = 0, ub = 1) +
  prior(beta(16.9, 152.21), class = 'b', nlpar = 'x2', lb = 0, ub = 1) +
  prior(beta(16.9, 152.21), class = 'b', nlpar = 'x3', lb = 0, ub = 1) +
  # .......... similarly for 23 more features
)
model <- brm_multiple(
  bf(y ~ intercept + x1 + x2 + x3 + ...., nl = TRUE) + lf(intercept ~ 1) +
    lf(x1 ~ 0 + x_1) + lf(x2 ~ 0 + x_2) + lf(x3 ~ 0 + x_3) + ...... for 23 more features,
  data = df_split, family = bernoulli("logit"), backend = "cmdstanr",
  threads = threading(15, grainsize = 625), prior = my_prior,
  warmup = 1000, chains = 4, cores = 12, seed = 12345,
  iter = 2000, silent = FALSE, thin = 1)
plot(model)
Can someone please help me with this issue?
Kind Regards,