My computer has 40 GB of RAM and 10 CPU cores. Previously, I ran the model with flat default priors and iter=10000, but after 18 hours it was still running. Any help speeding this up would be really welcome!
In my experience, setting better priors often results in much greater sampling efficiency and less time to sample.
After the priors, you could consider specifying backend = 'cmdstanr' (and make sure you have cmdstanr installed) – that alone has helped my models speed up in some cases. Plus, that gives you the option to pursue within-chain parallelisation; you’ve got some extra cores, hopefully enough memory, and the Bernoulli likelihood might be expensive enough to warrant it.
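As a rough sketch of what that could look like on a 10-core machine: the formula and the variable names (y, age, time, gender, id) below are placeholders rather than your actual model, and the split of 4 chains with 2 threads each is just one way to stay within your cores. The wrapper function fit_threaded() is hypothetical and only there so the settings can be read without fitting anything.

```r
library(brms)

n_chains <- 4
threads_per_chain <- 2  # 4 chains x 2 threads = 8 of the 10 available cores

# Hypothetical wrapper: the formula and variable names are placeholders,
# not the model from this thread.
fit_threaded <- function(data) {
  brm(
    y ~ 1 + age + time + gender + (1 | id),
    data = data,
    family = bernoulli(),
    chains = n_chains, cores = n_chains,
    iter = 4000,
    backend = "cmdstanr",                    # requires cmdstanr and CmdStan
    threads = threading(threads_per_chain)   # within-chain parallelisation
  )
}
```

You would then call fit_threaded(your_data). Whether threading actually pays off depends on the model and data size, so it is worth timing a short run with and without it.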
Your cores=4, iter=4000 approach looks much better than iter=10000, so that’s a great start. @zacho made fine points, and I’d like to expand a bit on priors. It looks like you’re setting a generic prior on your \beta coefficients but going with the defaults for the other parameters.
Consider theory-based priors on your \beta coefficients.
Think about a better prior for your \sigma parameter (variation in random intercepts). My go-to for a multilevel Bernoulli model is prior(normal(0, 1), class = sd), to which brm() will assign a lower bound of zero by default.
Keep in mind that brm() sets priors for the intercept under the presumption you have mean centered all predictors. If this is not the case, consider either
a. mean centering all your predictors,
b. using the 0 + Intercept syntax (see the brmsformula and set_prior sections in the user guide), or
c. setting center = FALSE in bf() (see the brmsformula section in the user guide).
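To make the prior suggestions above concrete, here is a minimal sketch. The formula uses placeholder variable names (y, age, time, gender, id), and it follows option b, the 0 + Intercept syntax. Note that with 0 + Intercept the intercept becomes an ordinary class = b coefficient, so a general class = b prior also applies to it unless you give it its own prior.

```r
library(brms)

# Placeholder formula using the 0 + Intercept syntax (option b above).
# Variable names are assumptions, not taken from the actual data.
model_formula <- bf(y ~ 0 + Intercept + age + time + gender + (1 | id))

priors <- c(
  prior(normal(0, 1), class = b),   # population-level coefficients
  prior(normal(0, 1), class = sd)   # SD of the random intercepts (lower bound 0)
)

print(priors)
```

These objects would then be passed to brm() via its formula and prior arguments, along with family = bernoulli().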
Also, consider not only mean-centering but actually standardizing any continuous variables (perhaps age and time in your data). This will make it easier to assign better priors to the \beta coefficients. IMO, prior(normal(0, 1), class = b) is a great weakly-regularizing default prior for \beta coefficients on standardized predictors in a Bernoulli model, like yours. It’s a pretty good default for any dummy-coded categorical variables, too (possibly like gender in your data).
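A small base-R sketch of standardizing, using simulated toy data with hypothetical column names (age, time). It standardizes across all rows of the long-format data, which is only one possible choice and an assumption on my part.

```r
# Toy long-format data; `id`, `age`, and `time` are hypothetical names.
set.seed(1)
d <- data.frame(
  id   = rep(1:50, each = 4),
  age  = rep(rnorm(50, mean = 45, sd = 12), each = 4),
  time = rep(0:3, times = 50)
)

# scale() centers on the mean and divides by the SD;
# as.numeric() drops the matrix attributes it returns.
d$age_z  <- as.numeric(scale(d$age))
d$time_z <- as.numeric(scale(d$time))

# Check: each standardized column should have mean ~0 and SD 1.
round(c(mean(d$age_z), sd(d$age_z), mean(d$time_z), sd(d$time_z)), 3)
```

With predictors on this scale, a coefficient is the change in log-odds per one SD of the predictor, which is what makes normal(0, 1) easy to reason about.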
Also, since you’re swimming in data, consider first fitting and fully debugging your model with a random subset of say 10% of your cases. The debugging process would not only include making sure all your syntax is correct, but also making sure your priors are working as intended. The 10% subset approach could save you a lot of time in this phase.
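One way to draw such a subset is sketched below with simulated data. Here I am assuming "cases" means the grouping units (a hypothetical id column), so each sampled person keeps all of their repeated measures. Sampling rows instead would be another option.

```r
# Toy long-format data: 1000 hypothetical participants, 4 time points each.
set.seed(123)
d <- data.frame(
  id   = rep(1:1000, each = 4),
  time = rep(0:3, times = 1000)
)

# Sample 10% of the participants, then keep all of their rows.
ids      <- unique(d$id)
keep_ids <- sample(ids, size = round(0.10 * length(ids)))
d_sub    <- d[d$id %in% keep_ids, ]

nrow(d_sub)  # 100 participants x 4 rows each
```

Once the syntax and priors behave as intended on d_sub, the same code can be rerun on the full data.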
Regarding the predictors’ standardization, I am wondering if you would suggest any particular approach as my dataset contains longitudinal / repeated measures data (it’s in long format originally). Would you standardize in wide format (by time point)?