I am trying to use the brms package to fit a cross-classified multilevel model with income as the dependent variable and industry and occupation as the level-2 grouping factors: 3 million observations, 60 industries, and 300 occupations. When sampling starts, Stan reports:
```
Gradient evaluation took 7.855 seconds
1000 transitions using 10 leapfrog steps per transition would take 78550 seconds.
Adjust your expectations accordingly!
```
I suppose you have already tried fitting the model to a (random) subset of your data in order to debug it. Did you see the same problem there?
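For instance, a minimal sketch (the data frame name `dat` and the subset size are assumptions, since your code was not posted):

```r
# Draw a random 10,000-row subset for quick debugging runs
set.seed(123)
dat_sub <- dat[sample(nrow(dat), 1e4), ]
```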
As a first step, you may try to specify that your effects are uncorrelated. From the “Overview” vignette:
> By default, group-level coefficients within a grouping factor are assumed to be correlated. Correlations can be set to zero by using the `(coefs || group)` syntax.
With 60 and 300 levels respectively, there are many group-level effects to estimate. Model checking may later indicate a lack of fit that calls for modeling the correlations, but this first “rough” model may point to problems with your data, which should be solved first.
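In brms syntax, that might look like the following sketch; the predictor `x` is a hypothetical level-1 covariate, since your original formula was not posted:

```r
library(brms)

# `||` suppresses correlations between the group-level intercepts
# and slopes within each grouping factor
fit <- brm(
  income ~ x + (1 + x || industry) + (1 + x || occupation),
  data = dat_sub,
  chains = 4, cores = 4
)
```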
You may also compare with glm or BUGS modeling (possibly on a random sample of your data), again explicitly not modeling correlations.
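A rough frequentist sanity check might look like this (again a sketch, with assumed variable names):

```r
# Fixed-effects-only check on the subsample; industry and
# occupation enter as plain dummy-coded factors here
fit_glm <- glm(income ~ factor(industry) + factor(occupation),
               data = dat_sub, family = gaussian())
```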
With 3 million observations, I am not overly surprised by the speed you are seeing. Since your model structure is actually quite simple and you do not seem to have specified any custom priors, using brms (or other Bayesian MLM implementations) for this model may not be worth it. But this really depends on what kind of post-processing you want to do with your model.
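If you do stay with brms, setting priors explicitly is straightforward; the particular choices below are only illustrative, not a recommendation from this thread:

```r
library(brms)

# Illustrative weakly informative priors (the specific distributions
# are assumptions, not from this thread); pass via brm(..., prior = priors)
priors <- c(
  set_prior("normal(0, 10)", class = "Intercept"),
  set_prior("exponential(1)", class = "sd"),
  set_prior("exponential(1)", class = "sigma")
)
```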
My thinking is: if it takes days to run a simple model like this, with no guarantee of getting a usable result, how is it going to perform with more complex models…
Indeed. Fully Bayesian estimation is not necessarily feasible for very large data sets. In your case, unless you want to do something specifically Bayesian with the model results, you can safely use lme4 and expect brms to return very similar results for this simple linear multilevel model.
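For reference, the lme4 equivalent of an intercepts-only cross-classified model might look like this (a sketch; variable names are assumed):

```r
library(lme4)

# Crossed random intercepts for industry and occupation;
# lmer fits this by (RE)ML and is much faster than MCMC here
fit_lmer <- lmer(income ~ 1 + (1 | industry) + (1 | occupation),
                 data = dat)
```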