Brms sampling/speed errors in multi-level model

elliottmorris · June 5, 2019, 2:23pm

Hi all,

I’m having some trouble getting the Mr part of a MrP model to fit efficiently with brms. I’m hoping someone may be able to help guide me toward a better fit.

Here is the R code for the model:

vote_model <- brm(mvbind(vote_trump,vote_clinton) ~ 1 +
                    # pop level
                    state_clinton + state_black_pct + state_hispanic_pct + state_white_protestant + 
                    # group effects
                    (1|sex) + (1|race) + (1|age) + (1|edu) + (1|state_name) + (1|region) +
                    # interactions b/t group effects
                    (1|race:edu) + (1|age:edu) + (1|race:region),
                  data = cces,
                  family = bernoulli(link = "logit"),
                  # stan stuff
                  iter=1000, chains=1, cores=1, warmup=500,
                  algorithm='sampling',
                  save_model = 'output/stan_model_code.stan', 
                  # priors (https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations)
                  prior = c(set_prior("normal(0, 1)", class = "Intercept"),
                            set_prior("student_t(3, 0, 4)", class = "sd",resp=c('votetrump','voteclinton'))),
                  # help sampling algorithm converge (http://mc-stan.org/misc/warnings.html)
                  control = list(adapt_delta = 0.95,
                                 max_treedepth = 10),
                  seed = 20190605,
                  refresh = 1)

I let it run for about 12 hours and then gave up, after which it had only completed 200 iterations. The dataset is just 40k survey respondents, so that doesn’t seem right.

Help?

Elliott

Session info:

R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin15.6.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

paul.buerkner · June 5, 2019, 7:06pm

Welcome to Stan discourse! When asking brms related questions, please use the Interfaces - brms tag. I have changed it for you this time.

From quickly looking at your model, I have a few observations:

(1) Is the model really intended to be multivariate as implied by mvbind(…). You may want to start with one single outcome variable to get a feeling of how well it works.

(2) You included so many group-level terms that you are probably overfitting the data. For instance, is age really a grouping variable in your data or rather a continuous variable?

(3) mulitlevel bernoulli models often require a very long time at the start before they continue to sample faster. Having to wait 12 hours for 200 iterations may not imply that you need 5 times the time for 1000 iterations but perhaps only twice as much.

elliottmorris · June 6, 2019, 8:10am

Hey Paul,

Thanks for adding the proper tag. Will remember.

For the sake of trying to get this to work, I’ve changed to a single outcome model and removed the (1|age) and (1|age:edu) group terms, though to be sure, both will be necessary for the final approach.

I’m still not too pleased. The gradient evaluation stipulated 343.33 seconds for 1k transitions, and after waiting all night it had just barely completed 400. Surely that’s too long? Should I be looking around under the hood?

paul.buerkner · June 6, 2019, 8:13am

Honestly, I am not surprised it takes that long. You may need to be more patient, or, if you want to get a quicker look at how well the model works, use subsampling and then only run analysis on the full sample once you are happy with the model.

elliottmorris · June 6, 2019, 8:14am

Well, if you’re not surprised, I guess I shouldn’t be either :) Thanks.

kuriwaki · February 16, 2020, 4:59pm

Collapsing the data into counts and modeling it as a binomial (instead of bernoulli at the individual level) could be faster? I’m also running MRP with a full year of CCES and it only takes ~2 hours, although I don’t have as nearly as many covariates as you do.

By counts, something like

cces_counts <- cces %>% 
    group_by(state_name, state_black_pct, state_hispanic_pct, state_white_protestant, sex, race, age, edu) %>% 
    summarize(trials = n(),
              clinton = sum(vote_clinton))

elliottmorris · February 16, 2020, 7:02pm

Good suggestion. I’m running this via the ‘categorical’ family and with even more covariates at just under 12 hours now. Turns out the biggest issue earlier was a data error that messed up a factor variable and prevented efficient convergence.

Topic		Replies	Views
Brms sampling in multi-level multinomial models Modeling specification , brms , meta-analysis	8	2097	March 16, 2021
Help building a faster model brms	7	1228	October 8, 2018
Rewrite brms stan code to speed up model with multiple measurement error predictors brms specification , cmdstanr , horseshoe-prior	0	394	May 24, 2023
Brms paper published Publicity	2	1249	September 23, 2017
Memory issues using stan file created by brms brms	15	1050	November 9, 2018

Brms sampling/speed errors in multi-level model

Related topics