Brms_multiple always compiles but takes hours to days to do it

mtenan · October 12, 2023, 2:12pm

I’m just wondering whether others have had issues where a relatively straightforward brm_multiple call takes hours or days to compile. I’ve got 100 imputed datasets to run, and compiling the C++ code is taking 4-24+ hours. It always eventually compiles and the models run fine, but the compilation process takes forever. I’m running this on Windows and have tried both rstan and cmdstanr as the backend.

The models themselves take less than 5 minutes to run after compilation.

brms isn’t compiling 100 separate models, is it? I don’t know what else could be causing this kind of runtime. Thoughts?

stevebronder · October 12, 2023, 2:49pm

Can you post a minimal reproducible example?

mtenan · October 12, 2023, 4:29pm

library(mitml)
library(brms)

# Example data shipped with mitml
data(studentratings)

# Multilevel imputation model (pan); m = 100 imputed datasets
fml <- ReadAchiev + ReadDis + SchClimate ~ 1 + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 5000, n.iter = 100, m = 100)
implist <- mitmlComplete(imp)

# One brms model per imputed dataset; the draws are combined afterwards
brm_m1 <- brm_multiple(bf(ReadAchiev ~ 1 + ReadDis + (1|ID)), data = implist, family = gaussian(),
                       chains = 4, iter = 20000, warmup = 10000, control = list(adapt_delta = 0.99))

Stephen_Wild · October 13, 2023, 7:37pm

This is likely because the priors are changing. The brms default priors can be data dependent. If that is the case, it will recompile the model for each imputed dataset. Try setting some manual priors that work for every data frame in the list.
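A minimal sketch of this suggestion, assuming the brms package and an `implist` like the one in the example above. The helper names `fixed_priors` and `stancode_differs` are hypothetical, and the prior values are placeholders that would need to be put on the scale of your outcome, not recommendations. The second helper checks whether brms generates different Stan code for two imputed datasets; two different Stan programs cannot share one compiled model.

```r
# Sketch: fixed (data-independent) priors for
# ReadAchiev ~ 1 + ReadDis + (1 | ID) with a gaussian family.
# The numbers are placeholders only; rescale them to your outcome.
fixed_priors <- function() {
  c(
    brms::prior(normal(0, 100), class = "Intercept"),
    brms::prior(normal(0, 10), class = "b"),
    brms::prior(student_t(3, 0, 50), class = "sd"),
    brms::prior(student_t(3, 0, 50), class = "sigma")
  )
}

# Returns TRUE if brms generates different Stan code for two imputed
# datasets (for example because default priors are computed from the data).
stancode_differs <- function(formula, data1, data2, prior = NULL) {
  code1 <- brms::make_stancode(formula, data = data1, family = gaussian(), prior = prior)
  code2 <- brms::make_stancode(formula, data = data2, family = gaussian(), prior = prior)
  !identical(as.character(code1), as.character(code2))
}
```

Usage might look like `stancode_differs(ReadAchiev ~ 1 + ReadDis + (1 | ID), implist[[1]], implist[[2]])`, run once with the default priors and once with `prior = fixed_priors()`, before passing the same `prior` to `brm_multiple()`.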

Note that I am away from my computer, so I can’t run or test your code. I may be wrong.

mtenan · October 13, 2023, 7:44pm

I can say that if I set the priors with my real data, it does not appear to affect the runtime. Even then, compiling 100 individual models shouldn’t take 24 hours… should it?

Stephen_Wild · October 13, 2023, 7:58pm

Compiling or running? If the model is recompiling each time (that is, Stan needs to recompile the C++ code), it can add a minute or two per dataset, depending on your computer. At 100 datasets, that’s a couple of hours.
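The estimate above, written out. The 1 to 2 minutes per compile is the post’s rough figure, not a measurement.

```r
# Rough cost of recompiling once per imputed dataset
n_datasets <- 100
minutes_per_compile <- c(low = 1, high = 2)   # rough figure from the post

total_hours <- n_datasets * minutes_per_compile / 60
round(total_hours, 1)
```

This prints roughly 1.7 and 3.3 hours, which is where "a couple of hours" comes from.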

Even if it isn’t, the runtime could simply come from running 100 separate models. To reduce it, you could try reducing the number of imputed datasets; 15-20 are often sufficient (it depends on your particular case). You could also try lowering adapt_delta (0.99, for instance, can add a lot to the runtime compared with the default of 0.8). You could try threading if you have enough cores. Finally, you could reduce the number of iterations, as 20000 is a lot; a few thousand is usually sufficient (depending on your goals).
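A sketch of what those suggestions could look like together, assuming brms with the cmdstanr backend installed (within-chain threading via `threads` requires cmdstanr). The function name `fit_cheaper` is hypothetical, and every setting is illustrative; convergence diagnostics still need checking after fitting.

```r
# Sketch of a cheaper brm_multiple configuration (illustrative settings only)
fit_cheaper <- function(implist, n_imputations = 20) {
  brms::brm_multiple(
    brms::bf(ReadAchiev ~ 1 + ReadDis + (1 | ID)),
    data = implist[seq_len(n_imputations)],   # fewer imputed datasets
    family = gaussian(),
    chains = 4, iter = 4000, warmup = 2000,   # far fewer than 20000 iterations
    control = list(adapt_delta = 0.95),       # lower than 0.99; default is 0.8
    backend = "cmdstanr",
    threads = brms::threading(2)              # within-chain threading
  )
}
```

Calling `fit_cheaper(implist)` would fit the first 20 imputed datasets; none of these settings changes how often the model is compiled.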

When I get a chance later, I’ll give your reproducible example a run and see how much of a difference some of these changes make.

mtenan · October 13, 2023, 8:08pm

Thanks, I appreciate it. Once the models actually run, they go quite quickly: it runs 100 models in less than 10 minutes once compilation is done. I figured adapt_delta wasn’t affecting the compilation time (right?), and since the actual sampling was so quick, I didn’t worry about the high number of iterations or the adapt_delta setting. They are set so high because in my original dataset 2-10% of models often appear to have convergence issues.

Let me know if you can reproduce the compilation issue on Windows. It’s occurring on both my Uni and personal computer… so it’s somewhat reproducible.

Stephen_Wild · October 14, 2023, 6:27pm

So I did see long compilation times too. I solved it by installing the most recent version of rstan. Give it a try, and if that doesn’t work, I am out of ideas.

mtenan · October 16, 2023, 3:49pm

@Stephen_Wild Any idea what your new runtime was for the given number of datasets? I’m just trying to make sure I’ve got reasonable expectations.

I updated rstan, and also R, Rtools, and RStudio while I was at it. It’s dramatically faster: it now compiles and runs 5 imputed datasets in 8.5 minutes, which extrapolates to roughly 2-3 hours for 100 datasets.
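The extrapolation, spelled out. It assumes total time grows linearly with the number of datasets, which is itself an assumption.

```r
# Linear extrapolation of the measured timing (5 datasets in 8.5 minutes)
minutes_for_5 <- 8.5
minutes_per_dataset <- minutes_for_5 / 5        # 1.7 minutes per dataset
hours_for_100 <- 100 * minutes_per_dataset / 60
round(hours_for_100, 2)
```

This gives about 2.83 hours, inside the 2-3 hour range.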

Is that something close to what you’ve been getting?

mtenan · October 18, 2023, 1:30pm

Actually, the problem does not seem to be solved, and it is somewhat peculiar. I’m running 10 different analyses back-to-back, and the first analysis seems to compile faster (resulting in a shorter net runtime), but all subsequent analyses still seem to take 6-24 hours to compile.

@paul.buerkner, is there something about how brm_multiple() transmits things to the compiler that could get ‘logjammed’? I’m at a bit of a loss. My Uni is sending me a Mac to see if it has the same issue as my two Windows computers.

paul.buerkner · October 18, 2023, 1:39pm

I don’t know, to be honest. I am not aware of it. One aspect could be the use of the “future” package, but that only applies to sampling, not to compiling.

Stephen_Wild · October 20, 2023, 11:01pm

I have some issues when I try it with backend = "cmdstan". I am not sure why, but I will try to do some digging.

It is very odd.

mtenan · October 24, 2023, 1:08pm

Yeah, I thought the updates were having an effect on overall runtime, but it appears I was incorrect.

Also, I’ve now run that sample code on an M1 Mac, and it has the same issue of taking a very long time to compile with rstan (I haven’t tried cmdstanr there yet), so at least it does not appear to be a “Windows problem”. The Mac is also newly imaged, so the R/RStudio/Stan installs should be the most current production versions.

It’s kind of unfortunate because, given the data I work with, multiple imputation followed by a Bayesian model is probably a common occurrence. That becomes hard when it takes days to run a single model each time.

mtenan · November 3, 2023, 7:05pm

This isn’t a ‘solution’ per se, but on taking another look at how things work when multiple cores are used, it looks like brm_multiple() is compiling each chain separately and then sampling it? If that’s correct, then it’s just a math game. If you have 50 imputed datasets and are running 2 chains per dataset, you’re compiling and sampling 100 times. If the combined compilation and sampling time for each of those is 10-20 minutes, that quickly adds up to days of computation time.
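The arithmetic behind this estimate, under the post’s own assumption (not confirmed anywhere in the thread) that every chain of every imputed dataset is compiled and sampled separately and that the runs happen one after another.

```r
# Assumption from the post: each chain of each dataset is its own
# compile + sample run, executed serially
n_datasets <- 50
chains_per_dataset <- 2
runs <- n_datasets * chains_per_dataset          # 100 compile + sample runs
minutes_per_run <- c(low = 10, high = 20)
total_hours <- runs * minutes_per_run / 60
round(total_hours, 1)
```

That is roughly 16.7 to 33.3 hours, or about 0.7 to 1.4 days of serial computation.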

kauttoj · April 28, 2024, 12:09pm

Is there any progress or an additional solution for this issue? I’m stuck with the same problem.
A simple multivariate Gaussian, constant-only model that compiles in ~30 s using a normal “brm” call gets stuck at “Compiling Stan program…” when using “brm_multiple”. It’s not even scaling linearly (so that 10 datasets would take ~300 s); it is much worse, and I stopped compiling after ~3 h with no end in sight. In my case the model is exactly the same for all (non-mice) datasets in the list, so in principle only one compilation is needed. I’m running a Windows 11 machine.

mtenan · April 29, 2024, 1:23pm

I’m not aware of any updates or fixes. My solution was to use a Windows rig (an ecosystem I’m used to) for model development and a Mac for running the models, and also to become more judicious about the number of imputed datasets I require.

I feel like there has to be some sort of solution, but the problem is occurring far below the R/brms layer, deeper than I’ve had time to explore.

aeiche01 · January 23, 2025, 9:19pm

I think I came up with a solution that works, based on the fact that the update() function was changed to support brm_multiple (brms GitHub issue #615, “Add brm_multiple support for update”). If I fit a brm_multiple model to only 2 of my imputed datasets, it creates and compiles the model for me to refer back to. Then, if I call update() on that compiled model with all of my imputed data, it skips compiling and just runs the sampling. I was able to run 100 imputed datasets in a minute or so, whereas before I was waiting a long time for it to compile.
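A minimal sketch of this workaround, assuming a brms version in which update() supports brmsfit_multiple objects (the change tracked in issue #615). The helper name `compile_then_update` is hypothetical; the two steps are the ones described in the post.

```r
# Sketch of the update() workaround (hypothetical helper name).
compile_then_update <- function(formula, implist, ...) {
  # Step 1: fit only the first two imputed datasets; this is where the
  # Stan model gets compiled
  fit_small <- brms::brm_multiple(formula, data = implist[1:2], ...)
  # Step 2: reuse that fit for the full list of imputed datasets; per the
  # post, this runs the sampling without compiling again
  update(fit_small, newdata = implist)
}
```

A call might look like `compile_then_update(bf(ReadAchiev ~ 1 + ReadDis + (1 | ID)), implist, family = gaussian(), chains = 4)`, with sampling arguments passed through `...` to the first `brm_multiple()` call.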
