Please provide this additional information in addition to your question:
- Operating System: Ubuntu 18.04.2 LTS
- CmdStan Version: 2.14
- Compiler/Toolkit: gcc
I have a Linux box with 32 cores (but only 16GB of RAM) that I use for running models. I often run multiple models in parallel using cmdstan (usually up to 7 models with 4 chains each). This used to work fine until I updated Ubuntu, which I guess updated gcc along with it. Now gcc uses huge amounts of RAM during compilation, exhausting my RAM and spilling into swap, which grinds everything to a halt.
Make version: GNU Make 4.1
GCC version: 7.4.0
I read that I could force gcc to use less ram by using the following flags:
CFLAGS="$CFLAGS --param ggc-min-expand=0 --param ggc-min-heapsize=524288"
so that, as I understand it, gcc uses no more than about 500MB per compile for each model (I don’t really mind if it slows down compilation, as long as it doesn’t exhaust the RAM. I’m still getting enough speedup by running multiple models in parallel vs. sequentially).
However, I’m not sure where I should put these arguments, or whether they would help at all.
Would these flags need to be in the makefile present in the cmdstan directory?
I hope this is the right place to ask for this kind of help,
Thank you so much in advance,
Create the file “make/local” in your cmdstan folder and add whatever you want in there.
For instance, I have a make/local in a custom cmdstan with a bunch of stuff in it, like:
CXXFLAGS=-fopenmp -I src/modal_cpp/spectra/include/Spectra
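A minimal sketch of that setup, assuming CmdStan lives in `$HOME/cmdstan`. The `CMDSTAN_DIR` variable and the `append_make_local` helper are names made up for illustration, not part of CmdStan. The helper appends one `VARIABLE += FLAGS` line to make/local, creating the file if it does not exist yet:

```shell
# Assumed install location; override by exporting CMDSTAN_DIR first.
CMDSTAN_DIR="${CMDSTAN_DIR:-$HOME/cmdstan}"

# append_make_local DIR VARIABLE FLAGS...
# Hypothetical helper: appends "VARIABLE += FLAGS" to DIR/make/local.
# Using += keeps anything an earlier line already put in the variable.
append_make_local() {
    dir="$1"
    variable="$2"
    shift 2
    if [ ! -d "$dir/make" ]; then
        echo "no make/ folder found in $dir" >&2
        return 1
    fi
    printf '%s += %s\n' "$variable" "$*" >> "$dir/make/local"
}

# Example call (uncomment to run against your own cmdstan folder):
# append_make_local "$CMDSTAN_DIR" CXXFLAGS -fopenmp
```

Which variable the build actually reads can depend on the CmdStan version, so it is worth checking the compiler command that make prints to confirm the flags show up there.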
Thanks so much @bbbales2 for your help with this.
Unfortunately I wasn’t able to get it working.
I created local in the make folder of cmdstan, and tried:
CFLAGS= --param ggc-min-expand=0 --param ggc-min-heapsize=524288
but it wouldn’t compile. However, the following setting allowed compilation to go through:
CXXFLAGS= --param ggc-min-expand=0 --param ggc-min-heapsize=524288
Unfortunately, no matter what value I passed to ggc-min-heapsize, the compilation still used huge amounts of RAM. So setting these flags didn’t seem to help.
As it currently stands compiling a single model takes approx 1400M of Virt. Mem and 1300M Res. Mem, when I run ‘htop’. I’m not sure whether this is supposed to be ‘a lot’ for a rather basic logistic model?
I think I’ll run the compilation sequentially for all models, and then run the sampling of all models in parallel. Given that sampling is the process that takes a lot more time, I’ll still be able to see substantial speedup.
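The workaround described above (compile one model at a time, then sample everything in parallel) could be sketched roughly as follows. This is an illustration under assumptions, not a tested production script: `CMDSTAN_DIR`, the `run`, `compile_sequentially` and `sample_in_parallel` helpers, the `DRY_RUN` and `CHAINS` variables, and the `<model>.data.R` data file naming are all made up for the example. Model paths should be absolute, because `make -C` changes into the cmdstan folder first.

```shell
# Assumed install location; override by exporting CMDSTAN_DIR first.
CMDSTAN_DIR="${CMDSTAN_DIR:-$HOME/cmdstan}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Phase 1: build models one at a time so only one compiler runs at once.
# Arguments are model paths without the .stan extension.
compile_sequentially() {
    for model in "$@"; do
        run make -C "$CMDSTAN_DIR" "$model" || return 1
    done
}

# Phase 2: start CHAINS chains per model in the background, then wait.
# The "<model>.data.R" data file name is an assumption for illustration.
sample_in_parallel() {
    chains="${CHAINS:-4}"
    for model in "$@"; do
        chain=1
        while [ "$chain" -le "$chains" ]; do
            run "$model" sample id="$chain" \
                data file="$model.data.R" \
                output file="${model}_${chain}.csv" &
            chain=$((chain + 1))
        done
    done
    wait
}

# Example (paths are placeholders):
# compile_sequentially /path/to/model_a /path/to/model_b &&
#     sample_in_parallel /path/to/model_a /path/to/model_b
```

Setting `DRY_RUN=1` prints the commands instead of running them, which is a cheap way to check the paths before starting a long job.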
It’s not that surprising, unfortunately. Stan uses lots of templated C++ and the file that eventually gets compiled is massive.
You can swap out your compiler by specifying CXX=whatever, I think, so if you can figure out which gcc version you had before, you can try installing it and going back to it.
Thanks for your help @bbbales2, I’ll try to re-install an older version of gcc and see if this helps (If I can’t get it to work that way I’ll just pre-compile serially and run models in parallel after compilation).
Did you figure out how to limit memory usage during compile time?
I’m deploying Stan models to AWS EC2 instances. As the compile seems to require 2-3 GB of RAM, the minimum size of the AWS instance has to be set accordingly (i.e. 4 GB). If the compile could be limited to 1 GB of RAM, then the cost would be one-fourth of the current cost (for models that don’t require more than 1 GB at runtime).
I didn’t, no (I couldn’t find a way to limit the ram usage using various flags, and rolling back to an earlier version of GCC sounded like more trouble than it was worth).
What I did instead is compile all models sequentially, prior to running the models. This way I can just call the models and since they’re already compiled I can run multiple models at the same time until the ram is near-full.
Hope this helps,
ps: if you do find a way to limit GCC memory usage, please post back your solution here (I’m really interested)
If you’re just interested in limiting the cost of your AWS EC2 instances, I can imagine that you should be able to run the compilation on a 1GB machine. It might swap more and take longer to compile, but it’ll compile nonetheless. I think GCC just uses as much ram as necessary if it is available (if you have loads of free ram). But if you have little ram, I imagine it’ll just use more read/write to disk instead and take a little longer. People have been able to run complex compiles using as little as a few hundred MB of ram.
I got it working by using “C14FLAGS” instead of “CXXFLAGS”. I’m also on Ubuntu 18.04.2 LTS, so maybe it works for you as well.
Don’t you mean “CXX14FLAGS”, not “C14FLAGS”?
Yes, I meant “CXX14FLAGS”.
I was a bit too quick in claiming that it works. It works in the sense that the memory use gets constrained. But the compile time gets so long that it is not really a viable option.
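To compare peak memory with and without such flags, one option is to run the build under GNU time and read its verbose report rather than watching htop. A small sketch, assuming GNU time is installed at /usr/bin/time; the `peak_rss_mb` helper name and the model path are made up for illustration:

```shell
# Hypothetical helper: reads the report printed by GNU "time -v" on stdin
# and prints the peak resident memory in whole megabytes.
peak_rss_mb() {
    awk -F': ' '/Maximum resident set size/ { printf "%d\n", $2 / 1024 }'
}

# Example (uncomment and adjust the paths; GNU time writes its report to stderr):
# /usr/bin/time -v make -C "$HOME/cmdstan" /path/to/model 2>&1 | peak_rss_mb
```

The same report also contains an "Elapsed (wall clock) time" line, so one run gives both the memory and the compile-time side of the trade-off.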