Compiling Stan against Intel MKL

aaronjg · July 3, 2017, 5:27am

I was trying to speed up some models, so I compiled everything with the Intel compiler against the Intel MKL, setting the appropriate flags according to:
https://eigen.tuxfamily.org/dox/TopicUsingIntelMKL.html

I saw that this was discussed here a while ago:
https://groups.google.com/forum/#!topic/stan-dev/F9x6oTikOLg
https://groups.google.com/forum/#!msg/stan-users/GXsWmQd-4jc/iQ09VCPrGS0J

After all this, I didn’t seem to get any speedup. In fact, the Intel compiler build seems to be a bit slower.

Timings for the original test case (a logistic model):
g++ 5.4.0 : 11.7 seconds
icpc 20170213 : 12.2 seconds

I see the same trend with my more complex model. Plus, the Intel compiler takes nearly twice as long to build the Stan program (96 seconds vs. 51 seconds).

wds15 · July 3, 2017, 6:51am

When you use the Intel compiler, make sure to switch on the fp-precise options as described in the CmdStan manual. Otherwise you won’t ever get any divergences reported, because the floating point optimizations will have optimized them away.
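To make the concern concrete, here is a minimal sketch of the kind of check that can be affected. It is not Stan's actual code: the function name `is_divergent`, the default threshold of 1000, and the variable names are assumptions made for illustration. The idea is that a sampler flags a divergence when the energy becomes NaN or drifts too far from its starting value, and a compiler that is allowed to assume NaN never occurs may fold the `std::isnan` test to `false`.

```cpp
#include <cmath>
#include <limits>

// Hypothetical divergence check, loosely modeled on how an HMC sampler
// might compare the current Hamiltonian h against the initial value h0.
// The name and the default threshold are illustrative assumptions.
bool is_divergent(double h, double h0, double max_delta = 1000.0) {
  // Treat a NaN energy as infinitely bad. Under value-unsafe
  // floating-point modes the compiler may assume NaN never occurs
  // and drop this branch, so the NaN case is never flagged.
  if (std::isnan(h)) {
    h = std::numeric_limits<double>::infinity();
  }
  return (h - h0) > max_delta;
}
```

With value-safe floating point semantics, `is_divergent` returns true for a NaN energy and false for a small energy error such as `is_divergent(10.5, 10.0)`.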

Other than that, I compared the Intel compiler against gcc 4.4, and I remember seeing 10% speedups in favor of Intel.

Maybe newer gcc compilers have gotten better? It would be interesting to see results where you use the gcc compiler but link against the Intel MKL.

Sebastian

aaronjg · July 3, 2017, 7:39am

Thanks. I didn’t realize that there were instructions for doing that in the CmdStan manual, since I was building this all for RStan.

The previous discussion on the stan-users list suggested that the big wins came from switching from g++ to icc, with little gain from using the MKL.

wds15 · July 3, 2017, 11:14am

This stuff will depend heavily on your specific problem. Also, things may have changed: as I recall, the Cholesky decomposition code has been rewritten to use less of Eigen’s code. Eigen forwarded those operations to the MKL, so that forwarding no longer happens.

Yeah, the Intel compiler (at least the one we have, which is 20150407) does swallow divergent transitions if you just go with the defaults, which is really nasty.

aaronjg · July 3, 2017, 11:34am

Yeah, I’m mostly doing a lot of vector-vector operations. MKL is promoted as having a huge speedup over the OSS BLAS packages for this, but I guess that’s overhyped or outdated, or Eigen is already really good as is.
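One way to check whether a compiler or library switch matters for this kind of workload is to time a vector-vector operation directly. The sketch below is an assumption-laden stand-in, not code from the thread: `dot` and `time_dot` are hypothetical helper names, and it uses only the standard library rather than Eigen or MKL. Building the same file with each compiler and comparing the returned times gives a rough first impression.

```cpp
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Plain dot product over two equally sized vectors, standing in for the
// kind of vector-vector operation a level-1 BLAS routine would handle.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
  return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}

// Time `reps` dot products of length n (n must be at least 1) and
// return the elapsed wall-clock seconds. The first element changes on
// every repetition so the products are not all identical, and the
// accumulated sum is written to `sink` so the result is actually used.
double time_dot(std::size_t n, int reps, double& sink) {
  std::vector<double> a(n, 1.0), b(n, 2.0);
  const auto start = std::chrono::steady_clock::now();
  double total = 0.0;
  for (int r = 0; r < reps; ++r) {
    a[0] = static_cast<double>(r);
    total += dot(a, b);
  }
  const auto stop = std::chrono::steady_clock::now();
  sink = total;
  return std::chrono::duration<double>(stop - start).count();
}
```

A micro-benchmark like this only measures one operation in isolation, so a full model run remains the more reliable comparison.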

Bob_Carpenter · July 4, 2017, 1:20am

I think you also need to turn on BLAS support in Eigen somehow.

I think we’re still using Eigen’s Cholesky, just not on var types. We use it on double types and then put together custom derivatives.

aaronjg · July 4, 2017, 2:32am

I added -DEIGEN_USE_MKL_ALL to the CFLAGS in my R Makevars file. Is there anywhere else it should be set?

Andre_Pfeuffer · July 20, 2017, 9:14am

I got MKL to work with gcc, but not with Stan. The standalone test built with:

g++ -m64 tri.c -o tri -Wl,--start-group ../intel/mkl/lib/intel64/libmkl_intel_ilp64.a ../intel/mkl/lib/intel64/libmkl_sequential.a ../intel/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -Wl,--no-as-needed -lpthread -lm -ldl

So I put everything into the Makevars file, including -I${MKLROOT}/include and -DEIGEN_USE_MKL_ALL.

I also hard-coded the file references to their direct locations and added the std-11 flag, but I got stuck compiling the Eigen header files, with more than 1000 lines of errors.

aaronjg · July 20, 2017, 4:27pm

I’ve only used the Intel compiler with MKL, not g++. Are the errors coming up during compiling or linking?

You could also try adding -DMKL_LP64 to your CFLAGS and changing libmkl_intel_ilp64.a to libmkl_intel_lp64.a.

From the Eigen guide:

“on a 64bits system, you must use the LP64 interface (not the ILP64 one)”

Let me know if it speeds things up!
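The LP64 and ILP64 interfaces differ in the width of the integers used for dimensions and indices: 32-bit for LP64 and 64-bit for ILP64. A rough sketch of why mixing them goes wrong is below. The aliases `lp64_int` and `ilp64_int` and the function `read_as_ilp64` are hypothetical names for this illustration, not identifiers from MKL or Eigen. It simulates a callee that expects a 64-bit integer reading memory where the caller stored a 32-bit one.

```cpp
#include <cstdint>
#include <cstring>

// Illustrative stand-ins for the integer type a BLAS/LAPACK interface
// uses for dimensions and indices (assumed names for this sketch).
using lp64_int = std::int32_t;   // LP64 interface: 32-bit integers
using ilp64_int = std::int64_t;  // ILP64 interface: 64-bit integers

// Simulate a callee built for ILP64 that reads 8 bytes from an address
// where the caller stored a 32-bit dimension followed by unrelated data.
ilp64_int read_as_ilp64(const lp64_int (&words)[2]) {
  ilp64_int value = 0;
  std::memcpy(&value, words, sizeof(value));
  return value;
}
```

For example, if the caller stores a dimension of 100 next to an unrelated value of 7, `read_as_ilp64` returns a number that is not 100, because the neighboring 4 bytes are interpreted as part of the dimension.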

Andre_Pfeuffer · October 16, 2017, 6:31am

##
# Library locations
##
MKLROOT = /home/andre/intel/mkl
STAN ?= stan/
MATH ?= $(STAN)lib/stan_math/
-include $(MATH)make/libraries

##
# Set default compiler options.
## 
CFLAGS = -I src -I $(STAN)src -isystem $(MATH) -isystem $(EIGEN) -isystem $(BOOST) -isystem $(CVODES)/include -Wall -DEIGEN_NO_DEBUG  -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DFUSION_MAX_VECTOR_SIZE=12 -DNO_FPRINTF_OUTPUT -pipe 
CFLAGS_GTEST = -DGTEST_USE_OWN_TR1_TUPLE
CFLAGS += -I $(MKLROOT)/include -DEIGEN_USE_MKL_ALL

LDLIBS = 
LDLIBS_STANC = -Lbin -lstanc
# --no-as-needed only affects libraries listed after it, so it goes first.
LDLIBS += -L$(MKLROOT)/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64
LDLIBS += -lmkl_core -lmkl_sequential -lpthread -lm -ldl

This is gcc with MKL from a local installation (available for free on the Intel web page), set up in the CmdStan makefile on Ubuntu 16.04 LTS.
Change MKLROOT to your location.
