Compiling Stan against Intel MKL

aaronjg · July 3, 2017, 5:27am

I was trying to speed up some models and tried to compile everything with the intel compiler against the intel MKL, setting the appropriate flags according to:
https://eigen.tuxfamily.org/dox/TopicUsingIntelMKL.html

I saw that this was discussed here a while ago:
https://groups.google.com/forum/#!topic/stan-dev/F9x6oTikOLg
https://groups.google.com/forum/#!msg/stan-users/GXsWmQd-4jc/iQ09VCPrGS0J

After all this, I didn’t seem to get any speedup. In fact, the intel compiler seems to be a bit slower.

Original test case of logistic model
g++ 5.4.0 : 11.7 seconds
icpc 20170213 : 12.2 seconds

Same trend with my more complex model. Plus, the Intel compiler takes nearly twice as long to build the Stan program (96 seconds vs 51 seconds).

wds15 · July 3, 2017, 6:51am

When you use the Intel compiler, make sure to switch on the fp-precise options as described in the CmdStan manual. Otherwise you will never get any divergences reported, because the floating-point optimizations will have optimized them away.

Other than that, I compared the Intel compiler against gcc 4.4… and I remember seeing 10% speedups in favor of Intel.

Maybe newer gcc compilers have gotten better? It would be interesting to see results where you use the gcc compiler but link against the Intel MKL.

Sebastian

aaronjg · July 3, 2017, 7:39am

Thanks. I didn’t realize that there were instructions for doing that in the CmdStan manual, since I was building all of this for RStan.

The previous discussion on the stan-users list suggested that the big wins came from switching from g++ to icc, with little gain from using the MKL.

wds15 · July 3, 2017, 11:14am

This will depend heavily on your specific problem. Also, things may have changed: as I recall, the Cholesky decomposition code has been rewritten to use less of Eigen’s code. Eigen forwarded those operations to the MKL, so that forwarding no longer happens.

Yeah, the Intel compiler (at least the one we have, which is 20150407) does swallow divergent transitions if you just go with the defaults, which is really nasty.
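
A minimal sketch of why the floating-point model can matter at all. It does not reproduce Stan's divergence check; it only shows that floating-point addition is not associative, so an optimizer that is allowed to regroup arithmetic (as fast floating-point modes generally are) can change a numerical result. The function names are made up for illustration.

```cpp
// Illustrative only: regrouping a floating-point sum changes its value.
// This is NOT Stan's divergence check, just the underlying arithmetic effect.

// Evaluates the sum in the order written in the source.
double sum_left_to_right(double a, double b, double c) {
    return (a + b) + c;
}

// Evaluates a regrouped form that is algebraically equal but can round
// differently; value-unsafe optimizations are permitted to pick this form.
double sum_regrouped(double a, double b, double c) {
    return a + (b + c);
}
```

With a = 1e20, b = -1e20, c = 1.0 under strict IEEE double arithmetic, the first form gives 1.0 and the second gives 0.0, because 1.0 is lost when added to -1e20. A precise floating-point setting asks the compiler to keep the order written in the source.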

aaronjg · July 3, 2017, 11:34am

Yeah, I’m mostly doing a lot of vector-vector operations. Intel advertises MKL as having a huge speedup over the open-source BLAS packages for this, but I guess that’s overhyped, outdated, or Eigen’s own routines are already really good.
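
One common explanation for small gains on vector-vector work is that these operations are limited by memory bandwidth, not arithmetic, so a faster BLAS has little to improve. The sketch below compares flops per byte for a dot product and a matrix-matrix product under an idealized model (8-byte doubles, each operand moved through memory once). The function names and the model are assumptions for illustration, not measurements from this thread.

```cpp
// Idealized arithmetic intensity (flops per byte of memory traffic),
// assuming 8-byte doubles and that every operand is moved exactly once.

// Dot product of two length-n vectors: 2n flops over 2n doubles read.
double dot_intensity(double n) {
    return (2.0 * n) / (2.0 * n * 8.0);
}

// n-by-n matrix product: 2n^3 flops over 3n^2 doubles (read A and B, write C).
double gemm_intensity(double n) {
    return (2.0 * n * n * n) / (3.0 * n * n * 8.0);
}
```

Under this model the dot product stays at 0.125 flops per byte for any n, while the matrix product grows as n/12, which is where tuned libraries usually have room to show large speedups.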

Bob_Carpenter · July 4, 2017, 1:20am

I think you also need to turn on BLAS support in Eigen somehow.

I think we’re still using Eigen’s Cholesky, just not on var types. We use it only on double types and then put together custom derivatives.

aaronjg · July 4, 2017, 2:32am

I added -DEIGEN_USE_MKL_ALL to the CFLAGS in my R Makevars file. Is there anywhere else it should be set?

Andre_Pfeuffer · July 20, 2017, 9:14am

I got MKL to work with gcc, but not with Stan. The gcc build that works is:

g++ -m64 tri.c -o tri -Wl,--start-group ../intel/mkl/lib/intel64/libmkl_intel_ilp64.a ../intel/mkl/lib/intel64/libmkl_sequential.a ../intel/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -Wl,--no-as-needed -lpthread -lm -ldl

So I put everything into the Makevars file, including
-I${MKLROOT}/include and -DEIGEN_USE_MKL_ALL.

I also hard-coded the file references to their actual locations and added the C++11 standard flag, but I got stuck compiling the Eigen header files, with more than 1000 lines of errors.

aaronjg · July 20, 2017, 4:27pm

I’ve only used the intel compiler with mkl, not g++. Are the errors coming up during compiling or linking?

You could also try adding -DMKL_LP64 to your cflags and changing libmkl_intel_ilp64.a to libmkl_intel_lp64.a

From the Eigen guide:

“on a 64bits system, you must use the LP64 interface (not the ILP64 one)”

Let me know if it speeds things up!
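
For context on the LP64 versus ILP64 distinction: the LP64 interface takes 32-bit integer arguments and the ILP64 interface takes 64-bit ones, so code compiled for one and linked against the other disagrees about argument sizes. A minimal sketch with hypothetical type aliases (these are stand-ins, not MKL's own typedefs):

```cpp
#include <cstddef>

// Hypothetical stand-ins for the integer type each interface layer expects.
using lp64_int = int;         // LP64 interface: 32-bit integer arguments
using ilp64_int = long long;  // ILP64 interface: 64-bit integer arguments

// Size in bytes of an integer argument under the chosen interface.
constexpr std::size_t blas_int_bytes(bool ilp64) {
    return ilp64 ? sizeof(ilp64_int) : sizeof(lp64_int);
}
```

On common 64-bit Linux toolchains this gives 4 bytes for LP64 and 8 bytes for ILP64, which is why the define (-DMKL_LP64) and the library name (libmkl_intel_lp64.a) have to agree.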

Andre_Pfeuffer · October 16, 2017, 6:31am

##
# Library locations
##
MKLROOT = /home/andre/intel/mkl
STAN ?= stan/
MATH ?= $(STAN)lib/stan_math/
-include $(MATH)make/libraries

##
# Set default compiler options.
## 
CFLAGS = -I src -I $(STAN)src -isystem $(MATH) -isystem $(EIGEN) -isystem $(BOOST) -isystem $(CVODES)/include -Wall -DEIGEN_NO_DEBUG  -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DFUSION_MAX_VECTOR_SIZE=12 -DNO_FPRINTF_OUTPUT -pipe 
CFLAGS_GTEST = -DGTEST_USE_OWN_TR1_TUPLE
CFLAGS += -I $(MKLROOT)/include -DEIGEN_USE_MKL_ALL

LDLIBS = 
LDLIBS_STANC = -Lbin -lstanc
# --no-as-needed only affects libraries listed after it, so it goes before the MKL libraries.
LDLIBS += -L$(MKLROOT)/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64
LDLIBS += -lmkl_core -lmkl_sequential -lpthread -lm -ldl

This is gcc with a local MKL installation (available for free from the Intel web page), set up in the CmdStan makefile on Ubuntu 16.04 LTS.
Change MKLROOT to your location.
