Compiling Stan against Intel MKL


#1

I was trying to speed up some models, so I compiled everything with the Intel compiler against the Intel MKL, setting the appropriate flags according to:
https://eigen.tuxfamily.org/dox/TopicUsingIntelMKL.html

I saw that this was discussed here a while ago:
https://groups.google.com/forum/#!topic/stan-dev/F9x6oTikOLg
https://groups.google.com/forum/#!msg/stan-users/GXsWmQd-4jc/iQ09VCPrGS0J

After all this, I didn’t get any speedup. In fact, the Intel compiler seems to be a bit slower.

Original test case (logistic model):
g++ 5.4.0 : 11.7 seconds
icpc 20170213 : 12.2 seconds

Same trend with my more complex model. Plus, the Intel compiler takes nearly twice as long to build the Stan program (96 seconds vs. 51 seconds).


#2

When you use the Intel compiler, make sure to switch on the fp-precise options as described in the CmdStan manual. Otherwise you won’t ever get any divergences reported, because the floating-point optimizations will have optimized them away.

Other than that, I compared the Intel compiler against gcc 4.4, and I remember seeing 10% speedups in favor of Intel.

Maybe newer gcc compilers have gotten better? It would be interesting to see results where you use the gcc compiler but link against the Intel MKL.

Sebastian


#3

Thanks. I didn’t realize that there were instructions for doing that in the CmdStan manual, since I was building all of this for RStan.

The previous discussion on the stan-users list suggested that the big win was switching from g++ to icc, with little gain from using the MKL.


#4

This will depend heavily on your specific problem. Also, things may have changed: as I recall, the Cholesky decomposition code has been rewritten to use less of Eigen’s code. Eigen forwarded those operations to the MKL, so that forwarding no longer happens.

Yeah, the Intel compiler (at least the one we have which is 20150407) does swallow divergent transitions if you just go with the defaults which is really nasty.


#5

Yeah, I’m mostly doing a lot of vector-vector operations. Intel advertises a huge MKL speedup over the open-source BLAS packages for this, but I guess that’s overhyped, outdated, or Eigen’s own routines are already really good.


#6

I think you also need to turn on BLAS support in Eigen somehow.

I think we’re still using Eigen’s Cholesky, but only on double types, not on var types; we then put together custom derivatives.


#7

I added -DEIGEN_USE_MKL_ALL to the CFLAGS in my R Makevars file. Is there anywhere else it should be set?


#8

I got the MKL to work with gcc on a small test program, but not with Stan. The test program was built with:

g++ -m64 tri.c -o tri -Wl,--start-group ../intel/mkl/lib/intel64/libmkl_intel_ilp64.a ../intel/mkl/lib/intel64/libmkl_sequential.a ../intel/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -Wl,--no-as-needed -lpthread -lm -ldl

So I put everything into the Makevars file, including -I${MKLROOT}/include and -DEIGEN_USE_MKL_ALL. I also hard-coded the library file references to their direct locations and added the std-11 flag, but I got stuck compiling Eigen’s header files, with more than 1000 lines of errors.


#9

I’ve only used the Intel compiler with the MKL, not g++. Are the errors coming up during compiling or linking?

You could also try adding -DMKL_LP64 to your cflags and changing libmkl_intel_ilp64.a to libmkl_intel_lp64.a

From the Eigen guide:

“on a 64bits system, you must use the LP64 interface (not the ILP64 one)”

Let me know if it speeds things up!


#10
##
# Library locations
##
MKLROOT = /home/andre/intel/mkl
STAN ?= stan/
MATH ?= $(STAN)lib/stan_math/
-include $(MATH)make/libraries

##
# Set default compiler options.
## 
CFLAGS = -I src -I $(STAN)src -isystem $(MATH) -isystem $(EIGEN) -isystem $(BOOST) -isystem $(CVODES)/include -Wall -DEIGEN_NO_DEBUG  -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DFUSION_MAX_VECTOR_SIZE=12 -DNO_FPRINTF_OUTPUT -pipe 
CFLAGS_GTEST = -DGTEST_USE_OWN_TR1_TUPLE
CFLAGS += -I $(MKLROOT)/include -DEIGEN_USE_MKL_ALL

LDLIBS = 
LDLIBS_STANC = -Lbin -lstanc
# --no-as-needed only affects libraries listed after it, so it goes first
LDLIBS += -Wl,--no-as-needed -L$(MKLROOT)/lib/intel64 -lmkl_intel_lp64
LDLIBS += -lmkl_sequential -lmkl_core -lpthread -lm -ldl

This is the CmdStan makefile for gcc with a local MKL installation (available for free from Intel’s web page) on Ubuntu 16.04 LTS.
Change MKLROOT to your location.