When you use the Intel compiler, make sure to switch on the fp-precise options as described in the CmdStan manual. Otherwise you won’t ever get any divergences reported, because the floating-point optimizations will have optimized them away for you.
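To illustrate why this can happen, here is a minimal sketch (not Stan’s actual sampler code) of the kind of divergence test an HMC sampler runs after each trajectory. A transition is flagged when the energy error is NaN or exceeds a threshold. The function name `is_divergent` and the threshold of 1000 are assumptions for illustration. The NaN branch is the fragile part: value-unsafe floating-point modes let the compiler assume NaN never occurs, so `std::isnan` may be folded to `false`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical divergence check in the spirit of an HMC sampler:
// H0 is the energy at the start of the trajectory, H the energy at the end.
// The default threshold of 1000 is an assumed value.
bool is_divergent(double H0, double H, double max_delta_H = 1000.0) {
  assert(max_delta_H > 0.0);
  const double delta = H - H0;
  // Comparisons involving NaN are always false, so `delta > max_delta_H`
  // alone never catches a NaN energy error; the explicit isnan test does.
  // Under value-unsafe FP optimizations (e.g. gcc -ffast-math, or the
  // Intel defaults discussed here) the compiler may assume NaN cannot
  // occur and drop this test, so NaN transitions go unreported.
  return std::isnan(delta) || delta > max_delta_H;
}
```

Building the same file with and without the precise floating-point options and feeding it a NaN energy is a quick way to check whether a given compiler configuration keeps the test.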
Other than that, I compared the Intel toolchain against gcc 4.4… and I remember seeing speedups of about 10% in favor of Intel.
Maybe newer gcc compilers have gotten better? It would be interesting to see results where you compile with gcc but link against the Intel MKL.
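To make such a comparison repeatable, one option is to build the same workload once with each toolchain and time it the same way in both builds. A minimal timing helper in standard C++11, assuming you supply your own workload (the name `median_seconds` is hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Runs `work` `reps` times and returns the median wall-clock time in seconds
// (the upper median when reps is even). The median is less sensitive than
// the mean to one-off hiccups such as page faults or other processes.
double median_seconds(const std::function<void()>& work, int reps) {
  assert(reps > 0);
  std::vector<double> times;
  times.reserve(static_cast<std::size_t>(reps));
  for (int r = 0; r < reps; ++r) {
    const auto t0 = std::chrono::steady_clock::now();
    work();
    const auto t1 = std::chrono::steady_clock::now();
    times.push_back(std::chrono::duration<double>(t1 - t0).count());
  }
  const auto mid = times.begin() + reps / 2;
  std::nth_element(times.begin(), mid, times.end());
  return *mid;
}
```

Calling this on an identical workload in an Intel build and in a gcc build (with or without MKL linked) gives numbers that are at least measured the same way.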
All of this will depend heavily on your specific problem. Also, things may have changed since then: as I recall, the Cholesky decomposition code has been rewritten to use less of Eigen’s code. Since it was Eigen that forwarded those operations to the MKL, that no longer happens.
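As a sketch of the point about forwarding: MKL is only used where code goes through an Eigen routine that Eigen itself redirects when `EIGEN_USE_MKL_ALL` is defined. A decomposition written as plain loops, like the hypothetical stand-in below (not Stan’s actual implementation), is simply compiled by whichever compiler you use, and defining the macro changes nothing for it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major dense n x n matrix stored in a flat vector (hypothetical type).
using Matrix = std::vector<double>;

// Hand-written Cholesky-Banachiewicz factorization: returns lower-triangular
// L with A = L * L^T for a symmetric positive-definite A. Because these are
// plain loops rather than a call into Eigen's LLT, nothing here can be
// forwarded to MKL, whatever macros are defined.
Matrix cholesky(const Matrix& A, std::size_t n) {
  assert(A.size() == n * n);
  Matrix L(n * n, 0.0);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j <= i; ++j) {
      double sum = A[i * n + j];
      for (std::size_t k = 0; k < j; ++k) sum -= L[i * n + k] * L[j * n + k];
      if (i == j) {
        assert(sum > 0.0);  // fails if A is not positive definite
        L[i * n + i] = std::sqrt(sum);
      } else {
        L[i * n + j] = sum / L[j * n + j];
      }
    }
  }
  return L;
}
```

For example, `cholesky({4, 2, 2, 3}, 2)` yields the lower-triangular factor with entries 2, 1 and √2.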
Yeah, the Intel compiler (at least our version, 20150407) does swallow divergent transitions if you just go with the defaults, which is really nasty.
Yeah, my model mostly does a lot of vector-vector operations. Intel advertises huge MKL speedups over the open-source (OSS) BLAS for this, but I guess that claim is overhyped or outdated, or Eigen’s own routines are already very good.
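One plausible reason the advertised gains don’t show up here (an assumption, not something measured in this thread): vector-vector (BLAS level-1) operations do very little arithmetic per byte loaded, so they tend to be limited by memory bandwidth rather than by how well the kernel is tuned. The sketch below shows two such operations and the arithmetic intensity of a dot product; the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product: one multiply and one add per element pair.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
  assert(x.size() == y.size());
  double s = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
  return s;
}

// axpy: y <- a * x + y, again O(1) flops per element touched.
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
  assert(x.size() == y.size());
  for (std::size_t i = 0; i < x.size(); ++i) y[i] += a * x[i];
}

// Flops per byte of double-precision input for a length-n dot product:
// 2n flops over 2n * 8 bytes = 1/8, independent of n. A matrix-matrix
// product, by contrast, does O(n^3) flops on O(n^2) data.
double dot_arithmetic_intensity(std::size_t n) {
  assert(n > 0);
  return (2.0 * n) / (2.0 * n * sizeof(double));
}
```

At 1/8 flop per byte, even a simple vectorized loop like the ones Eigen generates can get close to the memory-bandwidth limit, leaving little for a tuned library to win.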
So I put everything into the Makevars file, even -I${MKLROOT}/include and -DEIGEN_USE_MKL_ALL, hard-coded the file references to their exact locations, and added the C++11 standard flag, but I got stuck compiling Eigen’s header files, with more than 1000 lines of errors.
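When a build produces a thousand lines of header errors, it can help to first confirm that the flags reach the compiler at all, separately from Eigen and MKL. A hypothetical sanity-check file, to be compiled with exactly the flags from the Makevars (for example `-std=c++11 -DEIGEN_USE_MKL_ALL -I${MKLROOT}/include`); it includes no Eigen or MKL headers and so says nothing about whether those libraries themselves are compatible:

```cpp
#include <cstdio>

// Stop immediately with one readable error if C++11 is not enabled,
// instead of failing deep inside Eigen's headers.
#if __cplusplus < 201103L
#error "C++11 is not enabled: the C++11 standard flag did not take effect"
#endif

// True if -DEIGEN_USE_MKL_ALL was passed on the command line.
bool eigen_mkl_requested() {
#ifdef EIGEN_USE_MKL_ALL
  return true;
#else
  return false;
#endif
}

// Prints what the compiler actually saw.
void report_flags() {
  std::printf("__cplusplus = %ld\n", static_cast<long>(__cplusplus));
  std::printf("EIGEN_USE_MKL_ALL defined: %s\n",
              eigen_mkl_requested() ? "yes" : "no");
}
```

If this file compiles and reports the expected values, the remaining errors are more likely to come from the Eigen and MKL headers themselves than from the Makevars flags.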