Vectorised unary functions - Eigen implementations?

For some of the vectorised unary functions in prim/mat (i.e. those using apply_scalar_unary), it looks like they could instead be implemented using Eigen’s coefficient-wise tools: https://eigen.tuxfamily.org/dox/group__CoeffwiseMathFunctions.html

This could give a bit of a speed boost, since some of these operations are vectorised with SIMD instructions (processing several values per instruction within a single core).

Should I create an issue and look at implementing these, or is apply_scalar_unary preferred here?

Go ahead!

… but I recommend benchmarking your changes to make sure the additional code complexity is worth it, as compilers are good at guessing stuff. You will want to get familiar with the performance cmdstan git repo.
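
For a quick check before setting up anything more elaborate, a small timing helper along these lines may be enough. This is only a sketch using the standard library; time_ms is a hypothetical name and is not part of Stan or the performance repo.

```cpp
#include <chrono>
#include <cstddef>

// Runs `work` `repetitions` times and returns the total elapsed
// wall-clock time in milliseconds. Hypothetical helper, not part of Stan.
template <typename F>
double time_ms(F&& work, std::size_t repetitions) {
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < repetitions; ++i) {
    work();
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

When timing, have the callable store its result somewhere that is read afterwards (for example, add it to a running sum that gets printed), otherwise an optimising compiler may remove the work being measured.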

That’s a good idea, will do. Thanks!

Really interesting stuff here. I coded up some rough benchmarks comparing the performance of the log and exp functions.

Log showed only a minor improvement in speed (roughly 7-8% less time), nothing particularly impressive:

log 
1000 randomly generated 10000 x 1 Vectors (milliseconds)
stan: 159
eigen: 148
1000 randomly generated 1000 x 1000 Matrices (milliseconds)
stan: 16508
eigen: 15180

Exp, on the other hand, showed a dramatic improvement (roughly 3.4x to 3.6x faster):

exp
1000 randomly generated 10000 x 1 Vectors (milliseconds)
stan: 75
eigen: 22
1000 randomly generated 1000 x 1000 Matrices (milliseconds)
stan: 8644
eigen: 2395

I would guess that the difference in improvement is because Eigen’s exp supports AVX for doubles, whereas log only supports AVX for floats. Either way, interesting to see the performance boosts that are available.

Benchmarking code below, compiled with:

g++ -std=c++1y -march=native -O3 -I . -I lib/eigen_3.3.3 eigen_log_exp_test.cpp

eigen_log_exp_test.cpp (3.9 KB)

(Warning: this will use approximately 8 GB of RAM.)
