I should just caveat this: I haven’t profiled that many models, and they can have very different bottlenecks. But yeah, I tried using a custom allocator for std::vector (I could try to find the branch) and didn’t see much of a speedup, though I didn’t investigate too hard. I don’t think I tried anything similar for Eigen types; it’s much less clear what the optimal way to switch the allocator is for those (is there anything documented?). I think further experimentation here could be very profitable, especially if we assume it’ll be a while until the math library is refactored to be generically templated over Eigen types (see this thread or this one for some commentary on that; not sure if there’s a definitive thread).
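For concreteness, here’s roughly what that std::vector experiment looks like. This is a hypothetical bump-pointer arena sketch, not Stan’s actual `stack_alloc` and not the branch I mentioned; the `ArenaAllocator` name and the fixed buffer size are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical arena-style allocator sketch (NOT Stan's real stack_alloc):
// hands out memory from a fixed static buffer with a bump pointer and never
// frees individual blocks, so vector growth avoids malloc/free entirely.
template <typename T>
struct ArenaAllocator {
  using value_type = T;

  static inline std::size_t used = 0;  // bytes handed out so far
  alignas(alignof(std::max_align_t))
  static inline unsigned char buffer[1 << 16];  // fixed 64 KiB arena

  ArenaAllocator() = default;
  template <typename U>
  ArenaAllocator(const ArenaAllocator<U>&) {}

  T* allocate(std::size_t n) {
    // Bump-pointer allocation; real code would round up for alignment
    // and check for overflow of the arena.
    T* p = reinterpret_cast<T*>(buffer + used);
    used += n * sizeof(T);
    return p;
  }
  void deallocate(T*, std::size_t) {}  // arena: everything freed at once
};

template <typename T, typename U>
bool operator==(const ArenaAllocator<T>&, const ArenaAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const ArenaAllocator<T>&, const ArenaAllocator<U>&) { return false; }

// Small demo: the vector's reallocations all land in the arena.
std::size_t demo_sum() {
  std::vector<double, ArenaAllocator<double>> v;
  for (int i = 1; i <= 4; ++i) v.push_back(i);
  std::size_t s = 0;
  for (double x : v) s += static_cast<std::size_t>(x);
  return s;
}
```

Plugging something like this in is easy for std::vector because the allocator is a template parameter; Eigen matrices don’t expose an equivalent knob as directly, which is part of why the Eigen side of the experiment is murkier.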
We do the same thing for Stan’s AD primitive, the `var`. So, if I recall correctly, we’re doing something much worse (from a SIMD perspective) than having a matrix full of structs as you were thinking; we actually have a matrix full of pointers to structs :P (each `var` holds a pointer, allocated on our custom AD stack memory pool, to a `vari` struct containing a value and an adjoint). The Eigen matrices themselves are allocated on the heap, though the two doubles for each cell are allocated (individually) in our custom memory pool.
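To make the layout concrete, here’s a stripped-down sketch of that pointer-chasing design. These are not Stan’s real classes (the real `vari` has a virtual `chain()` and lives in the pool via a custom `operator new`); I’m using plain `new` as a stand-in for the pool allocation, and a `std::vector` as a stand-in for the Eigen matrix.

```cpp
#include <cassert>
#include <vector>

// Simplified sketch of the layout described above: each vari holds a
// value and an adjoint, and a var is just a pointer to a vari. A
// "matrix" of var is therefore a contiguous block of pointers whose
// payloads are scattered elsewhere -- hence the poor SIMD story.
struct vari {
  double val_;  // the value
  double adj_;  // the adjoint, accumulated during the reverse pass
  explicit vari(double v) : val_(v), adj_(0.0) {}
};

struct var {
  vari* vi_;  // one pointer per cell; the two doubles live elsewhere
  // Stand-in for allocation on the AD memory pool (the pool never
  // frees individual vari, so this sketch leaks on purpose).
  explicit var(double v) : vi_(new vari(v)) {}
  double val() const { return vi_->val_; }
};

// Summing a vector<var> dereferences a pointer per element, unlike
// summing a contiguous vector<double>, so it can't vectorize cleanly.
double sum_vals(const std::vector<var>& m) {
  double s = 0.0;
  for (const var& x : m) s += x.val();
  return s;
}
```

So the pointers are contiguous (in the heap-allocated Eigen storage), but every value/adjoint access is an extra indirection into the pool.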