I should just caveat this: I haven’t profiled that many models, and they can have very different bottlenecks. But yeah, I tried using a custom allocator for std::vector (I could try to find the branch) and didn’t see much of a speedup, though I didn’t investigate too hard. I don’t think I tried anything similar for Eigen types; it’s much less clear what the optimal way to switch the allocator is for those (is there anything documented?). I think further experimentation here could be very profitable, especially if we assume it’ll be a while until the math library is refactored to be generically templated over Eigen types (see this thread or this one for some commentary on that; not sure if there’s a definitive thread).
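For concreteness, here’s roughly what that std::vector experiment looks like. This is a hypothetical bump-pointer arena sketch, not Stan’s actual `stack_alloc` and not the branch I mentioned; the `ArenaAllocator` name and the fixed buffer size are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical arena-style allocator sketch (NOT Stan's real stack_alloc):
// hands out memory from a fixed static buffer with a bump pointer and never
// frees individual blocks, so vector growth avoids malloc/free entirely.
template <typename T>
struct ArenaAllocator {
  using value_type = T;

  static inline std::size_t used = 0;  // bytes handed out so far
  alignas(alignof(std::max_align_t))
  static inline unsigned char buffer[1 << 16];  // fixed 64 KiB arena

  ArenaAllocator() = default;
  template <typename U>
  ArenaAllocator(const ArenaAllocator<U>&) {}

  T* allocate(std::size_t n) {
    // Bump-pointer allocation; real code would round up for alignment
    // and check for overflow of the arena.
    T* p = reinterpret_cast<T*>(buffer + used);
    used += n * sizeof(T);
    return p;
  }
  void deallocate(T*, std::size_t) {}  // arena: everything freed at once
};

template <typename T, typename U>
bool operator==(const ArenaAllocator<T>&, const ArenaAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const ArenaAllocator<T>&, const ArenaAllocator<U>&) { return false; }

// Small demo: the vector's reallocations all land in the arena.
std::size_t demo_sum() {
  std::vector<double, ArenaAllocator<double>> v;
  for (int i = 1; i <= 4; ++i) v.push_back(i);
  std::size_t s = 0;
  for (double x : v) s += static_cast<std::size_t>(x);
  return s;
}
```

Plugging something like this in is easy for std::vector because the allocator is a template parameter; Eigen matrices don’t expose an equivalent knob as directly, which is part of why the Eigen side of the experiment is murkier.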
We do the same thing for Stan’s AD primitive, the `var`. So, if I recall correctly, we’re doing something much worse (from a SIMD perspective) than having a matrix full of structs as you were thinking; we actually have a matrix full of pointers to structs :P (each `var` holds a pointer, allocated on our custom AD stack memory pool, to a `vari` struct containing a value and an adjoint). The Eigen matrices themselves are allocated on the heap, though the two doubles for each cell are allocated (individually) in our custom memory pool.
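To make the layout concrete, here’s a stripped-down sketch of that pointer-chasing design. These are not Stan’s real classes (the real `vari` has a virtual `chain()` and lives in the pool via a custom `operator new`); I’m using plain `new` as a stand-in for the pool allocation, and a `std::vector` as a stand-in for the Eigen matrix.

```cpp
#include <cassert>
#include <vector>

// Simplified sketch of the layout described above: each vari holds a
// value and an adjoint, and a var is just a pointer to a vari. A
// "matrix" of var is therefore a contiguous block of pointers whose
// payloads are scattered elsewhere -- hence the poor SIMD story.
struct vari {
  double val_;  // the value
  double adj_;  // the adjoint, accumulated during the reverse pass
  explicit vari(double v) : val_(v), adj_(0.0) {}
};

struct var {
  vari* vi_;  // one pointer per cell; the two doubles live elsewhere
  // Stand-in for allocation on the AD memory pool (the pool never
  // frees individual vari, so this sketch leaks on purpose).
  explicit var(double v) : vi_(new vari(v)) {}
  double val() const { return vi_->val_; }
};

// Summing a vector<var> dereferences a pointer per element, unlike
// summing a contiguous vector<double>, so it can't vectorize cleanly.
double sum_vals(const std::vector<var>& m) {
  double s = 0.0;
  for (const var& x : m) s += x.val();
  return s;
}
```

So the pointers are contiguous (in the heap-allocated Eigen storage), but every value/adjoint access is an extra indirection into the pool.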