Are matrix operations parallelized?

Just a quick general question: are stan’s matrix operations like crossprod, diag_pre_multiply, quad_form,… able to use multiple threads? I assume that you rely on the Eigen library here, and Eigen does support openMP. So I am wondering how stan behaves here.

Or, in other words, does it make sense to break up things like diag_pre_multiply using map_rect … or not?

They do not. Although Eigen can utilize openMP, that is only for double, float, int, and other types that are native to C++. That doesn’t work with Stan where those matrices are usually going to have var in them. The only matrix operation that is parallelizeable currently is the Cholesky decomposition if you enable the GPU stuff and have a matrix that is at least 1200x1200. That works by pulling out the double values, handing off to the GPU, and doing the differentiation analytically. There are some more GPU things in the works along those lines.


Somewhat related question: what happens if only some of the elements of the matrix have a var in them? For example, what if I have a matrix of data and I add to only the diagonal elements parameters then use that resulting sum matrix in my log density? Does it become a matrix of vars and slow down computation?

Was trying to read through the original Stan math paper to figure that out since I think one of my models might benefit from using an analytic gradient.

Yes. In C++, you cannot have a container like multiple types, so everything has to promoted to a common type.