I have a computation that I have been struggling to find a way to vectorise (and that I suspect cannot be vectorised) and therefore wanted to hear if anyone has clever ideas about how to do this as efficiently as possible.

Briefly, I have an array of k reals, and another array of k [n,m] matrices. I need to multiply each matrix by the corresponding real, and then compute the sum of these k, scaled matrices. This can be done by looping like this:

data {
  int n;
  int m;
  int k;
  matrix[n, m] M[k];
  real theta[k];
}
transformed data {
  matrix[n, m] Msum;
  Msum = rep_matrix(0, n, m);
  for (i in 1:k) {
    Msum = Msum + theta[i] * M[i];
  }
}
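For a concrete sanity check, the same weighted sum can be mirrored in plain Python with toy sizes (an illustration only; the values and sizes here are made up, not part of the Stan program):

```python
# Weighted sum of k matrices: Msum[r][c] = sum_i theta[i] * M[i][r][c].
# Toy sizes; in the real model k = 800, n = 300, m = 4.
k, n, m = 3, 2, 2
M = [[[float(i + r + c) for c in range(m)] for r in range(n)] for i in range(k)]
theta = [0.5, 1.0, 2.0]

Msum = [[0.0] * m for _ in range(n)]  # like rep_matrix(0, n, m)
for i in range(k):                    # like for (i in 1:k)
    for r in range(n):
        for c in range(m):
            Msum[r][c] += theta[i] * M[i][r][c]
```

With these inputs each entry works out to (r + c) * 3.5 + 5.0, which is an easy way to check the loop by hand.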

Is there a better (possibly vectorised) way of doing this? Any pointers or ideas hugely appreciated!

You're already multiplying a matrix by a scalar and adding it to a matrix, so you're working with relatively high-level operations. In Stan, loops are already fast, so I wouldn't worry about optimizing this (unless you're just doing it for fun, then have at it). Any speed-up here will be tiny compared to getting the posterior geometry right anyway.

number of autodiff variables (approximated by the number of parameters and their descendants in the calculations)



What vectorizing lets you do is deal with #2, sometimes, when the vectorized calculation can be more thrifty with the autodiff variables it allocates. Don't do it for its own sake.

Also, I assume you really are dealing with parameters, not data, in your real code. If it's data/transformed data, don't optimize; there's no point, since it only runs once.

Thanks - good point. So I probably should not worry about optimising this particular aspect then. (And yes: in my real model these are parameters - the above was just from my test code, where I don't do any sampling but just tried to figure out how things worked.)

I'll temper my recommendation by saying that @Bob_Carpenter and others often come up with recommendations for how to vectorize in order to reduce the number of autodiff variables, and for some models it really will matter (like when you have a large symmetric matrix and there's a way to internally make it triangular, which can save you a lot of run time)... so maybe:

Thanks. I should probably have been clearer in my original description, but these are in fact parameters, and in my real model this would take place in the model block. (The above is from my test code - I like to make sure that I understand the syntax and get the math right by first writing small test programs where I provide Stan with examples that I know the answers to, using the "data" and "transformed data" blocks for getting data in and executing statements, and using lots of print statements along the way.)

In my real model the array contains about 800 matrices (k=800), each of size 300 x 4 (n=300, m=4), and the resulting sum-matrix contains parameters that are in subsequent sampling statements, so I figured it was worth optimising a bit.

You can see our problems in trying to diagnose fragments of models out of context!

Other than writing a custom function in C++, I don't know how to easily make that faster.

You could start the sum at theta[1] * M[1] and then loop over 2:k, but that's in the noise if k = 800.
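Sketched in plain Python (hypothetical nested lists standing in for Stan matrices), that suggestion looks like:

```python
# Start the accumulator at theta[0] * M[0] instead of a zero matrix,
# then fold in the remaining k-1 terms (the equivalent of Stan's 2:k loop).
def weighted_sum(theta, M):
    n, m = len(M[0]), len(M[0][0])
    acc = [[theta[0] * M[0][r][c] for c in range(m)] for r in range(n)]
    for i in range(1, len(M)):
        for r in range(n):
            for c in range(m):
                acc[r][c] += theta[i] * M[i][r][c]
    return acc
```

This only removes the one addition against the initial zero matrix, which is why it's in the noise for k = 800.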

Rather than doing matrix sums, it'd be better to keep an array of each dimension (n by m) and then use sum() at the end - it's a smaller expression graph. But compared to everything else going on in the multiply, it's probably not worth the time or space savings.
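A rough Python sketch of that restructuring (hypothetical; plain Python has no autodiff, so this only illustrates the shape of the computation, not the expression-graph savings):

```python
# Build all the scaled terms first, then collapse them with a single
# elementwise sum at the end, instead of accumulating matrix by matrix.
def weighted_sum_deferred(theta, M):
    scaled = [[[t * x for x in row] for row in Mi] for t, Mi in zip(theta, M)]
    n, m = len(M[0]), len(M[0][0])
    return [[sum(S[r][c] for S in scaled) for c in range(m)] for r in range(n)]
```

The result is the same weighted sum; only the order of operations (scale everything, then reduce once) changes.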

We don't have matrix sum(matrix[]) for a sequence of matrices.