Vectorization of sum of real*matrix terms - any ideas?


Hugely enjoying working with Stan here!

I have a computation that I have been struggling to find a way to vectorise (and that I suspect cannot be vectorised) and therefore wanted to hear if anyone has clever ideas about how to do this as efficiently as possible.

Briefly, I have an array of k reals, and another array of k [n,m] matrices. I need to multiply each matrix by the corresponding real, and then compute the sum of these k, scaled matrices. This can be done by looping like this:

data {
    int n; int m; int k;
    matrix[n, m] M[k];
    real theta[k];
}
transformed data {
    matrix[n, m] Msum;
    Msum = rep_matrix(0, n, m);
    for (i in 1:k) {
        Msum = Msum + theta[i] * M[i];
    }
}

Is there a better (possibly vectorised) way of doing this? Any pointers or ideas hugely appreciated!



You’re already multiplying a matrix by a scalar and adding it to a matrix, so you’re working with relatively high-level operations. In Stan, loops are already fast, so I wouldn’t worry about optimizing this (unless you’re just doing it for fun; then have at it). Any speed-up here will be tiny compared to getting the posterior geometry right anyway.


That is, when you optimize:

  1. posterior geometry
  2. number of autodiff variables (approximated by the number of parameters and their descendants in the calculations)
  3. blank / neither candidate
  4. vectorizing, maybe

What vectorizing lets you do is deal with #2, sometimes, when the vectorized calculation can be more thrifty with the autodiff variables it allocates. Don’t do it for its own sake.
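As a small illustration of the autodiff-variable point (a sketch using hypothetical vector declarations, not the original code): a hand-rolled loop puts one node per operation on the autodiff stack, while a built-in like dot_product collapses the whole reduction into a single node with analytic gradients:

```
// assuming theta and x are both declared as vector[k]
real s = 0;
for (i in 1:k)
    s = s + theta[i] * x[i];      // k multiplies + k adds on the stack

real s2 = dot_product(theta, x);  // a single node with analytic gradients
```

Both compute the same value; the second just allocates far fewer intermediate autodiff variables.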

Also, I assume you really are dealing with parameters not data in your real code. If it’s data/transformed data don’t optimize, there’s no point to it, it only runs once.


Thanks - good point. So I probably should not worry about optimising this particular aspect then. (And yes: in my real model these are parameters - the above was just from my test code, where I don't do any sampling but just tried to figure out how things worked.)


Incidentally, what is #3 referring to?


The French presidential election… sorry :) Le Pen got 3rd place, after blank/no choice.




I’ll temper my recommendation by saying that @Bob_Carpenter and others often come up with recommendations for how to vectorize in order to reduce the number of autodiff variables and for some models it really will matter (like when you have a large symmetric matrix and there’s a way to internally make it triangular that can save you a lot of run time)… so maybe:

  1. posterior geometry
  2. vectorize *_lpdf, *_lpmf calls
  3. check for specialized matrix functions
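For #2, the classic case is a sampling statement: the looped and vectorized forms are mathematically identical, but the vectorized one can share repeated terms (like log(sigma)) across observations instead of recomputing them. A sketch, assuming y is declared as a vector[N]:

```
// looped: recomputes shared terms for every y[i]
for (i in 1:N)
    y[i] ~ normal(mu, sigma);

// vectorized: one lpdf call, shared terms computed once
y ~ normal(mu, sigma);
```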


This is just transformed data. It’s only evaluated once. So don’t worry about it.

We don’t have tensor arithmetic, so there’s no way to get rid of that loop you have.

You can consolidate the first two lines with compound declare/define to:

matrix[n, m] Msum = rep_matrix(0, n, m);
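With that consolidation, the whole transformed data block would look something like this (the loop itself is unchanged; only the declaration and initialization are merged):

```
transformed data {
    matrix[n, m] Msum = rep_matrix(0, n, m);
    for (i in 1:k)
        Msum = Msum + theta[i] * M[i];
}
```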


Thanks. I should probably have been clearer in my original description, but these are in fact parameters, and in my real model this would take place in the model block. (The above is from my test code - I like to make sure that I understand the syntax and get the math right by first writing small test programs where I give Stan examples whose answers I already know, using the “data” and “transformed data” blocks to get data in and execute statements, with lots of print statements along the way.)

In my real model the array contains about 800 matrices (k=800), each of size 300 x 4 (n=300, m=4), and the resulting sum-matrix contains parameters that are in subsequent sampling statements, so I figured it was worth optimising a bit.


You can see our problems in trying to diagnose fragments of models out of context!

Other than writing a custom function in C++, I don’t know how to easily make that faster.

You could start the sum at theta[1] * M[1] and then loop over 2:k, but that’s in the noise if k = 800.
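That variant would look something like this (a minor tweak of the original loop; it saves one rep_matrix and one matrix add):

```
matrix[n, m] Msum = theta[1] * M[1];
for (i in 2:k)
    Msum = Msum + theta[i] * M[i];
```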

Rather than doing matrix sums, it'd be better to keep an array for each of the n by m entries and then use sum() at the end; that gives a smaller expression graph. But compared to everything else going on in the multiply, it's probably not worth the time or space savings.

We don’t have matrix sum(matrix[]) for a sequence of matrices.