Vectorization of sum of real*matrix terms - any ideas?

gorm · May 12, 2017, 12:08pm

Hugely enjoying working with Stan here!

I have a computation that I have been struggling to find a way to vectorise (and that I suspect cannot be vectorised) and therefore wanted to hear if anyone has clever ideas about how to do this as efficiently as possible.

Briefly, I have an array of k reals, and another array of k [n,m] matrices. I need to multiply each matrix by the corresponding real, and then compute the sum of these k, scaled matrices. This can be done by looping like this:

data {
    int n; int m; int k;
    matrix[n, m] M[k];
    real theta[k];
}
transformed data {
    matrix[n,m] Msum;
    Msum = rep_matrix(0,n,m);
    for (i in 1:k) {
        Msum = Msum + theta[i] * M[i];
    }
}

Is there a better (possibly vectorised) way of doing this? Any pointers or ideas hugely appreciated!

Thanks,
Anders

sakrejda · May 12, 2017, 12:45pm

You’re already multiplying a matrix by a scalar and adding it to a matrix, so you’re working with relatively high-level operations. In Stan, loops are already fast, so I wouldn’t worry about optimizing this (unless you’re just doing it for fun, then have at it). Any speed-up here will be tiny compared to getting the posterior geometry right anyway.

sakrejda · May 12, 2017, 12:49pm

That is, when you optimize:

1. posterior geometry
2. number of autodiff variables (approximated by the number of parameters and their descendants in the calculations)
3. blank / neither candidate
4. vectorizing, maybe

What vectorizing lets you do is deal with #2, sometimes, when the vectorized calculation can be more thrifty with the autodiff variables it allocates. Don’t do it for its own sake.

Also, I assume you really are dealing with parameters not data in your real code. If it’s data/transformed data don’t optimize, there’s no point to it, it only runs once.

gorm · May 12, 2017, 1:23pm

Thanks - good point. So I probably should not worry about optimising this particular aspect then. (And yes: in my real model these are parameters - the above was just from my test code, where I don't do any sampling but just tried to figure out how things worked.)

gorm · May 12, 2017, 1:27pm

Incidentally, what is #3 referring to?

sakrejda · May 12, 2017, 1:28pm

The French presidential election… sorry :) LePen got 3rd place after blank/no choice

gorm · May 12, 2017, 1:30pm

Haha!

sakrejda · May 12, 2017, 1:33pm

I’ll temper my recommendation by saying that @Bob_Carpenter and others often come up with recommendations for how to vectorize in order to reduce the number of autodiff variables and for some models it really will matter (like when you have a large symmetric matrix and there’s a way to internally make it triangular that can save you a lot of run time)… so maybe:

1. posterior geometry
2. vectorize *_lpdf, *_lpmf calls
3. check for specialized matrix functions

Bob_Carpenter · May 15, 2017, 6:59pm

This is just transformed data. It’s only evaluated once. So don’t worry about it.

We don’t have tensor arithmetic, so there’s no way to get rid of that loop you have.

You can consolidate the first two lines with compound declare/define to:

matrix[n, m] Msum = rep_matrix(0, n, m);
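For reference, here is a minimal sketch of the original test program with that one-line change applied. The lower bounds on n, m and k and the final print statement are additions for illustration, not part of the original post. The declarations follow the array syntax used in this thread; recent Stan releases write the array as array[k] matrix[n, m] M instead.

```stan
data {
  int<lower=0> n;      // bounds are an illustrative addition
  int<lower=0> m;
  int<lower=0> k;
  matrix[n, m] M[k];   // k matrices, each n x m
  real theta[k];       // one scalar weight per matrix
}
transformed data {
  // Compound declare/define: declare Msum and set it to zeros in one statement.
  matrix[n, m] Msum = rep_matrix(0, n, m);
  for (i in 1:k) {
    Msum = Msum + theta[i] * M[i];
  }
  print("Msum = ", Msum);  // added only to inspect the result
}
```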

gorm · May 16, 2017, 7:14am

Thanks. I should probably have been clearer in my original description but these are in fact parameters and in my real model this would take place in the model block. (The above is from my test code - I like to make sure that I understand the syntax and that I get the math right by first writing small test programs where I provide Stan with examples that I know the answers to, using the “data” and “transformed data” blocks for getting data in and executing statements, and using lots of print statements along the way).

In my real model the array contains about 800 matrices (k=800), each of size 300 x 4 (n=300, m=4), and the resulting sum-matrix contains parameters that are used in subsequent sampling statements, so I figured it was worth optimising a bit.

Bob_Carpenter · May 16, 2017, 5:50pm

You can see our problems in trying to diagnose fragments of models out of context!

Other than writing a custom function in C++, I don’t know how to easily make that faster.

You could start the sum at theta[1] * M[1] and then loop over 2:k, but that’s in the noise if k = 800.
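A sketch of that variant, wrapped in a user-defined function so it can be called from any block. The function name weighted_sum is hypothetical, and the code assumes k is at least 1, since it reads M[1] and theta[1] unconditionally.

```stan
functions {
  // Hypothetical helper: returns theta[1] * M[1] + ... + theta[k] * M[k].
  // Assumes the arrays are non-empty and have the same length.
  matrix weighted_sum(matrix[] M, real[] theta) {
    int k = size(theta);
    // Start from the first term instead of a zero matrix.
    matrix[rows(M[1]), cols(M[1])] Msum = theta[1] * M[1];
    for (i in 2:k) {
      Msum = Msum + theta[i] * M[i];
    }
    return Msum;
  }
}
```

It would be called as Msum = weighted_sum(M, theta); wherever the sum is needed.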

Rather than doing matrix sums, it’d be better to keep an array of each dimension (n by m) and then use sum() at the end—it’s a smaller expression graph. But compared to everything else going on in the multiply, it’s probably not worth the time or space savings.
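One possible reading of this suggestion, sketched under assumptions: for each entry (r, c), collect the k scaled terms in a vector and reduce them with a single sum() call, rather than chaining k matrix additions. Treating M as data and theta as a parameter is an assumption here, and the normal(0, 1) prior is only a placeholder so the program is complete.

```stan
data {
  int<lower=1> n;
  int<lower=1> m;
  int<lower=1> k;
  matrix[n, m] M[k];
}
parameters {
  real theta[k];
}
transformed parameters {
  matrix[n, m] Msum;
  for (c in 1:m) {
    for (r in 1:n) {
      vector[k] terms;  // the k scaled values for entry (r, c)
      for (i in 1:k) {
        terms[i] = theta[i] * M[i][r, c];
      }
      Msum[r, c] = sum(terms);  // one reduction per entry
    }
  }
}
model {
  theta ~ normal(0, 1);  // placeholder prior, not from the thread
}
```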

We don’t have matrix sum(matrix[]) for a sequence of matrices.
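An alternative that was not proposed in this thread, offered only as a sketch: if the matrices M are data and only theta is a parameter (an assumption, since the original post does not say which), each M[i] can be flattened once into a column of an (n*m) x k matrix, so the weighted sum becomes a single matrix-vector product. The name Mflat is hypothetical and the prior is a placeholder. Both to_vector and to_matrix use column-major order, so the reshape recovers the original layout.

```stan
data {
  int<lower=1> n;
  int<lower=1> m;
  int<lower=1> k;
  matrix[n, m] M[k];
}
transformed data {
  // Column i holds M[i] flattened in column-major order; runs once.
  matrix[n * m, k] Mflat;
  for (i in 1:k) {
    Mflat[, i] = to_vector(M[i]);
  }
}
parameters {
  vector[k] theta;
}
transformed parameters {
  // Mflat * theta is the flattened weighted sum; reshape it back to n x m.
  matrix[n, m] Msum = to_matrix(Mflat * theta, n, m);
}
model {
  theta ~ normal(0, 1);  // placeholder prior, not from the thread
}
```

Whether this is actually faster than the loop would need to be measured on the real model.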
