# Speeding up a for loop with matrix-vector multiplication

Hi,
I apologize in advance for the very naive question. As part of my
Stan code I have a matrix A and an array of matrices B defined in the data
section as follows:

```stan
matrix[90000, 30] A;
array[30] matrix[90000, 12] B;
```

I also have a vector of 12 parameters called beta:

```stan
vector[12] beta;
```

I am currently performing the following multiplication in a for loop:

```stan
for (k in 1:30) {
  Y[, k] = A[, k] - B[k] * beta;
}
```

Is there a more efficient/fast way of performing this calculation?
Thank you for entertaining my very beginner question.

If you have access to a cluster or a machine with many CPU cores, you can parallelize the loop with `reduce_sum`. First compile the model with threading enabled in CmdStan:

`make STAN_THREADS=true my_folder/my_model`
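At run time you also need to tell the executable how many threads it may use. A sketch of the full CmdStan workflow; the data and output file names are placeholders for your own setup:

```shell
# From the cmdstan directory: build the model with threading enabled.
make STAN_THREADS=true my_folder/my_model

# Run the sampler; num_threads (available in CmdStan 2.28+) sets how
# many threads reduce_sum may use. File names are placeholders.
./my_folder/my_model sample num_threads=4 \
  data file=my_folder/my_data.json \
  output file=my_folder/output.csv
```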

```stan
functions {
  // Partial-sum function for reduce_sum. Stan requires the signature
  // (slice, start, end, ...shared arguments) and a real return value,
  // so the slice contributes directly to the log density rather than
  // returning the matrix C itself.
  // The normal likelihood and sigma below are placeholders; substitute
  // whatever distribution Y actually follows in your model.
  real my_partial_sum(array[] matrix B_slice, int start, int end,
                      matrix A, vector beta, real sigma) {
    real lp = 0;
    for (k in 1:size(B_slice)) {
      // Column start + k - 1 of A lines up with B_slice[k].
      lp += normal_lupdf(A[, start + k - 1] - B_slice[k] * beta | 0, sigma);
    }
    return lp;
  }
}
```
```stan
model {
  // grainsize = 1 lets the scheduler choose how to slice B across threads.
  target += reduce_sum(my_partial_sum, B, 1, A, beta, sigma);
}
```
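To build intuition for what `reduce_sum` does with `B`: it cuts the array into slices, calls the partial-sum function on each slice (with `start` and `end` giving the slice's position in the full array), and adds the results. A small NumPy sketch of that contract, using made-up dimensions and a sum of squared residuals as a stand-in for the log density:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions standing in for the 90000 x 12 matrices in the post.
N, P, K = 5, 12, 30
A = rng.normal(size=(N, K))
B = rng.normal(size=(K, N, P))   # B[k] is an N x P matrix
beta = rng.normal(size=P)

def partial_sum(B_slice, start, end, A, beta):
    """Contribution of the slice B[start:end] to the total, analogous
    to a reduce_sum partial-sum function (0-based indices here)."""
    total = 0.0
    for i, Bk in enumerate(B_slice):
        k = start + i                    # position in the full array
        total += np.sum((A[:, k] - Bk @ beta) ** 2)
    return total

# Full computation in one pass...
full = partial_sum(B, 0, K, A, beta)

# ...equals the sum of partial sums over arbitrary slices.
cuts = [0, 7, 19, K]
sliced = sum(partial_sum(B[s:e], s, e, A, beta)
             for s, e in zip(cuts[:-1], cuts[1:]))

assert np.isclose(full, sliced)
```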

If you don't have access to one, you can at least pull A out of the loop:

```stan
model {
  for (k in 1:30) {
    C[, k] = B[k] * beta;
  }
  Y = A - C;
}
```
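As a sanity check outside Stan, here is a NumPy sketch (with small placeholder dimensions) showing that the column-by-column loop, the compute-C-then-subtract version, and a fully batched product all give the same Y:

```python
import numpy as np

rng = np.random.default_rng(1)

# Small stand-ins for the 90000 x 30 / 90000 x 12 data in the post.
N, P, K = 6, 12, 30
A = rng.normal(size=(N, K))
B = rng.normal(size=(K, N, P))   # B[k] is an N x P matrix
beta = rng.normal(size=P)

# Original form: column by column.
Y_loop = np.empty((N, K))
for k in range(K):
    Y_loop[:, k] = A[:, k] - B[k] @ beta

# Suggested form: compute C in the loop, subtract A once at the end.
C = np.stack([B[k] @ beta for k in range(K)], axis=1)
Y_once = A - C

# In NumPy the whole loop collapses to one batched product.
Y_batched = A - np.einsum('knp,p->nk', B, beta)

assert np.allclose(Y_loop, Y_once)
assert np.allclose(Y_loop, Y_batched)
```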

I think the parallelized code should work, but it is not tested/debugged.

Thank you so much for the suggestions Corey.Plate. I actually do have access to a cluster and am just starting to figure out how to implement reduce_sum, so I will definitely try your suggestion and report back.

Before turning to reduce_sum, though, I wanted to make sure there was nothing else I could do to make the code as efficient as possible (e.g., pulling A out of the loop, as you also suggested). My profiling shows that this for loop is a clear bottleneck in my code, so I figured I would ask. Thanks again for responding!