Hi,
I apologize in advance for the very naive question. As part of my
Stan code I have a matrix A and an array of matrices B defined in the data
section as follows:
matrix[90000, 30] A;
array[30] matrix[90000, 12] B;
I also have a vector of 12 parameters called beta:
vector[12] beta;
I am currently performing the following multiplication in a for loop:
for(k in 1:30) {
Y = A[,k] - B[k] * beta;
}
Is there a more efficient/fast way of performing this calculation?
Thank you for entertaining my very beginner question.
If you have access to a cluster or a large number of CPU, you can compile this with open mpi in cmdstan:
make STAN_THREADS=TRUE my_folder/my_model
functions {
my_function(array[] matrix B,vector beta) {
for (k in 1:30) {
C[,k] = B[k]*beta;
}
return C;
}
my_partial_sum(array[] matrix slice_n_B, vector beta) {
return my_function(slice_n_B,beta);
}
}
model {
C = reduce_sum(my_partial_sum,B,1,beta);
Y = A - C;
}
If you don’t have access to one, you can at least pull A out of the loop:
model {
for (k in 1:30) {
C[,k] = B[k]*beta;
}
Y = A - C;
}
I think the parallelized code should work, but it is not tested/debugged.
Thank you so much for the suggestions Corey.Plate. I actually do have access to a cluster and I am just starting to try and figure out how to implement reduce_sum, so I will definitely try your suggestion and report back. However, before turning to reduce_sum I wanted to make sure that there was nothing else I could do to make my code as efficient as possible to begin with (e.g., by doing things like pulling A out of the loop, as you also suggested). My profiling shows that this for loop is a clear bottleneck in my code so I figured I would ask. Thanks again for responding!