Speeding up a for loop with matrix-vector multiplication

Hi,
I apologize in advance for the very naive question. As part of my Stan code, I have a matrix A and an array of matrices B, declared in the data block as follows:

matrix[90000, 30] A;
array[30] matrix[90000, 12] B;

I also have a vector of 12 parameters called beta:

vector[12] beta;

I am currently performing the following multiplication in a for loop:

for (k in 1:30) {
  Y[, k] = A[, k] - B[k] * beta;
}

Is there a faster or more efficient way of performing this calculation?
Thank you for entertaining my very beginner-level question.

If you have access to a cluster or a machine with many CPU cores, you can parallelize this with reduce_sum, compiling the model with threading enabled in CmdStan:

make STAN_THREADS=TRUE my_folder/my_model
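
At run time you also need to tell the compiled model how many threads each chain may use, e.g. with CmdStan's num_threads argument (the thread count and data file name below are just placeholders):

./my_folder/my_model sample data file=my_data.json num_threads=4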

Note that reduce_sum accumulates a scalar: the partial-sum function has to take the slice bounds start and end right after the sliced argument and return a real, usually that slice's contribution to the log density. So rather than building C and subtracting it from A afterwards, the Y terms for each slice go straight into the target. Something along these lines, where std_normal_lpdf is only a placeholder for whatever distribution Y actually follows in your model:

functions {
  // Contribution to the target from one slice of B. reduce_sum requires
  // this signature: the sliced array first, then the start and end indices
  // of the slice within the full array, then any shared arguments.
  real my_partial_sum(array[] matrix B_slice, int start, int end,
                      matrix A, vector beta) {
    real lp = 0;
    for (i in 1:size(B_slice)) {
      int k = start + i - 1;  // index of this slice element in the full B
      vector[rows(A)] Y_k = A[, k] - B_slice[i] * beta;
      // Placeholder likelihood term: substitute whatever Y feeds into.
      lp += std_normal_lpdf(Y_k);
    }
    return lp;
  }
}
model {
  // grainsize = 1 lets the scheduler decide how to split B across threads.
  target += reduce_sum(my_partial_sum, B, 1, A, beta);
}

If you don’t have access to one, you can at least pull A out of the loop:

model {
  // C and Y as model-block locals; adjust if they are declared elsewhere.
  matrix[90000, 30] C;
  matrix[90000, 30] Y;

  for (k in 1:30) {
    C[, k] = B[k] * beta;
  }

  Y = A - C;
}
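
Another serial option, if you can afford to hold a second copy of B in memory, is to stack the 30 slices of B into one tall matrix in transformed data so the whole loop collapses into a single matrix-vector product. An untested sketch (the gain over the loop may be modest since the arithmetic is the same, so it is worth checking against your profiling):

data {
  matrix[90000, 30] A;
  array[30] matrix[90000, 12] B;
}
transformed data {
  // Stack the slices of B on top of each other once, when the data are read.
  matrix[30 * 90000, 12] B_stacked;
  for (k in 1:30) {
    B_stacked[((k - 1) * 90000 + 1):(k * 90000)] = B[k];
  }
}
parameters {
  vector[12] beta;
}
model {
  // to_matrix() refills the long vector column-major, so column k of the
  // reshaped matrix equals B[k] * beta and Y matches the original loop.
  matrix[90000, 30] Y = A - to_matrix(B_stacked * beta, 90000, 30);
  // ... use Y in the rest of the model as before
}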

I think the parallelized code should work, but it is not tested/debugged.

Thank you so much for the suggestions, Corey.Plate. I actually do have access to a cluster and I am just starting to figure out how to implement reduce_sum, so I will definitely try your suggestion and report back. However, before turning to reduce_sum I wanted to make sure there was nothing else I could do to make my code as efficient as possible to begin with (e.g., pulling A out of the loop, as you also suggested). My profiling shows that this for loop is a clear bottleneck in my code, so I figured I would ask. Thanks again for responding!
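
For what it's worth, the bottleneck shows up when I wrap the loop in one of Stan's profile() sections, roughly like this (the label is arbitrary):

model {
  profile("A_minus_B_beta_loop") {
    for (k in 1:30) {
      Y[, k] = A[, k] - B[k] * beta;
    }
  }
}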