Hi,

I apologize in advance for the very naive question. As part of my

Stan code I have a matrix A and an array of matrices B defined in the data

section as follows:

matrix[90000, 30] A;

array[30] matrix[90000, 12] B;

I also have a vector of 12 parameters called beta:

vector[12] beta;

I am currently performing the following multiplication in a for loop:

for(k in 1:30) {

Y = A[,k] - B[k] * beta;

}

Is there a more efficient/fast way of performing this calculation?

Thank you for entertaining my very beginner question.

If you have access to a cluster or a large number of CPU, you can compile this with open mpi in cmdstan:

`make STAN_THREADS=TRUE my_folder/my_model`

```
functions {
my_function(array[] matrix B,vector beta) {
for (k in 1:30) {
C[,k] = B[k]*beta;
}
return C;
}
my_partial_sum(array[] matrix slice_n_B, vector beta) {
return my_function(slice_n_B,beta);
}
}
```

```
model {
C = reduce_sum(my_partial_sum,B,1,beta);
Y = A - C;
}
```

If you donâ€™t have access to one, you can at least pull A out of the loop:

```
model {
for (k in 1:30) {
C[,k] = B[k]*beta;
}
Y = A - C;
}
```

I think the parallelized code should work, but it is not tested/debugged.

Thank you so much for the suggestions Corey.Plate. I actually do have access to a cluster and I am just starting to try and figure out how to implement reduce_sum, so I will definitely try your suggestion and report back. However, before turning to reduce_sum I wanted to make sure that there was nothing else I could do to make my code as efficient as possible to begin with (e.g., by doing things like pulling A out of the loop, as you also suggested). My profiling shows that this for loop is a clear bottleneck in my code so I figured I would ask. Thanks again for responding!