Speed of Hadamard products vs for loops

Hello everyone! I am trying to optimize some code but am having a hard time deciding which direction to take. In particular, I can't tell whether rstan is more efficient with for loops or with Hadamard products of matrices.

In recent years I have coded mostly in Python, so the instinct to avoid for loops at all costs is deeply ingrained. In my code I have to calculate the Bernoulli probability of a matrix A of binary values. Rather than looping through its entries, I wrote a user-defined function that calculates the probability for the whole matrix at once, essentially using two Hadamard products between matrices with the same dimensions as the data matrix A.

Is such an approach usually faster than the for loop, or will it actually be slower?

Would the situation change if the matrix has many missing values (roughly 90%, say) whose likelihood I don’t want to calculate? In that case, rather than computing the Hadamard products, I could just loop through a list containing only the indices of the non-NaN values.

Best,
Luca

The fastest way would just be to use Stan’s elementwise multiply operator, A .* B.
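As a minimal sketch of what this could look like for the Bernoulli case described in the question: the function name bernoulli_matrix_loglik, the variable names, and the choice to store the binary data as a real-valued matrix are all assumptions made for illustration, not the original poster's code.

```stan
functions {
  // Hypothetical helper: Bernoulli log-likelihood of a 0/1 matrix A
  // given a same-sized matrix P of success probabilities.
  // Uses two elementwise (Hadamard) products and a single sum.
  real bernoulli_matrix_loglik(matrix A, matrix P) {
    return sum(A .* log(P) + (1 - A) .* log1m(P));
  }
}
data {
  int<lower=1> N;
  int<lower=1> M;
  matrix<lower=0, upper=1>[N, M] A;  // binary data stored as reals (assumption)
}
parameters {
  matrix<lower=0, upper=1>[N, M] P;  // one probability per entry (assumption)
}
model {
  target += bernoulli_matrix_loglik(A, P);
}
```

Here log and log1m are applied to the whole matrix at once, so the model block contains no explicit loop.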


Could I get a little insight into why this would be the case, since to my understanding Stan uses C++ in the background? Would this also be the case for the sparse case I was mentioning, or could a loop make sense there?

Could I get a little insight into why this would be the case, since to my understanding Stan uses C++ in the background?

For every function called in Stan while calculating gradients, we need to set up a callback that is used in the reverse pass of reverse-mode autodiff. Applying .* to the whole matrix requires only 1 callback, whereas a loop over an n-by-m matrix requires setting up n*m callbacks.
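For contrast, here is a sketch of the loop form that this explanation refers to, under the same assumed setup (binary data A stored as a real matrix, one probability per entry in P; all names are illustrative). Each pass through the inner loop calls scalar functions on autodiff variables, so the work recorded for the reverse pass grows with N * M, whereas a whole-matrix expression such as sum(A .* log(P) + (1 - A) .* log1m(P)) needs only a handful of function calls in total.

```stan
data {
  int<lower=1> N;
  int<lower=1> M;
  matrix<lower=0, upper=1>[N, M] A;  // binary data stored as reals (assumption)
}
parameters {
  matrix<lower=0, upper=1>[N, M] P;  // one probability per entry (assumption)
}
model {
  // Scalar version: every iteration adds its own autodiff bookkeeping.
  for (i in 1:N) {
    for (j in 1:M) {
      target += A[i, j] * log(P[i, j]) + (1 - A[i, j]) * log1m(P[i, j]);
    }
  }
}
```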

Would this also be the case for the sparse case I was mentioning, or could a loop make sense there?

It really depends on your sparsity pattern. If only a few percent of the matrix is nonzero, the loop will be faster, but once the nonzero share approaches half, .* will probably be faster.
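For the mostly-missing case, the index-list loop from the question might look roughly like the sketch below. The names K, ii, jj, and y are hypothetical, and the sketch assumes the observed entries have been extracted into index arrays before the data is passed to Stan. It uses the array[K] int syntax of recent Stan versions; older releases write int y[K] instead.

```stan
data {
  int<lower=1> N;
  int<lower=1> M;
  int<lower=0> K;                      // number of observed (non-missing) entries
  array[K] int<lower=1, upper=N> ii;   // row index of each observed entry
  array[K] int<lower=1, upper=M> jj;   // column index of each observed entry
  array[K] int<lower=0, upper=1> y;    // observed binary values
}
parameters {
  matrix<lower=0, upper=1>[N, M] P;    // one probability per entry (assumption)
}
model {
  // Only the K observed entries contribute to the likelihood.
  for (k in 1:K) {
    y[k] ~ bernoulli(P[ii[k], jj[k]]);
  }
}
```

With about 90% of the entries missing, K is roughly a tenth of N * M, which is the regime where the reply above expects the loop to be competitive.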

Thanks for the quick and detailed response!