I’m working on a model where it is convenient to use a triangular matrix to calculate different loglikelihoods for different observed data. In the modelling step I can then weight each loglikelihood by the number of observations it corresponds to, and sum to obtain the total loglikelihood. It turns out this operation has a name: the Frobenius inner product.
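For reference, the Frobenius inner product is just the sum of the products of corresponding entries, which can equivalently be written as a trace:

```latex
\langle A, B \rangle_F \;=\; \sum_{i=1}^{T} \sum_{j=1}^{T} A_{ij} B_{ij} \;=\; \operatorname{tr}\!\left(A^{\top} B\right)
```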
For example, in my transformed parameters block I wind up with a T x T matrix with entries [LL_11, LL_12, LL_13, …, LL_1T] in the first row, [0, LL_22, LL_23, …, LL_2T] in the second row, and so on down to the final row, which is T − 1 zeroes followed by LL_TT in the final position. In my transformed data block, I gather my data such that I have a matrix of counts: [n_11, n_12, …, n_1T], [0, n_22, …, n_2T], etc.
In pseudocode these are:
matrix[T, T] log_likeli; // upper-triangular matrix of loglikelihoods
matrix[T, T] the_Ns;     // upper-triangular matrix of observation counts
So my question is: what’s the most efficient way to calculate the Frobenius inner product, given that a little over half of my matrix consists of zeroes? I can think of three ways.

Sum the matrix resulting from elementwise multiplication:
target += sum(log_likeli .* the_Ns);

Vectorise to a row_vector and vector and multiply:
target += to_row_vector(log_likeli) * to_vector(the_Ns);

Loop over rows, omitting the zero terms through vector slicing and transposing the sliced row of counts:
for (i in 1:T)
  target += log_likeli[i, i:T] * the_Ns[i, i:T]';
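Outside Stan, it’s easy to sanity-check that the three candidates compute the same quantity. Here is a NumPy sketch (the dimension and the random data are made up purely for illustration):

```python
import numpy as np

# Hypothetical dimension and data for illustration only; in the real model
# these come from the transformed parameters / transformed data blocks.
T = 5
rng = np.random.default_rng(0)

# Upper-triangular loglikelihood and count matrices, mirroring the
# structure described above (zeroes below the diagonal).
log_likeli = np.triu(rng.normal(size=(T, T)))
the_Ns = np.triu(rng.integers(0, 10, size=(T, T)).astype(float))

# Option 1: elementwise product, then sum.
opt1 = np.sum(log_likeli * the_Ns)

# Option 2: flatten both matrices and take a dot product.
opt2 = log_likeli.ravel() @ the_Ns.ravel()

# Option 3: loop over rows, slicing away the structural zeroes.
opt3 = sum(log_likeli[i, i:] @ the_Ns[i, i:] for i in range(T))

assert np.isclose(opt1, opt2) and np.isclose(opt1, opt3)
```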
I’ll be doing some testing, but I’d be curious to hear whether this has already been considered, and what the best way might be according to those who have already walked down this path.