Hi all,
I am working on age dependent branching processes where we have recently cast the resultant renewal equations into a simple numerical solving algorithm relying on element wise multiplications of three matrices and a row sum. These matrices are not huge, maybe n=500 is the largest case.
After timing my code, the loop doing the heavy lifting is as follows. Essentially, you have three matrices, A, B and C, and you populate the B matrix iteratively in a loop. For now assume A and C as fixed data. A barebones code block is as follows:
(note a) I do some matrix indexing already to help speed things up and not waste computation b) ignore for now that i starts from 3)
transformed parameters {
matrix[N,N] B = rep_matrix(0,N,N);
vector[N] convolution=rep_vector(0,N);
vector[N] E_inc=rep_vector(0,N);
for(i in 3:N){
convolution = (revmat(C[1:N,1:(i1)]).*A[1:N,1:(i1)].*B[1:N,1:(i1)])*rep_vector(1.0,(i1));
B[1:N,i] = rep_vector(1,N) + convolution;
}
E_inc = diagonal(B);
}
There is a function i have coded called revmat which simply reverses the columns.
matrix revmat(matrix X){
int n = rows(X);
int n2 = cols(X);
matrix[n,n2] s;
for(i in 1:n2){
s[1:n,i] = X[1:n,n2i+1];
}
return s;
}
This loop, even for modest N=100 is impractically slow and would take weeks to generate even small samples. Its also pretty slow ‘optimizing’ too, but that just about works in a sensible time. My questions are:

Is there an obvious thing I am doing wrong that is slowing things down? After combing this forum I have seen best practice is to vectorise and thats what i’ve tried to do.

Mathematically, I could cast the problem as three nested loops instead of as a matrix operation  would this work better in stan?

For reference, in base R the above computation for N=500, runs in 0.250.3 seconds.
Ive always thought modern computers excel at element wise multiplications and row sums, so the slowness was a surprise to me, but this probably exposes my general ignorance than anything meaningful.
I very much appreciate any suggestions
Sam
p.s the paper behind what Im trying to do is here [2107.05579] Unifying the effective reproduction number, incidence, and prevalence under a stochastic agedependent branching process where I use stan in optimizing mode.