I don’t think there is? Recoding a and c using the matrix type would let you use partial indexing, i.e. c[, j] = a[, j] .* b; with a single for-loop. Alternatively, re-type a and c as vector[n_row] a[n_col] (and likewise for c), and then a single for-loop over c[j] = a[j] .* b; works, but there’s probably a reason you’re not doing so already.
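For concreteness, here is a minimal sketch of both re-typings (this is a fragment, not a complete program, and the names a_mat/a_arr etc. are placeholders for your own variables):

```stan
// (1) matrix + partial indexing: loop over the (few) columns
matrix[n_row, n_col] a_mat;
vector[n_row] b;
matrix[n_row, n_col] c_mat;
for (j in 1:n_col) {
  c_mat[, j] = a_mat[, j] .* b;  // elementwise product with column j
}

// (2) array of column vectors: one vectorized op per column
vector[n_row] a_arr[n_col];
vector[n_row] c_arr[n_col];
for (j in 1:n_col) {
  c_arr[j] = a_arr[j] .* b;
}
```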

It's just that I ultimately need to access and work with the rows of c, and I seem to remember that this is supposed to be more efficient. In my use case, n_col is just 4, but n_row is about 5e+05, and b is a parameter, so the whole operation needs to happen at every gradient evaluation.

Right now I’m just doing

for (i in 1:n_row) {
  c[i] = a[i] * b[i];
}

Any advice about whether this is likely to be substantially slower than re-typing as you suggest (to loop over 4 elements rather than 500K elements) but then extracting the columns (rather than the rows) would be much appreciated!

Intuitively, I think that smaller loops with larger vectors will be faster. Probably worth doing some tests with a small subset of the data to see if this is correct.

Did you mean to do element wise multiplication (.* vs *) here?

Can you just do a single index like the below? Then you're filling each element of c with a vector-times-scalar multiplication.

data {
  int n_row;
  int n_col;
}
transformed parameters {
  row_vector[n_col] a[n_row];
  vector[n_row] b;
  row_vector[n_col] c[n_row];
  for (i in 1:n_row) {
    c[i] = a[i] * b[i];
  }
}

Is there a reason you can’t just use a matrix here though?

data {
  int n_row;
  int n_col;
}
transformed parameters {
  matrix[n_row, n_col] a;
  vector[n_row] b;
  matrix[n_row, n_col] c = diag_pre_multiply(b, a);
}

c = diag_pre_multiply(b, a) does the equivalent of c = diag_matrix(b) * a;. And since you are multiplying each element of b by its corresponding row in a, I think they are the same? I’d double check that though. Otherwise check out diag_post_multiply().
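A quick sketch to check the claimed equivalence on a tiny example (made-up numbers, just to compare the three formulations; drop this into a transformed data block and print if you want to verify):

```stan
transformed data {
  matrix[3, 2] a = [[1, 2], [3, 4], [5, 6]];
  vector[3] b = [10, 20, 30]';
  matrix[3, 2] c1 = diag_pre_multiply(b, a);
  matrix[3, 2] c2 = diag_matrix(b) * a;  // same result, but materializes a dense 3x3 matrix
  matrix[3, 2] c3;
  for (i in 1:3) {
    c3[i] = b[i] * a[i];  // the original row-by-row loop
  }
  // c1, c2, and c3 should all equal row i of a scaled by b[i]
}
```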

I had hoped that I could vectorize this (very long) for-loop while preserving the row_vector structure of c, but it sounds like that isn’t easily achieved. The reason is that in a later loop I need to access the rows of c, and I seem to remember from somewhere that it’s supposed to be faster to get the rows from a row_vector structure. So I’ll need to tinker a bit to figure out whether typing these as matrices and vectorizing with diag_pre_multiply (great tip @stevebronder!) or looping over the 4 columns rather than the 500K rows (great tip @hhau!) results in any savings.

Stan stores its data in column-major order, so I think diag_pre_multiply() is going to be your best bet. In general, the sooner you can defer to vectorized Stan functions the better, as they jump straight to a C++ implementation that’s specialized for that operation.
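To illustrate the column-major point with a minimal sketch (toy values): extracting a column of a matrix copies a contiguous block of memory, while extracting a row gathers strided elements, which is why an array of row_vectors can be faster when later code mostly reads rows.

```stan
transformed data {
  matrix[2, 3] m = [[1, 2, 3], [4, 5, 6]];
  vector[2] col_j = m[, 2];      // cheap: contiguous column copy
  row_vector[3] row_i = m[1];    // strided gather across columns
}
```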