I think one approach would be to broadcast the y vector to an (N-K) x K matrix, where each column contains the previous K lags of data, then the mu vector is a simple matrix multiplication.

So it would need one function that would take the number of observations and lag order, and return the indices that would comprise each row:

array[] int ark_inds(int N, int K) {
// Nested array containing the indices for the nth lag
array[(N-K), K] int inds_tmp;
for (n in 1:(N-K)) {
inds_tmp[n] = linspaced_int_array(K, n, n+(K-1));
}
// Concatenate nested arrays to a single array
return to_array_1d(inds_tmp);
}

Then the y vector can be broadcast using the to_matrix function with row-major ordering: