Sum_reduce with multilevel VAR-model

Hi!

I’m trying to use the new sum_reduce feature with a (for now) bivariate VAR(1) model, but I can’t figure out how to pass my data, means, AR- and cross-lagged parameters properly.

I’m trying to pass vectors of everything. The data is vector[K] Y[N], so a column for each dependent variable Y, and N rows. The individual means similarly are vector[K] alpha[N]. The AR-parameters and cross-lagged parameters used to be in a vector[K*K] betas[N], so that I could pull individual matrices, but I put those in a vector with N rows as well so that the slicing works; I now change the vector back to a matrix within my partial_sum function. Finally, the lagged predictor is also vector[K] Ylag[N], just like the DV. So in short,

data{
vector[K] Y[N];
vector[K] Ylag[N];
}

parameters{
vector[K] alpha[N]; // the means for the two DV belonging to each row. Since I have multiple rows per individual values are repeated here.

vector[K*K] beta[N]; // the AR and cross-lagged predictors for each individual. Again if multiple row for an individual, the values are repeated
}

Then my function to use with sum_reduce is:

functions {
  real part_sum(vector[] y_slice,
                int start, 
                int end,
                vector[] alpha,
                vector[] beta,
                vector Ylag,
                int K,
                matrix Omega,
                vector tau
                ) {
  return multi_normal_lpdf(y_slice | alpha[start:end] + to_matrix(beta[start:end], K, K)*Ylag[start:end]', quad_form_diag(Omega, tau));
  }
}

But this doesn’t work. I already tried changing the types of my parameters, but can’t get it to work. I think it has to do with the part where I multiply my beta matrix with the lagged Y values; the transpose is not allowed there, I think.

I hope you guys can help me out ;).

Thanks!
Joran

EDIT: @maxbiostat edited this post to include syntax highlighting.

First, it’s reduce_sum not sum_reduce.

What specifically does not work?

Apologies!

I typed the message a bit too quickly… I couldn’t get it to work and was really tired ;). The problem is that I’m struggling to combine the means and the autoregressive effects in an appropriate manner. I’ll try to clarify using an annotated version of my code:

functions {
real part_sum(
vector[] y_slice, // a vector[K] Y[N], so basically an N by K matrix with the observations on my K DV's, where N is the number of individuals (I) times the number of observations per individual (T)
int start, 
int end,
vector[] alpha_Vec, // basically an N by K matrix with the individual means on my DV's. Since multiple rows belong to the same individual, each set of means is repeated T times.
vector[] beta_Vec, // basically an N by K*K matrix with the individual AR- and cross-lagged parameters
vector[] Yc_Vec, // the lagged version of y_slice used as a predictor in the AR model
matrix Omega, // KxK Correlation matrix
vector tau // vector[K] of sd's for the DV's
                ) 

The problem I’m having is in combining the means and AR-parameters in the multi_normal call. Normally in a VAR model you would have a [K,K] matrix of AR- and cross-lagged parameters for each individual, which you multiply with their [K,1] vector of lagged values. But how do I do this for multiple individuals, i.e. how do I vectorize?
I’ve tried to write out the multiplications. For example:

A 2x2 matrix A = {A1, A2, A3, A4} (column-major) times a 2x1 vector B = {B1, B2} results in the 2x1 vector {A1*B1 + A3*B2, A2*B1 + A4*B2}. And each of the 4 products in these sums (A1*B1, A3*B2, etc.) is basically multiplying vectors once you stack all rows, which seems to me to be what I want :). I’ve tried to do this in the code like this:

{
return multi_normal_lpdf(y_slice | (alpha_Vec[start:end] + 
(beta_Vec[start:end,1] .* Yc_Vec[start:end,1]) + 
(beta_Vec[start:end,2] .* Yc_Vec[start:end,2]) + 
(beta_Vec[start:end,3] .* Yc_Vec[start:end,1]) + 
(beta_Vec[start:end,4] .* Yc_Vec[start:end,1])), 
quad_form_diag(Omega, tau));
  }
}

But then I get the following error when compiling:

Ill-typed arguments supplied to infix operator .*. Available signatures:
(vector, vector) => vector
(row_vector, row_vector) => row_vector
(matrix, matrix) => matrix
Instead supplied arguments of incompatible type: real[], real[].

So it seems to think I’m selecting single values or something, but I don’t really see how to code around this… I hope this is clear :). I’m not great at formatting these messages ;).
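(A minimal workaround sketch, assuming the declarations above: indexing an array of vectors as beta_Vec[start:end, 1] yields a real[] array, which .* does not accept, so converting both slices with to_vector() should give .* the vector arguments it expects.)

// hypothetical sketch: convert the real[] slices to vectors before applying .*
vector[end - start + 1] prod1;
prod1 = to_vector(beta_Vec[start:end, 1]) .* to_vector(Yc_Vec[start:end, 1]);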

Best!
Joran

Update: And as a follow up, I tried reading in my parameters and lagged predictors in separate vectors, and change the code to:

{
return multi_normal_lpdf(y_slice | alpha_Vec[start:end] +
append_col(beta_Vec1[start:end] .* Yc_Vec1[start:end], beta_Vec4[start:end] .* Yc_Vec2[start:end]) +
append_col(beta_Vec2[start:end] .* Yc_Vec1[start:end], beta_Vec3[start:end] .* Yc_Vec2[start:end]), quad_form_diag(Omega, tau));
}

Here I append the vectors I get after 1) multiplying the AR-parameter for Y1 with lagged Y1 and the AR-parameter for Y2 with lagged Y2 (resulting in an NxK matrix), and 2) the vectors I get after multiplying the cross-lagged predictors with the lagged variables (also resulting in an NxK matrix). This appending means that I have a vector[K] alpha[N], so basically an NxK matrix of means for the K DV’s, to which I want to add the two other NxK matrices (the two I created above). Adding all this up should lead to ONE combined NxK matrix with the appropriate means for each of the N rows, but…
multi_normal_lpdf doesn’t take matrices (I need a vector), and you can’t add a vector and a matrix. So I guess I need to know how to add the vectorized products of the lagged predictors and the AR- and cross-lagged parameters, while also turning them into a vector.
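(A hedged fallback sketch using the names from the annotated function above, and setting aside the slicing for a moment: multi_normal_lpdf does accept an array of vectors, so one straightforward option is to build the mean row by row as an array and pass that whole.)

// sketch: build the mean as an array of vectors, which multi_normal_lpdf accepts
vector[K] mu[N];
for (n in 1:N)
  mu[n] = alpha_Vec[n] + to_matrix(beta_Vec[n], K, K) * Yc_Vec[n];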

Alright, I think things may have become a little unclear due to the code snippets above. Here is my complete code to use sum_reduce on a VAR(1) model. It runs, but I don’t see large speed gains yet. So two questions:

  1. Is there something that I can do to speed-up the code (or is there a stupid mistake ;))
  2. With 200 individuals and 70 measures per individual (so 14000 rows in total), should I expect speed benefits from sum_reduce? Or is the overhead killing any potential gains?

Thanks!
Joran

// The reduce_sum function
functions {
  real part_sum(vector[] y_slice,
                int start, 
                int end,
                vector[] mu,
                matrix Omega,
                vector tau
                ) {
  return multi_normal_lpdf(y_slice | mu[start:end], quad_form_diag(Omega, tau));
  }
}
data {
  int N; // number of observations (number of rows in the panel); this is equal to I*T due to long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // The data with I*T (=N) observations on K variables
  int Use_obs[N-I]; // indicates which rows are NOT an individual's first observation. I want to
  // remove the first observations since those don't have a lagged predictor value (nothing came before)
  int grainsize;
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  corr_matrix[K] Omega; // Correlation matrix for correlations between residuals
  vector[K*K] beta_hat_location; // a K*K matrix of grand-mean (across-individual) AR-parameters (on the diagonal)  
  // and cross-lagged parameters.
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's for AR and Cross-lagged effects
  vector[K] alpha_hat_location; // grand-mean (across-individuals) on the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd of the means 
  corr_matrix[K] Omega_alpha; // correlations between means
  corr_matrix[K*K] Omega_beta;// correlations between AR- and cross-lagged effects
  
   
  // individual-level parameters. 
  vector[K*K] z_beta[I]; // use for the non-centered parametrization
  vector[K] z_alpha[I]; // use for the non-centered parametrization
}

transformed parameters {
  matrix[K, K] beta[I]; // Individual VAR(1) coefficient matrices (with individual AR- and cross-lagged effects)
  vector[K] alpha[I]; // Individual means on the K DV's
  matrix[K, K] L_alpha; // cholesky decomposition of correlation matrix of the means
  matrix[(K*K), (K*K)] L_beta; // cholesky decomposition of correlation matrix of the AR/Cross-lagged effects
  L_alpha = cholesky_decompose(quad_form_diag(Omega_alpha, alpha_hat_scale));
  L_beta = cholesky_decompose(quad_form_diag(Omega_beta, beta_hat_scale));
  
  for(i in 1:I) {
    alpha[i] = alpha_hat_location + L_alpha * z_alpha[i]; // non-centered parametrization
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + L_beta * z_beta[i]), K, K); // non-centered parametrization
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  Omega_alpha ~ lkj_corr(3);
  Omega_beta ~ lkj_corr(5); 
  
  // hierarchical priors
  for(i in 1:I) {
    // non-centered parameterization
    z_alpha[i] ~ normal(0, 1);
    z_beta[i] ~ normal(0, 1);
    }
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  Omega ~ lkj_corr(3);
  
  

  {
  vector[K] Yc[N]; // Create centered versions of the K DV's to use as lagged predictors
  vector[K] Mean[N]; // Create a vector array in which to store the product of 1) the matrix of AR- (on diagonal)
  // and cross-lagged effects with 2) the vector of lagged DV-scores. I can't figure out how to do this in a 
  // reduce_sum or map_rect function so I just create it outside of it.
  
  vector[K] Y_Vec[N-I]; // These two vectors are the same as Y and Mean directly above, but after removing the
  // first row of each individual. Since there are no prior observations for that row, I can't use it in the
  // VAR(1)
  vector[K] Mean_Vec[N-I]; 
  
  
 
  // likelihood
  
  for(n in 1:N) {
    
    Yc[n] = Y[n] - alpha[individual[n]]; // Center the observed scores using person means (group-mean centering);
    // this makes the alphas equal to individual means
    Mean[n] = Yc[n]; // First make all Mean values equal to centered DV scores, then update the values for 
    // individuals' 2nd and later observations using the loop below
       
    if(time[n]>1){
      
    Mean[n] = alpha[individual[n]] + beta[individual[n]]*Yc[n-1]; // Use individual means and AR-/Cross-lagged effects
    // to calculate the predicted mean in each row
    }
  }
  
  // Remove first observations of each participant.  Due to lag, no predictors there
  Y_Vec = Y[Use_obs];
  Mean_Vec = Mean[Use_obs];
  
   target += reduce_sum(part_sum,Y_Vec,grainsize,Mean_Vec,Omega,tau); // Feed DV's, Mean, Correlation matrix and
   // sd's to reduce_sum function
  
  }
 
}

It’s still reduce_sum! NOT sum_reduce.

You do not need to loop over the z’s for the prior; just write to_vector(z) ~ std_normal().
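(A minimal sketch of that change, assuming z_alpha is declared as a matrix, matrix[I, K] z_alpha, so that to_vector applies:)

// instead of the loop:
//   for (i in 1:I) z_alpha[i] ~ normal(0, 1);
// state the whole prior in one vectorized statement:
to_vector(z_alpha) ~ std_normal();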

Other than that, you should move as much of the model into the reduce_sum call as you can, not just the data likelihood.

See https://statmodeling.stat.columbia.edu/2020/05/05/easy-within-chain-parallelisation-in-stan/ for an explanation of why it is important to move most of the model into reduce_sum.

Ahhh, man! I’m sorry, I named the function sum_reduce initially, and that name is apparently stuck at the front of my mind :). I’ll triple check from now on! ;).

And thanks for the advice! I’ll get on it and improve my reduce_sum function!

Really appreciate the help! And I’ll let you know how it turns out :).

Best,
Joran

Ok, I think it is slowly starting to click in my brain ;). Below is the code I have now, in which there is still one issue, but first I’d like to make sure that I’m getting the logic of reduce_sum.

I have multilevel VAR(1) data. Specifically, I have I individuals, each with T observations on K variables. Since I can view each individual as a separate “unit” in my analysis, it would make sense to slice over individuals and analyze each individual’s T*K array of observations separately using reduce_sum. Right? Obviously, the analyses in the separate slices are somewhat related since they use the same hyper-priors to draw individual values, but I can run the analyses for each individual separately.

Ok, then what I need to do is supply an array to slice over. Am I right to think that this would mean supplying an I*T*K array here? That way, reduce_sum would slice over the first dimension (the individuals, I), and send the separate T*K arrays of each individual to a separate slice/process/whatever the appropriate term is ;).

Finally, I need to write code that tells Stan what to do with the data that is present in each slice, so what to do with the T*K array of each individual. Am I right that this code just needs to “tell” what to do in each separate slice, assuming that all that is there is a T*K array? So am I right that I should write the code assuming the slicing already took place? For example, can I write code that “only” draws K means? Since each individual only has K means. Or should I let the code draw an I*K array of means, so a set of means for each individual in my sample, even though each slice should only use the data of one of the individuals?
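(For reference, a minimal sketch of the slicing semantics, with illustrative names: reduce_sum cuts the sliced argument along its first dimension, and inside the partial sum function y_slice holds only the elements start through end, re-indexed from 1. All other array arguments arrive whole and therefore need to be indexed with start:end.)

functions {
  // illustrative sketch, not the model below: y_slice is re-indexed from 1,
  // while mu arrives whole and must be indexed via start and end
  real demo_partial_sum(vector[] y_slice, int start, int end, vector[] mu) {
    real lp = 0;
    for (i in 1:(end - start + 1)) // y_slice has end - start + 1 elements
      lp += normal_lpdf(y_slice[i] | mu[start + i - 1], 1);
    return lp;
  }
}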

Alright, those were my general questions :). Below is my current (annotated) code with specific questions about it. The code assumes that things work as outlined above :).

// The reduce_sum function
functions {
  real part_sum(real[,,] y_slice, // array to slice over. I have multilevel data with I individuals, each with T observations on K variables, so
                // I*T*K
                int start, 
                int end,
                matrix Omega, // Hyper-prior for residual-correlations
                vector tau, // Hyper-prior for residual sd's
                vector alpha_hat_location, // Hyper-prior for the means of the K variables
                matrix Omega_alpha, // Hyper-prior for correlation between the means of the K variables
                vector alpha_hat_scale, // Hyper-prior for the sd's of the K variables
                vector beta_hat_location, // Hyper-prior for the means of the AR- and cross-lagged parameters
                matrix Omega_beta, // Hyper-prior for the correlations between the means of the AR- and cross-lagged parameters
                vector beta_hat_scale, // Hyper-prior for the sd's of the AR- and cross-lagged parameters
                int K, // Number of variables
                int I, // Number of individuals
                int T // Number of observations per individual
                ) {
 // I have observations nested within individuals, so I want to slice over individuals. Specifically, I have I individuals, each with T 
 // observations on K variables. This means that y_slice should be built up so that each element is the T*K matrix/array of an individual's T 
 // observations on the K variables.
 // I'm assuming that the code I provide in this function is run for each slice (here person) separately! Which means I only have to code what 
 // should happen for each separate individual, not across individuals
  matrix[K, K] beta; // Individual VAR(1) coefficient matrices (with individual AR- and cross-lagged effects). Drawn separately for each 
  // individual in their corresponding slice
  vector[K] alpha; // Individual means on the K DV's. Also drawn separately for each individual in their respective slice.
  matrix[K, K] L_alpha; // cholesky decomposition of correlation matrix of the means
  matrix[(K*K), (K*K)] L_beta; // cholesky decomposition of correlation matrix of the AR/Cross-lagged effects
  vector[K] Mean[T]; // The predicted mean for each row of the array in a slice. Calculated by multiplying the AR- and cross-lagged
  // predictors with lagged versions of the DV, and then adding the individuals mean-score on the K DV's
  vector[K] Ylag[T]; // The lagged predictor for the VAR(1) model.
  matrix[K,K] z_beta; // use for the non-centered parametrization
  vector[K] z_alpha; // use for the non-centered parametrization
  real lp_alpha[K]; // used for the draws of Z_alpha
  real lp_beta[K*K];// used for the draws of Z_beta



  
  // Draw the K means for individual in slice. Using non-centered specification
  for(i in 1:K){
  lp_alpha[i] = std_normal_lpdf(z_alpha);
  }
  
  // Draw the AR- and cross-lagged parameters in slice. Using non-centered specification
  for(j in 1:(K*K)){
  lp_beta[j] = std_normal_lpdf(to_vector(z_beta));
  }
  
  // cholesky decomposition. Hyper-priors for correlation-matrix and sd's given as input to function
  L_alpha = cholesky_decompose(quad_form_diag(Omega_alpha, alpha_hat_scale));
  L_beta = cholesky_decompose(quad_form_diag(Omega_beta, beta_hat_scale));
  
  //Draw means and VAR-parameters for individual belonging to current slice
  alpha = alpha_hat_location + L_alpha * z_alpha; // non-centered parametrization
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
  beta = to_matrix((beta_hat_location + L_beta * to_vector(z_beta)), K, K); // non-centered parametrization
    
  
  // This part I can't figure out. I want to provide an I*T*K array as y_slice, that contains for each of the I individuals their T observations on 
  // the K DV's. The idea is to slice off the I's, so each slice gets a T*K array to work with. But how do I provide this I*T*K array, and how do I
  // get the function to slice it properly? I assume slicing goes over the first dimension, so here I. This means that the y_slice used in each
  // slice is T*K. However, Ylag is T*K, which means Ylag[1] should be a vector of dimension K. If I try to assign y_slice[1] to that it 
  // doesn't work, so clearly referencing y_slice means referencing the array to be sliced in its original, pre-sliced, dimensions. But then how 
  // do I make sure that y_slice is sliced properly (across individuals) when sampling from the multi_normal_lpdf?
  Ylag[1] = to_vector(y_slice[1]) - alpha;
  
  for(n in 2:T) { // for the number of rows in the slice, center the lagged predictor
    
    Ylag[n] = to_vector(y_slice[n-1]) - alpha; // Center the observed scores in a slice using person means 
    // (group-mean centering) this makes the alpha equal to individual means.
    Mean[n] = alpha + beta*Ylag[n-1];
    }
    
 return multi_normal_lpdf(y_slice | Mean, quad_form_diag(Omega, tau)); 

 }
}

data {
  int N; // number of observations (number of rows in the panel); this is equal to I*T due to long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  real Y[N, K]; // The data with I*T (=N) observations on K variables
  int grainsize;
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  corr_matrix[K] Omega; // Correlation matrix for correlations between residuals
  vector[K*K] beta_hat_location; // a K*K matrix of grand-mean (across-individual) AR-parameters (on the diagonal)  
  // and cross-lagged parameters.
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's for AR and Cross-lagged effects
  vector[K] alpha_hat_location; // grand-mean (across-individuals) on the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd of the means 
  corr_matrix[K] Omega_alpha; // correlations between means
  corr_matrix[K*K] Omega_beta;// correlations between AR- and cross-lagged effects
  
   
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  Omega_alpha ~ lkj_corr(3);
  Omega_beta ~ lkj_corr(5); 
  

 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  Omega ~ lkj_corr(3);

 // Build y_slice
 {
 real y_slice[I,T,K];
 
 for(p in 1:I){
 y_slice[p] = Y[((p-1)*T + 1):(p*T)];   
 }
 
  target += reduce_sum(part_sum, y_slice, grainsize, Omega, tau, alpha_hat_location, Omega_alpha, alpha_hat_scale, beta_hat_location, 
                        Omega_beta, beta_hat_scale, K, I, T); // Feed the sliced data plus all hyper-parameters; K, I, T match the function signature
   
  }
}

What I can’t figure out is how to use the y_slice properly. As mentioned above, I assume that the code should only say what to do for each slice separately, so assuming that only the data for a specific individual is there (since I slice over individuals). I provide an I*T*K array to slice over and so assume that the code should deal with T*K arrays, as the first dimension gets cut off due to slicing. However, when I want to lag the T observations of each individual I run into a problem. In the function I create a T*K array to store the lagged values in (vector[K] Ylag[T];). Using Ylag[1] should then result in a vector of size K. As said, I assume that y_slice for a specific slice is also a T*K array, so I should be able to do Ylag[1] = y_slice[1];, but this doesn’t work. Apparently, y_slice is still I*T*K when referenced in the code, so how do I make sure that the right elements go to a shard? How do I make sure the first slice gets the first T*K observations, the second the second set of T*K observations, etc.?

Best,
Joran

And as a follow-up question to the end of my above post :). If y_slice is always the original I*T*K array in the code, am I completely wrong about only having to write code in the function “pretending” I’m writing it for one individual only? So can I really just write code that draws parameter values for ONE individual as I’m doing now, or should I draw a set of individual values (with values for each individual) and then distribute those values in some way? That last option seems counterintuitive, because it makes sense to me if all sampling related to the data of a slice (for me, data of ONE individual) takes place separately in the corresponding slice :).

Ok, I hope the above questions and this one are clear ;). Thanks in advance for any help! :).

You should really start with a simpler example, I think.

reduce_sum is best used if you use vectorised expressions which calculate the partial sums needed efficiently (over many observations).

It looks as if you moved the prior definitions into reduce_sum… that will make them be evaluated many times, which is not what you want. Sorry if I was unclear on that - move everything you can into the reduce_sum, but don’t break the model, of course.

Ahhh, right! Got it I think ;). Thanks!

Yeah, on https://statmodeling.stat.columbia.edu/2020/05/05/easy-within-chain-parallelisation-in-stan/ it said that normally what you would have in the transformed parameters and the model block would go into reduce_sum, but that’s because those computations usually can be run on vectors (for example by giving a vector to normal_lpdf). So, am I wrong in thinking that you would/should/could slice over hierarchical units? Do you “just” provide a vector to reduce_sum in which data from all units are combined? Because that would mean that I just check which vectorized operations I have and move all of those into reduce_sum, right? If that’s the case, do you think I’ll have any speed advantage with this model/sample size?

Best,
Joran

No. You should do that.

Ahhh, at least I got that right! :D. Progress! :P.

Thanks!!

Hi @wds15!

I’m slowly building up simpler models and testing those; I’ll keep you posted on what I bump into, to check I’m slicing properly ;). But at the same time I’m trying to vectorize as much as possible (as reduce_sum works best with vectorized input) and I have a question on that. Below is the code that I want to parallelize (but this is the non-parallel version, for simplicity).

As some background, I have data on I individuals. Each is measured T times, resulting in N (I*T) rows of data. I want to fit a VAR(1) model to this, so each of my K (K > 1) variables is regressed on its own value at the previous measurement, as well as on the previous values of all other variables. To get the predicted scores for each measurement (each row), you first take the individual means on the K variables for the person that row belongs to (a 1*K vector). Then you multiply that individual’s K*K matrix of VAR(1) parameters (K AR-parameters on the diagonal, and cross-lagged effects on the off-diagonal) with his/her K*1 vector of previous values on the K variables, and add the result to the vector of means. You then repeat this for each row.

Now, I want to do this without looping over each row. So I want 1) a vector that has all the individual means for each row (which will have the same values in several rows, since several rows belong to the same person), and 2) a vector of the product of the individual K*K matrices with the vector of previous scores on the K variables.
My plan for this was to first “spread” the I (number of individuals) sets of individual means over a vector of N rows (repeating each individual’s pair of means multiple times, once for each time he/she was measured). I think I managed to do this. Second, I had to tackle the matrix multiplication. I figured that I should not work with individual matrices, because that was too complicated. Instead I want to store the K AR-parameters and the cross-lagged parameters separately. The idea is to turn the multiplication of a matrix with a vector into a set of vector multiplications. After all, if I have a 2x2 matrix A = {A1, A2, A3, A4} (column-major) and a 2x1 vector B = {B1, B2}, this results in the 2x1 vector C = {A1*B1 + A3*B2, A2*B1 + A4*B2}, and the elements of vector C are each basically the sum of two vector multiplications.
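Written out for K = 2 (with A stored column-major, as Stan’s to_matrix fills it), the per-row product described above is

$$
C = AB = \begin{pmatrix} A_1 & A_3 \\ A_2 & A_4 \end{pmatrix} \begin{pmatrix} B_1 \\ B_2 \end{pmatrix} = \begin{pmatrix} A_1 B_1 + A_3 B_2 \\ A_2 B_1 + A_4 B_2 \end{pmatrix},
$$

so stacking all N rows turns each of the four products into one elementwise multiplication of two length-N vectors.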

Ok! Now the code, and then what doesn’t work :P.

// Learn more about model development with Stan at:
//
//    http://mc-stan.org/users/interfaces/rstan.html
//    https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
data {
  int N; // number of observations (number of rows in the panel); this is equal to I*T due to long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // Dependent variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indicates which rows are NOT an individual's first observation. 
  //matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  corr_matrix[K] Omega; // Correlation matrix for residuals
  //cov_matrix[K] Sigma;
  vector[K*K] beta_hat_location; // means of the VAR(1) parameters
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's of the  AR and Cross-lagged products
  vector[K] alpha_hat_location; // means for the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd's of the means of predictors
  corr_matrix[K] Omega_alpha;// Correlations between means of predictors
  corr_matrix[K*K] Omega_beta;// Correlations between AR and Cross-lagged products
  
  // individual-level parameters
  matrix[I,K*K] z_beta;
  matrix[I,K] z_alpha;
}

transformed parameters {
  matrix[K, K] beta[I]; // individual VAR(1) coefficient matrix
  vector[K] alpha[I]; // individual means matrix
  matrix[K, K] L_alpha;
  matrix[(K*K), (K*K)] L_beta;
  L_alpha = cholesky_decompose(quad_form_diag(Omega_alpha, alpha_hat_scale));
  L_beta = cholesky_decompose(quad_form_diag(Omega_beta, beta_hat_scale));
  
  for(i in 1:I) {
    alpha[i] = alpha_hat_location + L_alpha * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + L_beta * z_beta[i]'), K, K);
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  Omega_alpha ~ lkj_corr(3);
  Omega_beta ~ lkj_corr(5); // old value 3
  
  // hierarchical priors
  // non-centered parameterization
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
    
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  Omega ~ lkj_corr(3);
  
  

  {
  vector[K] Xc[N];
  vector[K] Mean[N];
  vector[K] alpha_vec[N];
  vector[N] ar1_vec;
  vector[N] ar2_vec;
  vector[N] cl1_vec;
  vector[N] cl2_vec;
  vector[K] cl_vec[N];
  
  vector[K] Mean_Vec[N-I];
  vector[K] Y_Vec[N-I];
  
  // get diag and off diags for beta
  alpha_vec = alpha[individual]; // store individual means in a vector, with each pair of means appearing multiple times, once for each row that
  // belongs to an individual. Arguments are: individual means (vector[K] alpha[I]), indicator array listing which individual belongs to each row 
  // (int individual[N]). Results in a vector[K] alpha_vec[N] array of all mean pairs for the rows. Here K = number of variables, I = number of 
  // distinct individuals, and N = number of rows in data (#people * #observations pp (T)). 
  ar1_vec = to_vector(beta[individual,1,1]); // separately store all the individual AR-parameters for variable 1 belonging to the rows. 
  // Since multiple rows belong to the same person, values are repeated. Input is the individual matrices holding the VAR-parameters (AR and 
  // cross-lagged) for each individual (matrix[K, K] beta[I];), indicator array listing which individual belongs to each row (int individual[N]).
  // Results in a vector[N] ar1_vec of all AR-parameters for variable 1, one for each of the rows.
  ar2_vec = to_vector(beta[individual,K,K]); // same as above but for the AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual,1,K]); // same as above but for the cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual,K,1]); // same as above but for the cross-lagged effect of variable 1 on variable 2
  
  // this part I can't figure out!
  Xc = X - alpha_vec;// X is vector[K] X[N], a N*K array of the lagged predictors for each row. I want to subtract the individual means
  // belonging to a row, from the lagged predictor value belonging to that row. And I want to do that in one go, instead of looping over rows.
  // Why doesn't this work?
  
  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // (vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors, but this doesn't work.
  Mean[,1] = alpha_vec[,1] + ar1_vec .* Xc[,1] + cl1_vec .* Xc[,2];
  Mean[,2] = alpha_vec[,2] + ar2_vec .* Xc[,2] + cl2_vec .* Xc[,1];
 

  // Don't use first observations for a participant, due to lag, no predictors there
  Y_Vec = Y[Use_obs];
  Mean_Vec = Mean[Use_obs];
  
   // No Mean values for time=0 
   Y_Vec ~ multi_normal(Mean_Vec, quad_form_diag(Omega, tau));
  
  }
 
}

What I can’t get to work are 2 pieces of code. First:

Xc = X - alpha_vec;// X is vector[K] X[N], a N*K array of the lagged predictors for each row. I want to subtract the individual means
  // belonging to a row, from the lagged predictor value belonging to that row. And I want to do that in one go, instead of looping over rows.
  // Why doesn't this work?

Here I try to center all N lagged predictor values in one go, by subtracting the N*K array alpha_vec (which holds all individual means for each row) from the N*K array X. Why doesn’t this work? I can’t use the - operator.
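(A hedged note on the likely reason: X and alpha_vec are arrays of vectors, and Stan’s arithmetic operators are defined on vectors and matrices but not on arrays, so - between two vector[] objects doesn’t type-check. A minimal sketch of the two usual workarounds, assuming the declarations above:)

// option 1: loop over rows; vector - vector works per element
for (n in 1:N)
  Xc[n] = X[n] - alpha_vec[n];

// option 2 (per column): convert the real[] column slices to vectors first
Xc[,1] = to_array_1d(to_vector(X[1:N,1]) - to_vector(alpha_vec[1:N,1]));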

And second, this part:

  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // (vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors, but this doesn't work.
  Mean[,1] = alpha_vec[,1] + ar1_vec .* Xc[,1] + cl1_vec .* Xc[,2];
  Mean[,2] = alpha_vec[,2] + ar2_vec .* Xc[,2] + cl2_vec .* Xc[,1];

This is my vectorized version of the matrix multiplication I mention above, but this also doesn’t work. I can’t use the .* and + operators…

Thanks for the help! :).

Wait!! Got it!

Although I think I don’t need the to_vector statements around X and alpha_vec. But this should get rid of all the loops that I can. I need the loop on alpha and beta because I want the output to give I values of those, not N :).

  // this part I can't figure out!
  Xc[,1] = to_array_1d(to_vector(X[1:N,1]) - to_vector(alpha_vec[1:N,1]));
  Xc[,2] = to_array_1d(to_vector(X[1:N,2]) - to_vector(alpha_vec[1:N,2]));
  //Xc = X - alpha_vec;// X is vector[K] X[N], a N*K array of the lagged predictors for each row. I want to subtract the individual means
  // belonging to a row, from the lagged predictor value belonging to that row. And I want to do that in one go, instead of looping over rows.
  // Why doesn't this work?
  
  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // (vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors, but this doesn't work.
  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:N,1]) + to_vector(ar1_vec .* to_vector(Xc[1:N,1])) + to_vector(cl1_vec .* to_vector(Xc[1:N,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:N,2]) + to_vector(ar2_vec .* to_vector(Xc[1:N,2])) + to_vector(cl2_vec .* to_vector(Xc[1:N,1])));

I’ll do a test run and then add this to the reduce_sum function. I’ll keep you posted ;)

Hi!

Below is the full code for my multilevel VAR(1) model, in which I eliminated as many loops as possible and vectorized everything I could see :). I kept the loops for alpha[i] and beta[i], but that is because I want only I (number of participants) values for those in my output :).

// Learn more about model development with Stan at:
//
//    http://mc-stan.org/users/interfaces/rstan.html
//    https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
data {
  int N; // number of observations (number of rows in the panel); this is equal to I*T due to long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // Dependent variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indicates which rows are NOT an individual's first observation. 
  //matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  corr_matrix[K] Omega; // Correlation matrix for residuals
  vector[K*K] beta_hat_location; // means of the VAR(1) parameters
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's of the  AR and Cross-lagged products
  vector[K] alpha_hat_location; // means for the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd's of the means of predictors
  corr_matrix[K] Omega_alpha;// Correlations between means of predictors
  corr_matrix[K*K] Omega_beta;// Correlations between AR and Cross-lagged products
  
  // individual-level parameters
  matrix[I,K*K] z_beta;
  matrix[I,K] z_alpha;
}

transformed parameters {
  matrix[K, K] beta[I]; // individual VAR(1) coefficient matrix
  vector[K] alpha[I]; // individual means matrix
  matrix[K, K] L_alpha;
  matrix[(K*K), (K*K)] L_beta;
  L_alpha = cholesky_decompose(quad_form_diag(Omega_alpha, alpha_hat_scale));
  L_beta = cholesky_decompose(quad_form_diag(Omega_beta, beta_hat_scale));
  
  for(i in 1:I) { // Can't remove this loop, because I need I alpha and beta's in output
    alpha[i] = alpha_hat_location + L_alpha * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + L_beta * z_beta[i]'), K, K);
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  Omega_alpha ~ lkj_corr(3);
  Omega_beta ~ lkj_corr(5); // old value 3
  
  // hierarchical priors
  // non-centered parameterization
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
    
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  Omega ~ lkj_corr(3);
  
  

  {
  vector[K] Xc[N];
  vector[K] Mean[N];
  vector[K] alpha_vec[N];
  vector[N] ar1_vec;
  vector[N] ar2_vec;
  vector[N] cl1_vec;
  vector[N] cl2_vec;
  //vector[K] cl_vec[N];
  
  vector[K] Mean_Vec[N-I];
  vector[K] Y_Vec[N-I];
  
  
  alpha_vec = alpha[individual]; // store individual means in a vector, with each pair of means appearing multiple times, once for each row that
  // belongs to an individual. Arguments are: individual means (vector[K] alpha[I]), indicator array listing which individual belongs to each row 
  // (int individual[N]). Results in a vector[K] alpha_vec[N] array of all mean pairs for the rows. Here K = number of variables, I = number of 
  // distinct individuals, and N = number of rows in data (#people * #observations pp (T)). 
  ar1_vec = to_vector(beta[individual,1,1]); // separately store all the individual AR-parameters for variable 1 belonging to the rows. 
  // Since multiple rows belong to the same person, values are repeated. Input is the individual matrices holding the VAR-parameters (AR and 
  // cross-lagged) for each individual (matrix[K, K] beta[I];), indicator array listing which individual belongs to each row (int individual[N]).
  // Results in a vector[N] ar1_vec of all AR-parameters for variable 1, one for each of the rows.
  ar2_vec = to_vector(beta[individual,K,K]); // same as above but for the AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual,1,K]); // same as above but for the cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual,K,1]); // same as above but for the cross-lagged effect of variable 1 on variable 2
  
  
  Xc[,1] = to_array_1d(to_vector(X[1:N,1]) - to_vector(alpha_vec[1:N,1]));
  Xc[,2] = to_array_1d(to_vector(X[1:N,2]) - to_vector(alpha_vec[1:N,2]));

  
  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean[,1] = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors.
  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:N,1]) + to_vector(ar1_vec .* to_vector(Xc[1:N,1])) + to_vector(cl1_vec .* to_vector(Xc[1:N,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:N,2]) + to_vector(ar2_vec .* to_vector(Xc[1:N,2])) + to_vector(cl2_vec .* to_vector(Xc[1:N,1])));
 

  // Don't use first observations for a participant, due to lag, no predictors there
  Y_Vec = Y[Use_obs];
  Mean_Vec = Mean[Use_obs];
  
  Y_Vec ~ multi_normal(Mean_Vec, quad_form_diag(Omega, tau));
  
  }
 
}

This code shaved about 4 minutes off the time needed when constructing Mean by looping over all 14000 rows, so I guess looping 14000 times takes 4 minutes ;).

I’m now slowly parallelizing this using reduce_sum, starting simple by only moving the final sampling from the multivariate normal into the reduce_sum function. I’ll play around with the grainsize there, and slowly move more into the code that is sent to the slices. So the first code for the parallelized multilevel VAR(1) is:

// The reduce_sum function
functions {
  real part_sum(vector[] y_slice,
                int start, 
                int end,
                vector[] mu,
                matrix Omega,
                vector tau
                ) {
  return multi_normal_lpdf(y_slice | mu[start:end], quad_form_diag(Omega, tau));
  }
}
data {
  int N; // number of observations (number of rows in the panel); this is equal to I*T due to long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // The data with I*T (=N) observations on K variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indicates which rows are NOT an individual's first observation. I want to
  // remove the first observations since those don't have a lagged predictor value (nothing came before)
  int grainsize;
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  corr_matrix[K] Omega; // Correlation matrix for correlations between residuals
  vector[K*K] beta_hat_location; // a K*K matrix of grand-mean (across-individual) AR-parameters (on the diagonal)  
  // and cross-lagged parameters.
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's for AR and Cross-lagged effects
  vector[K] alpha_hat_location; // grand-mean (across-individuals) on the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd of the means 
  corr_matrix[K] Omega_alpha; // correlations between means
  corr_matrix[K*K] Omega_beta;// correlations between AR- and cross-lagged effects
  
   
  // individual-level parameters. 
  matrix[I,K*K] z_beta;// use for the non-centered parametrization
  matrix[I,K] z_alpha; // use for the non-centered parametrization
}

transformed parameters {
  matrix[K, K] beta[I]; // Individual VAR(1) coefficient matrices (with individual AR- and cross-lagged effects)
  vector[K] alpha[I]; // Individual means on the K DV's
  matrix[K, K] L_alpha; // cholesky decomposition of correlation matrix of the means
  matrix[(K*K), (K*K)] L_beta; // cholesky decomposition of correlation matrix of the AR/Cross-lagged effects
  L_alpha = cholesky_decompose(quad_form_diag(Omega_alpha, alpha_hat_scale));
  L_beta = cholesky_decompose(quad_form_diag(Omega_beta, beta_hat_scale));
  
  for(i in 1:I) {
    alpha[i] = alpha_hat_location + L_alpha * z_alpha[i]'; // non-centered parametrization
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + L_beta * z_beta[i]'), K, K); // non-centered parametrization
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  Omega_alpha ~ lkj_corr(3);
  Omega_beta ~ lkj_corr(5); 
  
  // hierarchical priors
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  Omega ~ lkj_corr(3);
  
  
  {
  vector[K] Xc[N];
  vector[K] Mean[N];
  vector[K] alpha_vec[N];
  vector[N] ar1_vec;
  vector[N] ar2_vec;
  vector[N] cl1_vec;
  vector[N] cl2_vec;
  //vector[K] cl_vec[N];
  
  vector[K] Mean_Vec[N-I];
  vector[K] Y_Vec[N-I];
  
  
  alpha_vec = alpha[individual]; // store individual means in a vector, with each pair of means appearing multiple times, once for each row that
  // belongs to an individual. Arguments are: individual means (vector[K] alpha[I]), indicator array listing which individual belongs to each row 
  // (int individual[N]). Results in a vector[K] alpha_vec[N] array of all mean pairs for the rows. Here K = number of variables, I = number of 
  // distinct individuals, and N = number of rows in data (#people * #observations pp (T)). 
  ar1_vec = to_vector(beta[individual,1,1]); // separately store all the individual AR-parameters for variable 1 belonging to the rows. 
  // Since multiple rows belong to the same person, values are repeated. Input is the individual matrices holding the VAR-parameters (AR and 
  // cross-lagged) for each individual (matrix[K, K] beta[I];), indicator array listing which individual belongs to each row (int individual[N]).
  // Results in a vector[N] ar1_vec of all AR-parameters for variable 1, one for each of the rows.
  ar2_vec = to_vector(beta[individual,K,K]); // same as above but for the AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual,1,K]); // same as above but for the cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual,K,1]); // same as above but for the cross-lagged effect of variable 1 on variable 2
  
  
  Xc[,1] = to_array_1d(to_vector(X[1:N,1]) - to_vector(alpha_vec[1:N,1]));
  Xc[,2] = to_array_1d(to_vector(X[1:N,2]) - to_vector(alpha_vec[1:N,2]));

  
  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // (vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors.
  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:N,1]) + to_vector(ar1_vec .* to_vector(Xc[1:N,1])) + to_vector(cl1_vec .* to_vector(Xc[1:N,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:N,2]) + to_vector(ar2_vec .* to_vector(Xc[1:N,2])) + to_vector(cl2_vec .* to_vector(Xc[1:N,1])));
 

  // Don't use first observations for a participant, due to lag, no predictors there
  Y_Vec = Y[Use_obs];
  Mean_Vec = Mean[Use_obs];
  
  target += reduce_sum(part_sum, Y_Vec, grainsize, Mean_Vec, Omega, tau); // Feed DV's, Mean, Correlation matrix and sd's to reduce_sum
   
  
  }
 
}

Another thing that will help speed up the model is parameterizing in the form of multi_normal_cholesky.

I posted a hierarchical VAR in that form, where I ask about how best to weight the correlation matrices, at Weighted correlation matrices and cholesky_factor_corr. The original non-Cholesky form was from James Savage’s blog, and you can check that out at https://khakieconomics.github.io/2016/11/27/Hierarchical-priors-for-panel-vector-autoregressions.html.

Hey thanks!!

I’ll incorporate that! :).

Best!
Joran

Good catch. The Cholesky version uses analytical gradients, whereas the one currently used relies on autodiff. So use the Cholesky variant here.
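(A minimal sketch of that parameterization, with L_Omega_chol as a placeholder name: the prior goes directly on the Cholesky factor via lkj_corr_cholesky, and tau scales it with diag_pre_multiply.)

parameters {
  cholesky_factor_corr[K] L_Omega_chol; // Cholesky factor of the residual correlations
  vector<lower = 0>[K] tau;             // residual sd's
}
model {
  L_Omega_chol ~ lkj_corr_cholesky(3);  // prior directly on the factor
  // Cholesky factor of the covariance = diag(tau) * L_Omega_chol
  Y_Vec ~ multi_normal_cholesky(Mean_Vec, diag_pre_multiply(tau, L_Omega_chol));
}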

Ok! Updated code below :)… I’ll run the code to test for speed-up tomorrow :). Thanks guys!

// Learn more about model development with Stan at:
//
//    http://mc-stan.org/users/interfaces/rstan.html
//    https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
data {
  int N; // number of observations (number of rows in the panel); this is equal to I*T due to long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // Dependent variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indicates which rows are NOT an individual's first observation. 
  //matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  //cholesky_factor_corr[K] Omega;
  corr_matrix[K] Omega; // Correlation matrix for residuals
  vector[K*K] beta_hat_location; // means of the VAR(1) parameters
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's of the  AR and Cross-lagged products
  vector[K] alpha_hat_location; // means for the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd's of the means of predictors
  corr_matrix[K] Omega_alpha;// Correlations between means of predictors
  corr_matrix[K*K] Omega_beta;// Correlations between AR and Cross-lagged products
  
  // individual-level parameters
  matrix[I,K*K] z_beta;
  matrix[I,K] z_alpha;
}

transformed parameters {
  matrix[K, K] beta[I]; // individual VAR(1) coefficient matrix
  vector[K] alpha[I]; // individual means matrix
  matrix[K, K] L_Omega;
  matrix[K, K] L_alpha;
  matrix[(K*K), (K*K)] L_beta;
  L_Omega = cholesky_decompose(quad_form_diag(Omega, tau));
  L_alpha = cholesky_decompose(quad_form_diag(Omega_alpha, alpha_hat_scale));
  L_beta = cholesky_decompose(quad_form_diag(Omega_beta, beta_hat_scale));
  
  for(i in 1:I) { // Can't remove this loop, because I need I alpha and beta's in output
    alpha[i] = alpha_hat_location + L_alpha * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + L_beta * z_beta[i]'), K, K);
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  Omega_alpha ~ lkj_corr(3);
  Omega_beta ~ lkj_corr(5); // old value 3
  
  // hierarchical priors
  // non-centered parameterization
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
    
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  Omega ~ lkj_corr(3);
  //Omega ~ lkj_corr_cholesky(3);
  
  

  {
  vector[K] Xc[N];
  vector[K] Mean[N];
  vector[K] alpha_vec[N];
  vector[N] ar1_vec;
  vector[N] ar2_vec;
  vector[N] cl1_vec;
  vector[N] cl2_vec;
  //vector[K] cl_vec[N];
  
  vector[K] Mean_Vec[N-I];
  vector[K] Y_Vec[N-I];
  
  
  alpha_vec = alpha[individual]; // store individual means in a vector, with each pair of means appearing multiple times, once for each row that
  // belongs to an individual. Arguments are: individual means (vector[K] alpha[I]), indicator array listing which individual belongs to each row 
  // (int individual[N]). Results in a vector[K] alpha_vec[N] array of all mean pairs for the rows. Here K = number of variables, I = number of 
  // distinct individuals, and N = number of rows in data (#people * #observations pp (T)). 
  ar1_vec = to_vector(beta[individual,1,1]); // separately store all the individual AR-parameters for variable 1 belonging to the rows. 
  // Since multiple rows belong to the same person, values are repeated. Input is the individual matrices holding the VAR-parameters (AR and 
  // cross-lagged) for each individual (matrix[K, K] beta[I];), indicator array listing which individual belongs to each row (int individual[N]).
  // Results in a vector[N] ar1_vec of all AR-parameters for variable 1, one for each of the rows.
  ar2_vec = to_vector(beta[individual,K,K]); // same as above but for the AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual,1,K]); // same as above but for the cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual,K,1]); // same as above but for the cross-lagged effect of variable 1 on variable 2
  
  
  Xc[,1] = to_array_1d(to_vector(X[1:N,1]) - to_vector(alpha_vec[1:N,1]));
  Xc[,2] = to_array_1d(to_vector(X[1:N,2]) - to_vector(alpha_vec[1:N,2]));

  
  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean[,1] = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors.
  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:N,1]) + to_vector(ar1_vec .* to_vector(Xc[1:N,1])) + to_vector(cl1_vec .* to_vector(Xc[1:N,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:N,2]) + to_vector(ar2_vec .* to_vector(Xc[1:N,2])) + to_vector(cl2_vec .* to_vector(Xc[1:N,1])));
 

  // Don't use first observations for a participant, due to lag, no predictors there
  Y_Vec = Y[Use_obs];
  Mean_Vec = Mean[Use_obs];
  
  Y_Vec ~ multi_normal_cholesky(Mean_Vec, L_Omega);
  
  }
 
}

Just ran the code and it shaved an additional 8 minutes off the running time (without reduce_sum)! :).
Will run it with reduce_sum now :).

Thanks again :)