reduce_sum with multilevel VAR-model

No. You should do that.

Ahhh, at least I got that right! :D. Progress! :P.

Thanks!!

Hi @wds15!

I’m slowly building up simpler models and testing those; I’ll keep you posted on what I bump into, and on whether I’m slicing properly ;). At the same time I’m trying to vectorize as much as possible (since reduce_sum works best with vectorized input), and I have a question on that. Below is the code that I want to parallelize (this is the non-parallel version, for simplicity).

As some background, I have data on I individuals. Each is measured T times, resulting in N (= I*T) rows of data. I want to fit a VAR(1) model to this, so each of my K (K > 1) variables is regressed on its own value at the previous measurement, as well as on the previous values of all the other variables. To get the predicted scores for each measurement (each row), you first take the individual means on the K variables for the person that row belongs to (a 1*K vector). Then you multiply that individual’s K*K matrix of VAR(1) parameters (K AR-parameters on the diagonal, and cross-lagged effects on the off-diagonal) with his/her K*1 vector of previous values on the K variables (centered on the individual means), and add the result to the vector of means. You then repeat this for each row.
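In symbols (a sketch of what the code below implements): for row $n$ belonging to individual $i$,

$$\mu_n = \alpha_i + B_i\,(x_n - \alpha_i), \qquad y_n \sim \text{MultiNormal}(\mu_n,\ \Sigma),$$

where $\alpha_i$ is that individual’s $K$-vector of means, $B_i$ his/her $K \times K$ matrix of AR and cross-lagged parameters, $x_n$ the lagged values for that row, and $\Sigma$ the residual covariance.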

Now, I want to do this without looping over each row. So I want 1) a vector that has the individual means for each row (which will have the same values in several rows, since several rows belong to the same person), and 2) a vector with the product of the individual K*K matrices and the vectors of previous scores on the K variables.
My plan for this was to first “spread” the I (number of individuals) sets of individual means over a vector of N rows (repeating each individual’s pair of means multiple times, once for each time he/she was measured). I think I managed to do this. Second, I had to tackle the matrix multiplication. I figured that I should not work with the individual matrices, because that was too complicated. Instead I want to store the K AR-parameters and the cross-lagged parameters separately. The idea is to turn the multiplication of a matrix with a vector into a set of elementwise vector multiplications. After all, if I have the 2x2 matrix A = {A1, A2, A3, A4} (read row by row) and the 2x1 vector B = {B1, B2}, the product is the 2x1 vector C = {A1xB1 + A2xB2, A3xB1 + A4xB2}, so each element of C is basically a sum of elementwise products.
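A minimal sketch of that idea in Stan (my illustration; the names a11, a12, a21, a22, x1, x2 are hypothetical stand-ins for the sliced parameter and predictor columns): stacking all N rows, each element of the matrix-vector product becomes a sum of elementwise vector products.

  // Sketch: unrolling mu = A * x per row into N-length vector operations.
  // a11, a22 hold every row's AR-parameters; a12, a21 the cross-lagged ones;
  // x1, x2 hold the lagged values; all six are vector[N].
  vector[N] c1 = a11 .* x1 + a12 .* x2; // first element of A*x, for every row at once
  vector[N] c2 = a21 .* x1 + a22 .* x2; // second element of A*x, for every row at once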

Ok! Now the code, and then what doesn’t work :P.

// Learn more about model development with Stan at:
//
//    http://mc-stan.org/users/interfaces/rstan.html
//    https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
data {
  int N; // number of observations (number of rows in the panel); equal to I*T due to the long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // Dependent variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indices of the rows that are NOT an individual's first observation
  //matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  corr_matrix[K] Omega; // Correlation matrix for residuals
  //cov_matrix[K] Sigma;
  vector[K*K] beta_hat_location; // means of the VAR(1) parameters
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's of the  AR and Cross-lagged products
  vector[K] alpha_hat_location; // means for the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd's of the means of predictors
  corr_matrix[K] Omega_alpha;// Correlations between means of predictors
  corr_matrix[K*K] Omega_beta;// Correlations between AR and Cross-lagged products
  
  // individual-level parameters
  matrix[I,K*K] z_beta;
  matrix[I,K] z_alpha;
}

transformed parameters {
  matrix[K, K] beta[I]; // individual VAR(1) coefficient matrix
  vector[K] alpha[I]; // individual means matrix
  matrix[K, K] L_alpha;
  matrix[(K*K), (K*K)] L_beta;
  L_alpha = cholesky_decompose(quad_form_diag(Omega_alpha, alpha_hat_scale));
  L_beta = cholesky_decompose(quad_form_diag(Omega_beta, beta_hat_scale));
  
  for(i in 1:I) {
    alpha[i] = alpha_hat_location + L_alpha * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + L_beta * z_beta[i]'), K, K);
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  Omega_alpha ~ lkj_corr(3);
  Omega_beta ~ lkj_corr(5); // old value 3
  
  // hierarchical priors
  // non-centered parameterization
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
    
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  Omega ~ lkj_corr(3);
  
  

  {
  vector[K] Xc[N];
  vector[K] Mean[N];
  vector[K] alpha_vec[N];
  vector[N] ar1_vec;
  vector[N] ar2_vec;
  vector[N] cl1_vec;
  vector[N] cl2_vec;
  vector[K] cl_vec[N];
  
  vector[K] Mean_Vec[N-I];
  vector[K] Y_Vec[N-I];
  
  // get diag and off diags for beta
  alpha_vec = alpha[individual]; // store the individual means in an array, with each pair of means appearing multiple times: once for each row that
  // belongs to that individual. Arguments are: the individual means (vector[K] alpha[I]) and the indicator array listing which individual each 
  // row belongs to (int individual[N]). Results in a vector[K] alpha_vec[N] array of all mean pairs for the rows. Here K = number of variables, 
  // I = number of distinct individuals, and N = number of rows in the data (#people * #observations per person (T)). 
  ar1_vec = to_vector(beta[individual,1,1]); // separately store all the individual AR-parameters for variable 1 belonging to the rows. 
  // Since multiple rows belong to the same person, values are repeated. Input is the individual matrices holding the VAR-parameters (AR and 
  // cross-lagged) for each individual (matrix[K, K] beta[I]) and the indicator array listing which individual each row belongs to (int individual[N]).
  // Results in a vector[N] ar1_vec of all AR-parameters for variable 1, one per row.
  ar2_vec = to_vector(beta[individual,K,K]); // same as above but for the AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual,1,K]); // same as above but for the cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual,K,1]); // same as above but for the cross-lagged effect of variable 1 on variable 2
  
  // this part I can't figure out!
  Xc = X - alpha_vec;// X is vector[K] X[N], a N*K array of the lagged predictors for each row. I want to subtract the individual means
  // belonging to a row, from the lagged predictor value belonging to that row. And I want to do that in one go, instead of looping over rows.
  // Why doesn't this work?
  
  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // (vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors, but this doesn't work.
  Mean[,1] = alpha_vec[,1] +ar1_vec .* Xc[,1] +cl1_vec .* Xc[,2];
  Mean[,2] = alpha_vec[,2] +ar2_vec .* Xc[,2] +cl2_vec .* Xc[,1];
 

  // Don't use first observations for a participant, due to lag, no predictors there
  Y_Vec = Y[Use_obs];
  Mean_Vec = Mean[Use_obs];
  
   // No Mean values for time=0 
   Y_Vec ~ multi_normal(Mean_Vec, quad_form_diag(Omega, tau));
  
  }
 
}

What I can’t get to work are two pieces of code. First:

Xc = X - alpha_vec;// X is vector[K] X[N], a N*K array of the lagged predictors for each row. I want to subtract the individual means
  // belonging to a row, from the lagged predictor value belonging to that row. And I want to do that in one go, instead of looping over rows.
  // Why doesn't this work?

Here I try to center all N lagged predictor values in one go, by subtracting the N*K array alpha_vec (which holds the individual means for each row) from the N*K array X. Why doesn’t this work? I can’t use the - operator.
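The only version I can get to run is the plain row loop, which is exactly what I’m trying to avoid (a sketch):

  // Loop form: subtract each row's individual means from that row's lagged predictors.
  for(n in 1:N)
    Xc[n] = X[n] - alpha_vec[n]; // vector[K] minus vector[K] is fine; array minus array is not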

And second, this part:

  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // (vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors, but this doesn't work.
  Mean[,1] = alpha_vec[,1] +ar1_vec .* Xc[,1] +cl1_vec .* Xc[,2];
  Mean[,2] = alpha_vec[,2] +ar2_vec .* Xc[,2] +cl2_vec .* Xc[,1];

This is my vectorized version of the matrix multiplication I mentioned above, but this also doesn’t work: I can’t use the .* and + operators…

Thanks for the help! :).

Wait!! Got it!

Although I think I don’t need all of the to_vector statements (the ones wrapping the elementwise products look redundant), this should get rid of all the loops that can be removed. I keep the loop on alpha and beta because I want the output to give I values of those, not N :).

  // this part I can't figure out!
  Xc[,1] = to_array_1d(to_vector(X[1:N,1]) - to_vector(alpha_vec[1:N,1]));
  Xc[,2] = to_array_1d(to_vector(X[1:N,2]) - to_vector(alpha_vec[1:N,2]));
  //Xc = X - alpha_vec;// X is vector[K] X[N], a N*K array of the lagged predictors for each row. I want to subtract the individual means
  // belonging to a row, from the lagged predictor value belonging to that row. And I want to do that in one go, instead of looping over rows.
  // Why doesn't this work?
  
  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // (vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors, but this doesn't work.
  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:N,1]) + to_vector(ar1_vec .* to_vector(Xc[1:N,1])) + to_vector(cl1_vec .* to_vector(Xc[1:N,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:N,2]) + to_vector(ar2_vec .* to_vector(Xc[1:N,2])) + to_vector(cl2_vec .* to_vector(Xc[1:N,1])));
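Looking at it again, the outer to_vector wrappers around the elementwise products do seem redundant, since vector .* vector already yields a vector. A trimmed sketch of the same two lines, assuming that’s right:

  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:N,1]) + ar1_vec .* to_vector(Xc[1:N,1]) + cl1_vec .* to_vector(Xc[1:N,2]));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:N,2]) + ar2_vec .* to_vector(Xc[1:N,2]) + cl2_vec .* to_vector(Xc[1:N,1]));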

I’ll do a test run and then add this to the reduce_sum function. I’ll keep you posted ;)

Hi!

Below is the full code for my multilevel VAR(1) model, in which I eliminated as many loops and vectorized as much as I could see was possible :). I kept the loops for alpha[i] and beta[i], but that is because I want only I (the number of participants) values for those in my output :).

// Learn more about model development with Stan at:
//
//    http://mc-stan.org/users/interfaces/rstan.html
//    https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
data {
  int N; // number of observations (number of rows in the panel); equal to I*T due to the long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // Dependent variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indices of the rows that are NOT an individual's first observation
  //matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  corr_matrix[K] Omega; // Correlation matrix for residuals
  vector[K*K] beta_hat_location; // means of the VAR(1) parameters
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's of the  AR and Cross-lagged products
  vector[K] alpha_hat_location; // means for the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd's of the means of predictors
  corr_matrix[K] Omega_alpha;// Correlations between means of predictors
  corr_matrix[K*K] Omega_beta;// Correlations between AR and Cross-lagged products
  
  // individual-level parameters
  matrix[I,K*K] z_beta;
  matrix[I,K] z_alpha;
}

transformed parameters {
  matrix[K, K] beta[I]; // individual VAR(1) coefficient matrix
  vector[K] alpha[I]; // individual means matrix
  matrix[K, K] L_alpha;
  matrix[(K*K), (K*K)] L_beta;
  L_alpha = cholesky_decompose(quad_form_diag(Omega_alpha, alpha_hat_scale));
  L_beta = cholesky_decompose(quad_form_diag(Omega_beta, beta_hat_scale));
  
  for(i in 1:I) { // Can't remove this loop, because I need I alpha and beta's in output
    alpha[i] = alpha_hat_location + L_alpha * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + L_beta * z_beta[i]'), K, K);
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  Omega_alpha ~ lkj_corr(3);
  Omega_beta ~ lkj_corr(5); // old value 3
  
  // hierarchical priors
  // non-centered parameterization
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
    
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  Omega ~ lkj_corr(3);
  
  

  {
  vector[K] Xc[N];
  vector[K] Mean[N];
  vector[K] alpha_vec[N];
  vector[N] ar1_vec;
  vector[N] ar2_vec;
  vector[N] cl1_vec;
  vector[N] cl2_vec;
  //vector[K] cl_vec[N];
  
  vector[K] Mean_Vec[N-I];
  vector[K] Y_Vec[N-I];
  
  
  alpha_vec = alpha[individual]; // store the individual means in an array, with each pair of means appearing multiple times: once for each row that
  // belongs to that individual. Arguments are: the individual means (vector[K] alpha[I]) and the indicator array listing which individual each 
  // row belongs to (int individual[N]). Results in a vector[K] alpha_vec[N] array of all mean pairs for the rows. Here K = number of variables, 
  // I = number of distinct individuals, and N = number of rows in the data (#people * #observations per person (T)). 
  ar1_vec = to_vector(beta[individual,1,1]); // separately store all the individual AR-parameters for variable 1 belonging to the rows. 
  // Since multiple rows belong to the same person, values are repeated. Input is the individual matrices holding the VAR-parameters (AR and 
  // cross-lagged) for each individual (matrix[K, K] beta[I]) and the indicator array listing which individual each row belongs to (int individual[N]).
  // Results in a vector[N] ar1_vec of all AR-parameters for variable 1, one per row.
  ar2_vec = to_vector(beta[individual,K,K]); // same as above but for the AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual,1,K]); // same as above but for the cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual,K,1]); // same as above but for the cross-lagged effect of variable 1 on variable 2
  
  
  Xc[,1] = to_array_1d(to_vector(X[1:N,1]) - to_vector(alpha_vec[1:N,1]));
  Xc[,2] = to_array_1d(to_vector(X[1:N,2]) - to_vector(alpha_vec[1:N,2]));

  
  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean[,1] = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors.
  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:N,1]) + to_vector(ar1_vec .* to_vector(Xc[1:N,1])) + to_vector(cl1_vec .* to_vector(Xc[1:N,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:N,2]) + to_vector(ar2_vec .* to_vector(Xc[1:N,2])) + to_vector(cl2_vec .* to_vector(Xc[1:N,1])));
 

  // Don't use first observations for a participant, due to lag, no predictors there
  Y_Vec = Y[Use_obs];
  Mean_Vec = Mean[Use_obs];
  
  Y_Vec ~ multi_normal(Mean_Vec, quad_form_diag(Omega, tau));
  
  }
 
}

This code took about 4 minutes off the time needed when constructing Mean by looping over all 14,000 rows, so I guess looping 14,000 times takes about 4 minutes ;).

I’m now slowly parallelizing this using reduce_sum, starting simple by only moving the final sampling from the multivariate normal into the reduce_sum function. I’ll play around with the grainsize there, and slowly move more of the code into what is sent to the slices. So the first code for the parallelized multilevel VAR(1) is:

// The reduce_sum function
functions {
  real part_sum(vector[] y_slice,
                int start, 
                int end,
                vector[] mu,
                matrix Omega,
                vector tau
                ) {
  return multi_normal_lpdf(y_slice | mu[start:end], quad_form_diag(Omega, tau));
  }
}
data {
  int N; // number of observations (number of rows in the panel); equal to I*T due to the long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // The data with I*T (=N) observations on K variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indices of the rows that are NOT an individual's first observation. I want to
  // remove the first observations since those don't have a lagged predictor value (nothing came before)
  int grainsize;
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  corr_matrix[K] Omega; // Correlation matrix for correlations between residuals
  vector[K*K] beta_hat_location; // grand-mean (across-individual) AR-parameters (on the diagonal) and 
  // cross-lagged parameters, i.e. the K*K VAR(1) matrix stored as a vector
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's for AR and Cross-lagged effects
  vector[K] alpha_hat_location; // grand-mean (across-individuals) on the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd of the means 
  corr_matrix[K] Omega_alpha; // correlations between means
  corr_matrix[K*K] Omega_beta;// correlations between AR- and cross-lagged effects
  
   
  // individual-level parameters. 
  matrix[I,K*K] z_beta;// use for the non-centered parametrization
  matrix[I,K] z_alpha; // use for the non-centered parametrization
}

transformed parameters {
  matrix[K, K] beta[I]; // Individual VAR(1) coefficient matrices (with individual AR- and cross-lagged effects)
  vector[K] alpha[I]; // Individual means on the K DV's
  matrix[K, K] L_alpha; // cholesky decomposition of correlation matrix of the means
  matrix[(K*K), (K*K)] L_beta; // cholesky decomposition of correlation matrix of the AR/Cross-lagged effects
  L_alpha = cholesky_decompose(quad_form_diag(Omega_alpha, alpha_hat_scale));
  L_beta = cholesky_decompose(quad_form_diag(Omega_beta, beta_hat_scale));
  
  for(i in 1:I) {
    alpha[i] = alpha_hat_location + L_alpha * z_alpha[i]'; // non-centered parameterization
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + L_beta * z_beta[i]'), K, K); // non-centered parameterization
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  Omega_alpha ~ lkj_corr(3);
  Omega_beta ~ lkj_corr(5); 
  
  // hierarchical priors
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  Omega ~ lkj_corr(3);
  
  
  {
  vector[K] Xc[N];
  vector[K] Mean[N];
  vector[K] alpha_vec[N];
  vector[N] ar1_vec;
  vector[N] ar2_vec;
  vector[N] cl1_vec;
  vector[N] cl2_vec;
  //vector[K] cl_vec[N];
  
  vector[K] Mean_Vec[N-I];
  vector[K] Y_Vec[N-I];
  
  
  alpha_vec = alpha[individual]; // store the individual means in an array, with each pair of means appearing multiple times: once for each row that
  // belongs to that individual. Arguments are: the individual means (vector[K] alpha[I]) and the indicator array listing which individual each 
  // row belongs to (int individual[N]). Results in a vector[K] alpha_vec[N] array of all mean pairs for the rows. Here K = number of variables, 
  // I = number of distinct individuals, and N = number of rows in the data (#people * #observations per person (T)). 
  ar1_vec = to_vector(beta[individual,1,1]); // separately store all the individual AR-parameters for variable 1 belonging to the rows. 
  // Since multiple rows belong to the same person, values are repeated. Input is the individual matrices holding the VAR-parameters (AR and 
  // cross-lagged) for each individual (matrix[K, K] beta[I]) and the indicator array listing which individual each row belongs to (int individual[N]).
  // Results in a vector[N] ar1_vec of all AR-parameters for variable 1, one per row.
  ar2_vec = to_vector(beta[individual,K,K]); // same as above but for the AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual,1,K]); // same as above but for the cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual,K,1]); // same as above but for the cross-lagged effect of variable 1 on variable 2
  
  
  Xc[,1] = to_array_1d(to_vector(X[1:N,1]) - to_vector(alpha_vec[1:N,1]));
  Xc[,2] = to_array_1d(to_vector(X[1:N,2]) - to_vector(alpha_vec[1:N,2]));

  
  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // (vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors.
  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:N,1]) + to_vector(ar1_vec .* to_vector(Xc[1:N,1])) + to_vector(cl1_vec .* to_vector(Xc[1:N,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:N,2]) + to_vector(ar2_vec .* to_vector(Xc[1:N,2])) + to_vector(cl2_vec .* to_vector(Xc[1:N,1])));
 

  // Don't use first observations for a participant, due to lag, no predictors there
  Y_Vec = Y[Use_obs];
  Mean_Vec = Mean[Use_obs];
  
  target += reduce_sum(part_sum, Y_Vec, grainsize, Mean_Vec, Omega, tau); // feed the DVs, means, correlation matrix, and scale vector to the partial-sum function
   
  
  }
 
}

Another thing that will help speed up the model is parameterizing in the form of multi_normal_cholesky.

I posted a hierarchical VAR in that form, where I ask how best to weight the correlation matrices, at Weighted correlation matrices and cholesky_factor_corr. The original non-Cholesky form is from James Savage’s blog; you can check that out at https://khakieconomics.github.io/2016/11/27/Hierarchical-priors-for-panel-vector-autoregressions.html.

Hey thanks!!

I’ll incorporate that! :).

Best!
Joran

Good catch. The Cholesky version uses analytical gradients, whereas the one currently used relies on autodiff. So use the Cholesky variant here.
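For reference, the two forms line up like this (a sketch using the names from the model above, with L_Omega the Cholesky factor of Omega): since cholesky_decompose(quad_form_diag(Omega, tau)) equals diag_pre_multiply(tau, L_Omega), these two statements define the same likelihood:

  Y_Vec ~ multi_normal(Mean_Vec, quad_form_diag(Omega, tau));               // autodiff gradients
  Y_Vec ~ multi_normal_cholesky(Mean_Vec, diag_pre_multiply(tau, L_Omega)); // analytic gradients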

Ok! Updated code below :)… I’ll run the code to test for speed-up tomorrow :). Thanks guys!

// Learn more about model development with Stan at:
//
//    http://mc-stan.org/users/interfaces/rstan.html
//    https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
data {
  int N; // number of observations (number of rows in the panel); equal to I*T due to the long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // Dependent variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indices of the rows that are NOT an individual's first observation
  //matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  //cholesky_factor_corr[K] Omega;
  corr_matrix[K] Omega; // Correlation matrix for residuals
  vector[K*K] beta_hat_location; // means of the VAR(1) parameters
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's of the  AR and Cross-lagged products
  vector[K] alpha_hat_location; // means for the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd's of the means of predictors
  corr_matrix[K] Omega_alpha;// Correlations between means of predictors
  corr_matrix[K*K] Omega_beta;// Correlations between AR and Cross-lagged products
  
  // individual-level parameters
  matrix[I,K*K] z_beta;
  matrix[I,K] z_alpha;
}

transformed parameters {
  matrix[K, K] beta[I]; // individual VAR(1) coefficient matrix
  vector[K] alpha[I]; // individual means matrix
  matrix[K, K] L_Omega;
  matrix[K, K] L_alpha;
  matrix[(K*K), (K*K)] L_beta;
  L_Omega = cholesky_decompose(quad_form_diag(Omega, tau));
  L_alpha = cholesky_decompose(quad_form_diag(Omega_alpha, alpha_hat_scale));
  L_beta = cholesky_decompose(quad_form_diag(Omega_beta, beta_hat_scale));
  
  for(i in 1:I) { // Can't remove this loop, because I need I alpha and beta's in output
    alpha[i] = alpha_hat_location + L_alpha * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + L_beta * z_beta[i]'), K, K);
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  Omega_alpha ~ lkj_corr(3);
  Omega_beta ~ lkj_corr(5); // old value 3
  
  // hierarchical priors
  // non-centered parameterization
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
    
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  Omega ~ lkj_corr(3);
  //Omega ~ lkj_corr_cholesky(3);
  
  

  {
  vector[K] Xc[N];
  vector[K] Mean[N];
  vector[K] alpha_vec[N];
  vector[N] ar1_vec;
  vector[N] ar2_vec;
  vector[N] cl1_vec;
  vector[N] cl2_vec;
  //vector[K] cl_vec[N];
  
  vector[K] Mean_Vec[N-I];
  vector[K] Y_Vec[N-I];
  
  
  alpha_vec = alpha[individual]; // store the individual means in an array, with each pair of means appearing multiple times: once for each row that
  // belongs to that individual. Arguments are: the individual means (vector[K] alpha[I]) and the indicator array listing which individual each 
  // row belongs to (int individual[N]). Results in a vector[K] alpha_vec[N] array of all mean pairs for the rows. Here K = number of variables, 
  // I = number of distinct individuals, and N = number of rows in the data (#people * #observations per person (T)). 
  ar1_vec = to_vector(beta[individual,1,1]); // separately store all the individual AR-parameters for variable 1 belonging to the rows. 
  // Since multiple rows belong to the same person, values are repeated. Input is the individual matrices holding the VAR-parameters (AR and 
  // cross-lagged) for each individual (matrix[K, K] beta[I]) and the indicator array listing which individual each row belongs to (int individual[N]).
  // Results in a vector[N] ar1_vec of all AR-parameters for variable 1, one per row.
  ar2_vec = to_vector(beta[individual,K,K]); // same as above but for the AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual,1,K]); // same as above but for the cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual,K,1]); // same as above but for the cross-lagged effect of variable 1 on variable 2
  
  
  Xc[,1] = to_array_1d(to_vector(X[1:N,1]) - to_vector(alpha_vec[1:N,1]));
  Xc[,2] = to_array_1d(to_vector(X[1:N,2]) - to_vector(alpha_vec[1:N,2]));

  
  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean[,1] = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors.
  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:N,1]) + to_vector(ar1_vec .* to_vector(Xc[1:N,1])) + to_vector(cl1_vec .* to_vector(Xc[1:N,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:N,2]) + to_vector(ar2_vec .* to_vector(Xc[1:N,2])) + to_vector(cl2_vec .* to_vector(Xc[1:N,1])));
 

  // Don't use first observations for a participant, due to lag, no predictors there
  Y_Vec = Y[Use_obs];
  Mean_Vec = Mean[Use_obs];
  
  Y_Vec ~ multi_normal_cholesky(Mean_Vec, L_Omega);
  
  }
 
}

Just ran the code, and it shaved an additional 8 minutes off the running time (without reduce_sum)! :).
Will run it with reduce_sum now :).

Thanks again :)

Take a look at the multivariate priors for hierarchical models in the manual too.

You can get more speedup by skipping the cholesky_decompose calls and declaring the Cholesky factors directly:

parameters {
  cholesky_factor_corr[K] L_Omega;
  cholesky_factor_corr[K] L_alpha; // Correlations between means of predictors
  cholesky_factor_corr[K*K] L_beta; // Correlations between AR and Cross-lagged products
  ...
}
transformed parameters {
  matrix[K, K] beta[I]; // individual VAR(1) coefficient matrix
  vector[K] alpha[I]; // individual means matrix
  // no longer needed:
  // matrix[K, K] L_Omega;
  // matrix[K, K] L_alpha;
  // matrix[(K*K), (K*K)] L_beta;
  // L_Omega = cholesky_decompose(quad_form_diag(Omega, tau));
  // L_alpha = cholesky_decompose(quad_form_diag(Omega_alpha, alpha_hat_scale));
  // L_beta = cholesky_decompose(quad_form_diag(Omega_beta, beta_hat_scale));
}
model {
  L_alpha ~ lkj_corr_cholesky(3);
  L_beta ~ lkj_corr_cholesky(3);
  L_Omega ~ lkj_corr_cholesky(3);
  ...
}

Thanks! :). Reading it now :).

Here is the updated code :). I’ll report back on the running time :).

// Learn more about model development with Stan at:
//
//    http://mc-stan.org/users/interfaces/rstan.html
//    https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
data {
  int N; // number of observations (number of rows in the panel); equal to I*T due to the long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // Dependent variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indices of the rows that are NOT an individual's first observation
  //matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  //cholesky_factor_corr[K] Omega;
  cholesky_factor_corr[K] L_Omega; // Cholesky factor of the residual correlation matrix
  vector[K*K] beta_hat_location; // means of the VAR(1) parameters
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's of the  AR and Cross-lagged products
  vector[K] alpha_hat_location; // means for the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd's of the means of predictors
  cholesky_factor_corr[K] L_alpha;// Correlations between means of predictors
  cholesky_factor_corr[K*K] L_beta;// Correlations between AR and Cross-lagged products
  
  // individual-level parameters
  matrix[I,K*K] z_beta;
  matrix[I,K] z_alpha;
}

transformed parameters {
  matrix[K, K] beta[I]; // individual VAR(1) coefficient matrix
  vector[K] alpha[I]; // individual means matrix

  for(i in 1:I) { // Can't remove this loop, because I need I alpha and beta's in output
    alpha[i] = alpha_hat_location + L_alpha * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + L_beta * z_beta[i]'), K, K);
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  L_alpha ~ lkj_corr_cholesky(3);
  L_beta ~ lkj_corr_cholesky(5); // old value 3
  
  // hierarchical priors
  // non-centered parameterization
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
    
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  L_Omega ~ lkj_corr_cholesky(3);
  //Omega ~ lkj_corr_cholesky(3);
  
  

  {
  vector[K] Xc[N];
  vector[K] Mean[N];
  vector[K] alpha_vec[N];
  vector[N] ar1_vec;
  vector[N] ar2_vec;
  vector[N] cl1_vec;
  vector[N] cl2_vec;
  //vector[K] cl_vec[N];
  
  vector[K] Mean_Vec[N-I];
  vector[K] Y_Vec[N-I];
  
  matrix[K,K] Chol_Y;
  
  
  alpha_vec = alpha[individual]; // store the individual means in an array, with each pair of means appearing multiple times: once for each row that
  // belongs to that individual. Arguments are: the individual means (vector[K] alpha[I]) and the indicator array listing which individual each 
  // row belongs to (int individual[N]). Results in a vector[K] alpha_vec[N] array of all mean pairs for the rows. Here K = number of variables, 
  // I = number of distinct individuals, and N = number of rows in the data (#people * #observations per person (T)). 
  ar1_vec = to_vector(beta[individual,1,1]); // separately store all the individual AR-parameters for variable 1 belonging to the rows. 
  // Since multiple rows belong to the same person, values are repeated. Input is the individual matrices holding the VAR-parameters (AR and 
  // cross-lagged) for each individual (matrix[K, K] beta[I]) and the indicator array listing which individual each row belongs to (int individual[N]).
  // Results in a vector[N] ar1_vec of all AR-parameters for variable 1, one per row.
  ar2_vec = to_vector(beta[individual,K,K]); // same as above but for the AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual,1,K]); // same as above but for the cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual,K,1]); // same as above but for the cross-lagged effect of variable 1 on variable 2
  
  
  Xc[,1] = to_array_1d(to_vector(X[1:N,1]) - to_vector(alpha_vec[1:N,1]));
  Xc[,2] = to_array_1d(to_vector(X[1:N,2]) - to_vector(alpha_vec[1:N,2]));

  
  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean[,1] = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors.
  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:N,1]) + to_vector(ar1_vec .* to_vector(Xc[1:N,1])) + to_vector(cl1_vec .* to_vector(Xc[1:N,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:N,2]) + to_vector(ar2_vec .* to_vector(Xc[1:N,2])) + to_vector(cl2_vec .* to_vector(Xc[1:N,1])));
 

  // Don't use first observations for a participant, due to lag, no predictors there
  Y_Vec = Y[Use_obs];
  Mean_Vec = Mean[Use_obs];
  
  Chol_Y = diag_pre_multiply(tau, L_Omega);
  
  Y_Vec ~ multi_normal_cholesky(Mean_Vec, Chol_Y);
  
  }
 
}

Best,
Joran

Oddly enough, it seems slower when drawing directly from lkj_corr_cholesky. It takes 12 minutes more now.

Maybe I just got weird starting values; I’ll try again :)

Hi @spinkney!

I ran the model twice, but it appears to be slower than when using lkj_corr priors and the decomposition.

So, this model is faster

// Learn more about model development with Stan at:
//
//    http://mc-stan.org/users/interfaces/rstan.html
//    https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
data {
  int N; // number of observations (number of rows in the panel); equal to I*T due to the long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // Dependent variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indices of the rows that are NOT an individual's first observation
  //matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  //cholesky_factor_corr[K] Omega;
  corr_matrix[K] Omega; // Correlation matrix for residuals
  vector[K*K] beta_hat_location; // means of the VAR(1) parameters
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's of the  AR and Cross-lagged products
  vector[K] alpha_hat_location; // means for the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd's of the means of predictors
  corr_matrix[K] Omega_alpha;// Correlations between means of predictors
  corr_matrix[K*K] Omega_beta;// Correlations between AR and Cross-lagged products
  
  // individual-level parameters
  matrix[I,K*K] z_beta;
  matrix[I,K] z_alpha;
}

transformed parameters {
  matrix[K, K] beta[I]; // individual VAR(1) coefficient matrix
  vector[K] alpha[I]; // individual means matrix
  matrix[K, K] L_Omega;
  matrix[K, K] L_alpha;
  matrix[(K*K), (K*K)] L_beta;
  L_Omega = cholesky_decompose(quad_form_diag(Omega, tau));
  L_alpha = cholesky_decompose(quad_form_diag(Omega_alpha, alpha_hat_scale));
  L_beta = cholesky_decompose(quad_form_diag(Omega_beta, beta_hat_scale));
  
  for(i in 1:I) { // Can't remove this loop, because I need I alpha and beta's in output
    alpha[i] = alpha_hat_location + L_alpha * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + L_beta * z_beta[i]'), K, K);
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  Omega_alpha ~ lkj_corr(3);
  Omega_beta ~ lkj_corr(5); // old value 3
  
  // hierarchical priors
  // non-centered parameterization
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
    
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  Omega ~ lkj_corr(3);
  //Omega ~ lkj_corr_cholesky(3);
  
  

  {
  vector[K] Xc[N];
  vector[K] Mean[N];
  vector[K] alpha_vec[N];
  vector[N] ar1_vec;
  vector[N] ar2_vec;
  vector[N] cl1_vec;
  vector[N] cl2_vec;
  //vector[K] cl_vec[N];
  
  vector[K] Mean_Vec[N-I];
  vector[K] Y_Vec[N-I];
  
  
  alpha_vec = alpha[individual]; // store the individual means in an array, with each pair of means appearing multiple times: once for each row that
  // belongs to that individual. Arguments are: the individual means (vector[K] alpha[I]) and the indicator array listing which individual each 
  // row belongs to (int individual[N]). Results in a vector[K] alpha_vec[N] array of all mean pairs for the rows. Here K = number of variables, 
  // I = number of distinct individuals, and N = number of rows in the data (#people * #observations per person (T)). 
  ar1_vec = to_vector(beta[individual,1,1]); // separately store all the individual AR-parameters for variable 1 belonging to the rows. 
  // Since multiple rows belong to the same person, values are repeated. Input is the individual matrices holding the VAR-parameters (AR and 
  // cross-lagged) for each individual (matrix[K, K] beta[I]) and the indicator array listing which individual each row belongs to (int individual[N]).
  // Results in a vector[N] ar1_vec of all AR-parameters for variable 1, one per row.
  ar2_vec = to_vector(beta[individual,K,K]); // same as above but for the AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual,1,K]); // same as above but for the cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual,K,1]); // same as above but for the cross-lagged effect of variable 1 on variable 2
  
  
  Xc[,1] = to_array_1d(to_vector(X[1:N,1]) - to_vector(alpha_vec[1:N,1]));
  Xc[,2] = to_array_1d(to_vector(X[1:N,2]) - to_vector(alpha_vec[1:N,2]));

  
  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean[,1] = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors.
  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:N,1]) + to_vector(ar1_vec .* to_vector(Xc[1:N,1])) + to_vector(cl1_vec .* to_vector(Xc[1:N,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:N,2]) + to_vector(ar2_vec .* to_vector(Xc[1:N,2])) + to_vector(cl2_vec .* to_vector(Xc[1:N,1])));
 

  // Don't use first observations for a participant, due to lag, no predictors there
  Y_Vec = Y[Use_obs];
  Mean_Vec = Mean[Use_obs];
  
  Y_Vec ~ multi_normal_cholesky(Mean_Vec, L_Omega);
  
  }
 
}

than this model.

// Learn more about model development with Stan at:
//
//    http://mc-stan.org/users/interfaces/rstan.html
//    https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
data {
  int N; // number of observations (number of rows in the panel); equal to I*T due to the long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // Dependent variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indices of the rows that are NOT an individual's first observation
  //matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  //cholesky_factor_corr[K] Omega;
  cholesky_factor_corr[K] L_Omega; // Cholesky factor of the residual correlation matrix
  vector[K*K] beta_hat_location; // means of the VAR(1) parameters
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's of the  AR and Cross-lagged products
  vector[K] alpha_hat_location; // means for the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd's of the means of predictors
  cholesky_factor_corr[K] L_alpha;// Correlations between means of predictors
  cholesky_factor_corr[K*K] L_beta;// Correlations between AR and Cross-lagged products
  
  // individual-level parameters
  matrix[I,K*K] z_beta;
  matrix[I,K] z_alpha;
}

transformed parameters {
  matrix[K, K] beta[I]; // individual VAR(1) coefficient matrix
  vector[K] alpha[I]; // individual means matrix

  for(i in 1:I) { // Can't remove this loop, because I need I alpha and beta's in output
    alpha[i] = alpha_hat_location + L_alpha * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + L_beta * z_beta[i]'), K, K);
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  L_alpha ~ lkj_corr_cholesky(3);
  L_beta ~ lkj_corr_cholesky(5); // old value 3
  
  // hierarchical priors
  // non-centered parameterization
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
    
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  L_Omega ~ lkj_corr_cholesky(3);
  //Omega ~ lkj_corr_cholesky(3);
  
  

  {
  vector[K] Xc[N];
  vector[K] Mean[N];
  vector[K] alpha_vec[N];
  vector[N] ar1_vec;
  vector[N] ar2_vec;
  vector[N] cl1_vec;
  vector[N] cl2_vec;
  //vector[K] cl_vec[N];
  
  vector[K] Mean_Vec[N-I];
  vector[K] Y_Vec[N-I];
  
  matrix[K,K] Chol_Y;
  
  
  alpha_vec = alpha[individual]; // store the individual means in an array, with each pair of means appearing multiple times: once for each row that
  // belongs to that individual. Arguments are: the individual means (vector[K] alpha[I]) and the indicator array listing which individual each 
  // row belongs to (int individual[N]). Results in a vector[K] alpha_vec[N] array of all mean pairs for the rows. Here K = number of variables, 
  // I = number of distinct individuals, and N = number of rows in the data (#people * #observations per person (T)). 
  ar1_vec = to_vector(beta[individual,1,1]); // separately store all the individual AR-parameters for variable 1 belonging to the rows. 
  // Since multiple rows belong to the same person, values are repeated. Input is the individual matrices holding the VAR-parameters (AR and 
  // cross-lagged) for each individual (matrix[K, K] beta[I]) and the indicator array listing which individual each row belongs to (int individual[N]).
  // Results in a vector[N] ar1_vec of all AR-parameters for variable 1, one per row.
  ar2_vec = to_vector(beta[individual,K,K]); // same as above but for the AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual,1,K]); // same as above but for the cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual,K,1]); // same as above but for the cross-lagged effect of variable 1 on variable 2
  
  
  Xc[,1] = to_array_1d(to_vector(X[1:N,1]) - to_vector(alpha_vec[1:N,1]));
  Xc[,2] = to_array_1d(to_vector(X[1:N,2]) - to_vector(alpha_vec[1:N,2]));

  
  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean[,1] = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors.
  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:N,1]) + to_vector(ar1_vec .* to_vector(Xc[1:N,1])) + to_vector(cl1_vec .* to_vector(Xc[1:N,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:N,2]) + to_vector(ar2_vec .* to_vector(Xc[1:N,2])) + to_vector(cl2_vec .* to_vector(Xc[1:N,1])));
 

  // Don't use first observations for a participant, due to lag, no predictors there
  Y_Vec = Y[Use_obs];
  Mean_Vec = Mean[Use_obs];
  
  Chol_Y = diag_pre_multiply(tau, L_Omega);
  
  Y_Vec ~ multi_normal_cholesky(Mean_Vec, Chol_Y);
  
  }
 
}

The first model takes about 41 minutes and the second one 53, with 14,000 rows (70 measurements on 200 individuals).

Did I make a mistake? Or misunderstand your suggestion?

Best,
Joran

Wait! I think I found a stupid mistake! ;).

It should be:

 for(i in 1:I) { // Can't remove this loop, because I need I alpha and beta's in output
    alpha[i] = alpha_hat_location + (diag_pre_multiply(alpha_hat_scale, L_alpha)) * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + (diag_pre_multiply(beta_hat_scale, L_beta)) * z_beta[i]'), K, K);
    }
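That works because L_alpha and L_beta are now only the Cholesky factors of the correlation matrices, so the scales have to be re-applied by hand. A quick check of the identity behind it:

$$\operatorname{diag}(\sigma)\,L\,\bigl(\operatorname{diag}(\sigma)\,L\bigr)^\top = \operatorname{diag}(\sigma)\,L L^\top \operatorname{diag}(\sigma) = \operatorname{diag}(\sigma)\,\Omega\,\operatorname{diag}(\sigma),$$

which is exactly quad_form_diag(Omega, sigma). So diag_pre_multiply(alpha_hat_scale, L_alpha) reproduces what cholesky_decompose(quad_form_diag(Omega_alpha, alpha_hat_scale)) computed before.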

Yeah! That fixed it! :). It’s 40 minutes with the lkj_corr_cholesky priors :).
Nothing like appropriate priors to speed things up :)


Hi Joran,
I am dealing with a model that includes AR and lagged variables: I have 22 ordinal predictor variables and an ordinal response, for 820 individuals at 4 time-points.

I posted on Stan today to ask for help, and that’s when I came across your model. (A picture of the model was attached here.) The data is in wide format, as a data frame. Can I get your advice on the analyses I am doing? I appreciate your guidance.

Hi @ssalimi,

How can I help? :). If you want to use my code as a basis, note that my data is in long format instead of wide. Having it as a data.frame in R should be fine :).

Best,
Joran

Ok, so it appears reduce_sum doesn’t really speed up my model on Windows 10.

I have this multilevel VAR(1) model that I want to speed up:

// Learn more about model development with Stan at:
//
//    http://mc-stan.org/users/interfaces/rstan.html
//    https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
data {
  int N; // number of observations (number of rows in the panel); equal to I*T due to the long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // Dependent variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indices of the rows that are NOT an individual's first observation
  //matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  //cholesky_factor_corr[K] Omega;
  cholesky_factor_corr[K] L_Omega; // Cholesky factor of the residual correlation matrix
  vector[K*K] beta_hat_location; // means of the VAR(1) parameters
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's of the  AR and Cross-lagged products
  vector[K] alpha_hat_location; // means for the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd's of the means of predictors
  cholesky_factor_corr[K] L_alpha;// Correlations between means of predictors
  cholesky_factor_corr[K*K] L_beta;// Correlations between AR and Cross-lagged products
  
  // individual-level parameters
  matrix[I,K*K] z_beta;
  matrix[I,K] z_alpha;
}

transformed parameters {
  matrix[K, K] beta[I]; // individual VAR(1) coefficient matrix
  vector[K] alpha[I]; // individual means matrix
  
 
  for(i in 1:I) { // Can't remove this loop, because I need I alpha and beta's in output
    alpha[i] = alpha_hat_location + (diag_pre_multiply(alpha_hat_scale, L_alpha)) * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + (diag_pre_multiply(beta_hat_scale, L_beta)) * z_beta[i]'), K, K);
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  L_alpha ~ lkj_corr_cholesky(3);
  L_beta ~ lkj_corr_cholesky(5); // old value 3
  
  // hierarchical priors
  // non-centered parameterization
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
    
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  L_Omega ~ lkj_corr_cholesky(3);
  //Omega ~ lkj_corr_cholesky(3);
  
  

  {
  vector[K] Xc[N];
  vector[K] Mean[N];
  vector[K] alpha_vec[N];
  vector[N] ar1_vec;
  vector[N] ar2_vec;
  vector[N] cl1_vec;
  vector[N] cl2_vec;
  //vector[K] cl_vec[N];
  
  vector[K] Mean_Vec[N-I];
  vector[K] Y_Vec[N-I];
  
  matrix[K,K] Chol_Y;
  
  
  alpha_vec = alpha[individual]; // store the individual means in an array, with each pair of means appearing multiple times: once for each row that
  // belongs to that individual. Arguments are: the individual means (vector[K] alpha[I]) and the indicator array listing which individual each 
  // row belongs to (int individual[N]). Results in a vector[K] alpha_vec[N] array of all mean pairs for the rows. Here K = number of variables, 
  // I = number of distinct individuals, and N = number of rows in the data (#people * #observations per person (T)). 
  ar1_vec = to_vector(beta[individual,1,1]); // separately store all the individual AR-parameters for variable 1 belonging to the rows. 
  // Since multiple rows belong to the same person, values are repeated. Input is the individual matrices holding the VAR-parameters (AR and 
  // cross-lagged) for each individual (matrix[K, K] beta[I]) and the indicator array listing which individual each row belongs to (int individual[N]).
  // Results in a vector[N] ar1_vec of all AR-parameters for variable 1, one per row.
  ar2_vec = to_vector(beta[individual,K,K]); // same as above but for the AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual,1,K]); // same as above but for the cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual,K,1]); // same as above but for the cross-lagged effect of variable 1 on variable 2
  
  
  Xc[,1] = to_array_1d(to_vector(X[1:N,1]) - to_vector(alpha_vec[1:N,1]));
  Xc[,2] = to_array_1d(to_vector(X[1:N,2]) - to_vector(alpha_vec[1:N,2]));

  
  // Determine the predicted mean for each row (done per column). Take the individual means for that row, then add
  // 1) variable 1 multiplied by its AR-parameter, and 2) variable 2 multiplied by the cross-lagged effect of variable 2
  // on variable 1. That is: column 1 of Mean = column 1 of alpha_vec + (ar1_vec elementwise-times column 1 of Xc)
  // + (cl1_vec elementwise-times column 2 of Xc). So it is just a sum of equal-length vectors.
  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:N,1]) + to_vector(ar1_vec .* to_vector(Xc[1:N,1])) + to_vector(cl1_vec .* to_vector(Xc[1:N,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:N,2]) + to_vector(ar2_vec .* to_vector(Xc[1:N,2])) + to_vector(cl2_vec .* to_vector(Xc[1:N,1])));
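  // Per row n with i = individual[n], the two lines above compute
  // Mean[n] = alpha[i] + beta[i] * (X[n] - alpha[i]) without forming the
  // matrix-vector product row by row.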
 

  // Drop each participant's first observation: due to the lag, there is no predictor for it
  Y_Vec = Y[Use_obs];
  Mean_Vec = Mean[Use_obs];
  
  Chol_Y = diag_pre_multiply(tau, L_Omega);
  
  Y_Vec ~ multi_normal_cholesky(Mean_Vec, Chol_Y);
  
  }
 
}

First I tried moving only the final sampling statement for the multivariate normal into the reduce_sum function:

// Learn more about model development with Stan at:
//
//    http://mc-stan.org/users/interfaces/rstan.html
//    https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
functions {
  real part_sum(vector[] y_slice,
                int start, 
                int end,
                vector[] mu,
                matrix Chol_Y
                ) {
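  // reduce_sum slices the first argument, so y_slice = Y_Vec[start:end];
  // shared arguments (mu, Chol_Y) arrive whole, which is why mu is indexed
  // with start:end below to keep the means aligned with the slice.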
  return multi_normal_cholesky_lpdf(y_slice | mu[start:end], Chol_Y);
  }
}
data {
  int N; // number of observations (number of rows in the panel); equal to I*T due to the long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // Dependent variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indices of the rows that are NOT an individual's first observation
  //matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
  int grainsize;
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  //cholesky_factor_corr[K] Omega;
  cholesky_factor_corr[K] L_Omega; // Correlation matrix for residuals
  vector[K*K] beta_hat_location; // means of the VAR(1) parameters
  vector<lower = 0>[K*K] beta_hat_scale; // sd's of the AR and cross-lagged parameters
  vector[K] alpha_hat_location; // means for the K DV's
  vector<lower = 0>[K] alpha_hat_scale; // sd's of the individual means
  cholesky_factor_corr[K] L_alpha; // correlations between the individual means
  cholesky_factor_corr[K*K] L_beta; // correlations between AR and cross-lagged parameters
  
  // individual-level parameters
  matrix[I,K*K] z_beta;
  matrix[I,K] z_alpha;
}

transformed parameters {
  matrix[K, K] beta[I]; // individual VAR(1) coefficient matrices
  vector[K] alpha[I]; // individual mean vectors
  
 
  for(i in 1:I) { // Can't remove this loop, because I need I alpha and beta's in output
    alpha[i] = alpha_hat_location + (diag_pre_multiply(alpha_hat_scale, L_alpha)) * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal_cholesky(alpha_hat_location, diag_pre_multiply(alpha_hat_scale, L_alpha))
    beta[i] = to_matrix((beta_hat_location + (diag_pre_multiply(beta_hat_scale, L_beta)) * z_beta[i]'), K, K);
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  L_alpha ~ lkj_corr_cholesky(3);
  L_beta ~ lkj_corr_cholesky(5); // old value 3
  
  // hierarchical priors
  // non-centered parameterization
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
    
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  L_Omega ~ lkj_corr_cholesky(3);
  //Omega ~ lkj_corr_cholesky(3);
  
  

  {
  vector[K] Xc[N];
  vector[K] Mean[N];
  vector[K] alpha_vec[N];
  vector[N] ar1_vec;
  vector[N] ar2_vec;
  vector[N] cl1_vec;
  vector[N] cl2_vec;
  //vector[K] cl_vec[N];
  
  vector[K] Mean_Vec[N-I];
  vector[K] Y_Vec[N-I];
  
  matrix[K,K] Chol_Y;
  
  
  alpha_vec = alpha[individual]; // store the individual means for every row, with each individual's K-vector of means
  // repeated once for every row that belongs to that individual. Inputs: the individual means (vector[K] alpha[I]) and the
  // indicator array saying which individual each row belongs to (int individual[N]). Result: a vector[K] alpha_vec[N] array
  // with the mean vector for each row. Here K = number of variables, I = number of distinct individuals, and
  // N = number of rows in the data (#people * #observations per person (T)).
  ar1_vec = to_vector(beta[individual,1,1]); // separately store, for every row, that individual's AR-parameter for
  // variable 1. Since multiple rows belong to the same person, values are repeated. Input: the individual matrices holding
  // the VAR-parameters (AR and cross-lagged) for each individual (matrix[K, K] beta[I]) and the indicator array saying
  // which individual each row belongs to (int individual[N]). Result: a vector[N] of AR-parameters for variable 1.
  ar2_vec = to_vector(beta[individual,K,K]); // same as above but for the AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual,1,K]); // same as above but for the cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual,K,1]); // same as above but for the cross-lagged effect of variable 1 on variable 2
  
  
  Xc[,1] = to_array_1d(to_vector(X[1:N,1]) - to_vector(alpha_vec[1:N,1]));
  Xc[,2] = to_array_1d(to_vector(X[1:N,2]) - to_vector(alpha_vec[1:N,2]));

  
  // Determine the predicted mean for each row (done per column). Take the individual means for that row, then add
  // 1) variable 1 multiplied by its AR-parameter, and 2) variable 2 multiplied by the cross-lagged effect of variable 2
  // on variable 1. That is: column 1 of Mean = column 1 of alpha_vec + (ar1_vec elementwise-times column 1 of Xc)
  // + (cl1_vec elementwise-times column 2 of Xc). So it is just a sum of equal-length vectors.
  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:N,1]) + to_vector(ar1_vec .* to_vector(Xc[1:N,1])) + to_vector(cl1_vec .* to_vector(Xc[1:N,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:N,2]) + to_vector(ar2_vec .* to_vector(Xc[1:N,2])) + to_vector(cl2_vec .* to_vector(Xc[1:N,1])));
 

  // Drop each participant's first observation: due to the lag, there is no predictor for it
  Y_Vec = Y[Use_obs];
  Mean_Vec = Mean[Use_obs];
  
  Chol_Y = diag_pre_multiply(tau, L_Omega);
  
  target += reduce_sum(part_sum,Y_Vec,grainsize,Mean_Vec,Chol_Y);
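  // note: grainsize is the suggested number of terms per partial sum;
  // grainsize = 1 lets reduce_sum pick the partitioning automatically.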
  
  }
 
}

And then I tried moving the data manipulations from the model block into the partial-sum function as well:

// The reduce_sum function
functions {
  real part_sum(vector[] y_slice,
                int start, 
                int end,
                matrix Chol_Y,
                //matrix L_Omega, //matrix Chol_Y,
                //vector tau,
                vector[] X, // add lagged predictors instead of making them in function. Saves an if-statement
                vector[] alpha,
                matrix[] beta,
                int[] individual,
                int[] time,
                int K
                ) {
                  
  // ad mu creation to function
  // first: indexes
  int length = end - start + 1;
  
  vector[K] Xc[length];
  vector[K] Mean[length];
  vector[K] alpha_vec[length];
  //matrix[N,K] alpha_vec;
  vector[length] ar1_vec;
  vector[length] ar2_vec;
  vector[length] cl1_vec;
  vector[length] cl2_vec;
  //matrix[K,K] Chol_Y;
  
  //Chol_Y = diag_pre_multiply(tau, L_Omega);
 

  // Note: start:end index into the sliced rows (Y[Use_obs]), so the individual
  // array passed in must be subset the same way (individual[Use_obs]; see the call in the model block).
  alpha_vec = alpha[individual[start:end]]; 
  ar1_vec = to_vector(beta[individual[start:end],1,1]);
  ar2_vec = to_vector(beta[individual[start:end],K,K]); // same as above but for the AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual[start:end],1,K]); // same as above but for the cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual[start:end],K,1]); // same as above but for the cross-lagged effect of variable 1 on variable 2
  
  
  Xc[,1] = to_array_1d(to_vector(X[start:end,1]) - to_vector(alpha_vec[1:length,1])); // X holds row data: index it by start:end, not by individual
  Xc[,2] = to_array_1d(to_vector(X[start:end,2]) - to_vector(alpha_vec[1:length,2]));

  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:length,1]) + to_vector(ar1_vec .* to_vector(Xc[1:length,1])) + to_vector(cl1_vec .* to_vector(Xc[1:length,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:length,2]) + to_vector(ar2_vec .* to_vector(Xc[1:length,2])) + to_vector(cl2_vec .* to_vector(Xc[1:length,1])));
 

  
  return  multi_normal_cholesky_lpdf(y_slice | Mean, Chol_Y);
  }
}
data {
  int N; // number of observations (number of rows in the panel); equal to I*T due to the long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // The data with I*T (=N) observations on K variables
  vector[K] X[N]; // lagged predictors
  int Use_obs[N-I]; // indices of the rows that are NOT an individual's first observation. I want to remove the first
  // observations since those don't have a lagged predictor value (nothing came before)
  int grainsize;
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  cholesky_factor_corr[K] L_Omega; // Correlation matrix for correlations between residuals
  vector[K*K] beta_hat_location; // grand means (across individuals) of the VAR(1) parameters: the AR-parameters
  // (diagonal of the K*K matrix) and the cross-lagged parameters, stored as a length-K*K vector
  vector<lower = 0>[K*K] beta_hat_scale; // sd's for the AR and cross-lagged effects
  vector[K] alpha_hat_location; // grand-mean (across-individuals) on the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd of the means 
  cholesky_factor_corr[K] L_alpha; // correlations between means
  cholesky_factor_corr[K*K] L_beta;// correlations between AR- and cross-lagged effects
  
   
  // individual-level parameters. 
  matrix[I,K*K] z_beta;// use for the non-centered parametrization
  matrix[I,K] z_alpha; // use for the non-centered parametrization
}

transformed parameters {
  matrix[K, K] beta[I]; // Individual VAR(1) coefficient matrices (with individual AR- and cross-lagged effects)
  vector[K] alpha[I]; // Individual means on the K DV's

  for(i in 1:I) {
    alpha[i] = alpha_hat_location + (diag_pre_multiply(alpha_hat_scale, L_alpha)) * z_alpha[i]'; // non-centered parameterization
    // implies: alpha[i] ~ multi_normal_cholesky(alpha_hat_location, diag_pre_multiply(alpha_hat_scale, L_alpha))
    beta[i] = to_matrix((beta_hat_location + (diag_pre_multiply(beta_hat_scale, L_beta)) * z_beta[i]'), K, K); // non-centered parameterization
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  L_alpha ~ lkj_corr_cholesky(3);
  L_beta ~ lkj_corr_cholesky(5); 
 
  
  // hierarchical priors
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  L_Omega ~ lkj_corr_cholesky(3);
  
  {
  matrix[K,K] Chol_Y;
  Chol_Y = diag_pre_multiply(tau, L_Omega);
  
  //target += reduce_sum(part_sum, Y[Use_obs], grainsize, L_Omega, tau, X[Use_obs], alpha, beta, individual, time, K);
  
  target += reduce_sum(part_sum, Y[Use_obs], grainsize, Chol_Y, X[Use_obs], alpha, beta, individual[Use_obs], time, K); // feed the DVs, grainsize, the residual
  // Cholesky factor, the lagged predictors, and the individual-level parameters; individual is subset by Use_obs so start:end stays aligned with the sliced rows
  }
  
 
}

Neither model is faster than the model without reduce_sum, and they only get slower as I make the grainsize smaller. Does anyone know why this might be the case? :)

I’m now trying to get it working with map_rect, because then I can use rstan and figure out whether the slowdown has to do with cmdstan. But I believe there is an error in the way I call map_rect, and I can’t find it. Could someone point me to the error?

// Learn more about model development with Stan at:
//
//    http://mc-stan.org/users/interfaces/rstan.html
//    https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
functions {
  vector lp_reduce( vector beta , vector theta , real[] xr , int[] xi ) {
    int n = 70; // number of cases per shard (hard-coded; equals M = size(xi))
    int K = 2;
    vector[K] Y[n];
    vector[K] Xc[n];
    vector[K] Mean[n];
    real lp;
    
    real y[n] = xr[1:n]; // Scores on DV 1
    real y2[n] = xr[(n+1):(2*n)]; // Scores on DV 2
    real X[n] = xr[((2*n)+1):(3*n)]; // Lagged scores on DV 1, used as predictor (centered below)
    real X2[n] = xr[((3*n)+1):(4*n)]; // Lagged scores on DV 2, used as predictor (centered below)
    
    vector[n] alpha = theta[1:n]; // individual means on DV 1
    vector[n] alpha2 = theta[(n+1):(2*n)]; // individual means on DV 2
    vector[n] beta1 = theta[((2*n)+1):(3*n)]; // AR parameter for DV 1
    vector[n] beta2 = theta[((3*n)+1):(4*n)]; // AR parameter for DV 2
    vector[n] beta3 = theta[((4*n)+1):(5*n)]; // Cross-lagged effect from DV 2 to DV 1
    vector[n] beta4 = theta[((5*n)+1):(6*n)]; // Cross-lagged effect from DV 1 to DV 2
    // (labels follow the packing order of theta in transformed parameters)
    
    matrix[2,2] L_Omega = to_matrix(beta[1:4],2,2); // rebuild Chol_Y = diag_pre_multiply(tau, L_Omega), packed column-major into the shared beta vector
    
    Y[n,1] = y[n];
    Y[n,2] = y2[n];
    
    Xc[1:n,1] = to_array_1d(to_vector(X[1:n]) - to_vector(alpha[1:n]));
    Xc[1:n,2] = to_array_1d(to_vector(X2[1:n]) - to_vector(alpha2[1:n]));
    
    Mean[,1] = to_array_1d(to_vector(alpha[1:n]) + to_vector(beta1 .* to_vector(Xc[1:n,1])) + to_vector(beta3 .* to_vector(Xc[1:n,2])));
    Mean[,2] = to_array_1d(to_vector(alpha2[1:n]) + to_vector(beta2 .* to_vector(Xc[1:n,2])) + to_vector(beta4 .* to_vector(Xc[1:n,1]))); // AR of DV 2 on its own lag, plus the cross-lagged effect of DV 1
 
    lp = multi_normal_cholesky_lpdf(Y|Mean, L_Omega);
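    // map_rect requires each shard to return a vector, so wrap the scalar log density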
    return [lp]';
  }
} 
data {
  int N; // number of observations (number of rows in the panel); equal to I*T due to the long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // Dependent variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indices of the rows that are NOT an individual's first observation
  //matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
  //int grainsize;
}
transformed data {
  // 200 shards. Run per individual and combine
  // M = N/200 = 14000/200 = 70
  int n_shards = 200;
  int M = 70;// N/n_shards;
  int xi[n_shards,M];  
  real xr[n_shards, 4*M]; // 4*M because the two DVs and the two lagged predictors are stacked into one row per shard
  // split into shards
  for ( i in 1:n_shards ) {
    int j = 1 + (i-1)*M;
    int k = i*M;
    xi[i,1:M] = individual[ j:k ];
    
    xr[i,1:M] = Y[j:k, 1 ];
    xr[i,(M+1):(2*M)] = Y[j:k, 2 ];
    xr[i,((2*M)+1):(3*M)] = X[j:k, 1 ];
    xr[i,((3*M)+1):(4*M)] = X[j:k, 2 ];
    
  }
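  // xr[i] layout: [ Y (DV 1) | Y (DV 2) | X (DV 1) | X (DV 2) ], each of length M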
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  cholesky_factor_corr[K] L_Omega; // Correlation matrix for residuals
  vector[K*K] beta_hat_location; // means of the VAR(1) parameters
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's of the  AR and Cross-lagged products
  vector[K] alpha_hat_location; // means for the K DV's
  vector<lower = 0>[K] alpha_hat_scale; // sd's of the individual means
  cholesky_factor_corr[K] L_alpha; // correlations between the individual means
  cholesky_factor_corr[K*K] L_beta; // correlations between AR and cross-lagged parameters
  
  // individual-level parameters
  matrix[I,K*K] z_beta;
  matrix[I,K] z_alpha;
}

transformed parameters {
  matrix[K, K] betas[I]; // individual VAR(1) coefficient matrices
  vector[K] alpha[I]; // individual mean vectors
  vector[6*M] theta[n_shards]; // 6*M per shard: two means, two AR-parameters, and two cross-lagged
  // parameters per case (Omega and tau enter separately via the shared beta vector)
 
  for(i in 1:I) { // Can't remove this loop, because I need I alpha and beta's in output
    alpha[i] = alpha_hat_location + (diag_pre_multiply(alpha_hat_scale, L_alpha)) * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal_cholesky(alpha_hat_location, diag_pre_multiply(alpha_hat_scale, L_alpha))
    betas[i] = to_matrix((beta_hat_location + (diag_pre_multiply(beta_hat_scale, L_beta)) * z_beta[i]'), K, K);
    }
  
   for ( i in 1:n_shards ) {
    int j = 1 + (i-1)*M;
    int k = i*M;
    
    theta[i,1:M] = to_vector(alpha[individual[j:k],1]);
    theta[i,(M+1):(2*M)] = to_vector(alpha[individual[j:k],2]);
    theta[i,((2*M)+1):(3*M)] = to_vector(betas[individual[j:k],1,1]);
    theta[i,((3*M)+1):(4*M)] = to_vector(betas[individual[j:k],K,K]);
    theta[i,((4*M)+1):(5*M)] = to_vector(betas[individual[j:k],1,K]);
    theta[i,((5*M)+1):(6*M)] = to_vector(betas[individual[j:k],K,1]);
  }
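  // theta[i] layout: [ alpha DV1 | alpha DV2 | AR DV1 | AR DV2 | CL 2->1 | CL 1->2 ],
  // each of length M; lp_reduce unpacks theta in exactly this order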
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  L_alpha ~ lkj_corr_cholesky(3);
  L_beta ~ lkj_corr_cholesky(5); // old value 3
  
  // hierarchical priors
  // non-centered parameterization
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
    
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  L_Omega ~ lkj_corr_cholesky(3);
  

  {
  
  vector[K*K] beta; // packed residual Cholesky factor, passed as map_rect's shared parameter vector
  
  
  beta = to_vector(diag_pre_multiply(tau, L_Omega));
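  // Chol_Y = diag_pre_multiply(tau, L_Omega) is packed column-major here and
  // rebuilt inside lp_reduce with to_matrix(beta[1:4], 2, 2)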
  
   target += sum(map_rect(lp_reduce, beta, theta, xr, xi));
  
  }
 
}

I get the error: Exception: Exception: multi_normal_cholesky_lpdf: Random variable[1] is nan, but must not be nan!

Thanks again guys!! :)

Best,
Joran

I get the error: Exception: Exception: multi_normal_cholesky_lpdf: Random variable[1] is nan, but must not be nan!

If you look at the construction of Y in lp_reduce:

    Y[n,1] = y[n];
    Y[n,2] = y2[n];

Because n is the integer 70, you’re just copying the 70th value of y and y2 into the 70th row of Y. This means that all other values of Y are left uninitialised as nan, which is why you get the error.
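A minimal sketch of a fix (keeping the hard-coded n = 70 of the original) is to loop over all the rows:

    // fill every row of Y, not just the n-th
    for (m in 1:n) {
      Y[m, 1] = y[m];
      Y[m, 2] = y2[m];
    }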