Sum_reduce with multilevel VAR-model

Take a look at the multivariate priors for hierarchical models in the manual too.

You can get more speedup not using the cholesky decompose by doing this directly

parameters {
cholesky_factor_corr[K] L_Omega;
cholesky_factor_corr[K] L_alpha;// Correlations between means of predictors
cholesky_factor_corr[K*K] L_beta;// Correlations between AR and Cross-lagged products
...
}
transformed parameters {
  matrix[K, K] beta[I]; // individual VAR(1) coefficient matrix
  vector[K] alpha[I]; // individual means matrix
// matrix[K, K] L_alpha;
// matrix[(K*K), (K*K)] L_beta;
// no longer needed
// matrix[K, K] L_Omega;
// L_Omega = cholesky_decompose(quad_form_diag(Omega, tau));
// L_alpha = cholesky_decompose(quad_form_diag(Omega_alpha, alpha_hat_scale)); 
// L_beta = cholesky_decompose(quad_form_diag(Omega_beta, beta_hat_scale));
}
model {
L_alpha ~ lkj_corr_cholesky(3);
L_beta ~ lkj_corr_cholesky(3);
L_Omega ~ lkj_corr_cholesky(3);
...
}

Thanks! :). Reading it now :).

Here is the updated code :). I’ll report back on the running time :).

// Learn more about model development with Stan at:
//
//    http://mc-stan.org/users/interfaces/rstan.html
//    https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
data {
  int N; // number of observations (number of rows in the panel) this is equal to N*T due to long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // Dependent variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indicator that indicates which rows are NOT individual's first observations. 
  //matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  //cholesky_factor_corr[K] Omega;
  cholesky_factor_corr[K] L_Omega; // Correlation matrix for residuals
  vector[K*K] beta_hat_location; // means of the VAR(1) parameters
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's of the  AR and Cross-lagged products
  vector[K] alpha_hat_location; // means for the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd's of the means of predictors
  cholesky_factor_corr[K] L_alpha;// Correlations between means of predictors
  cholesky_factor_corr[K*K] L_beta;// Correlations between AR and Cross-lagged products
  
  // individual-level parameters
  matrix[I,K*K] z_beta;
  matrix[I,K] z_alpha;
}

transformed parameters {
  matrix[K, K] beta[I]; // individual VAR(1) coefficient matrix
  vector[K] alpha[I]; // individual means matrix

  for(i in 1:I) { // Can't remove this loop, because I need I alpha and beta's in output
    alpha[i] = alpha_hat_location + L_alpha * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + L_beta * z_beta[i]'), K, K);
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  L_alpha ~ lkj_corr_cholesky(3);
  L_beta ~ lkj_corr_cholesky(5); // old value 3
  
  // hierarchical priors
  // non-centered parameterization
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
    
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  L_Omega ~ lkj_corr_cholesky(3);
  //Omega ~ lkj_corr_cholesky(3);
  
  

  {
  vector[K] Xc[N];
  vector[K] Mean[N];
  vector[K] alpha_vec[N];
  vector[N] ar1_vec;
  vector[N] ar2_vec;
  vector[N] cl1_vec;
  vector[N] cl2_vec;
  //vector[K] cl_vec[N];
  
  vector[K] Mean_Vec[N-I];
  vector[K] Y_Vec[N-I];
  
  matrix[K,K] Chol_Y;
  
  
  alpha_vec = alpha[individual]; // store individual means in a vector, with each pair of means appearing multiple times. Once for each row that
  // belong to an individual. Arguments are: individual means (vector[K] alpha[I]), indictor array listing which individual belong to each row 
  // (int individual[N]). Results in vector[K] alpha_vec[N] array of all mean pairs for the rows. Here  K = number of variables, I = number of 
  // distinct individuals, and N = number of rows in data (#people * #observations pp (T)). 
  ar1_vec = to_vector(beta[individual,1,1]);// separately store all the individual AR-parameters for variable 1 belonging to the rows. 
  // Since mulitple rows belong to the same person, values are repeated. Imput is the individual matrices holding the VAR-parameters (AR and 
  // cross-lagged) for each individual (matrix[K, K] beta[I];), indictor array listing which individual belong to each row (int individual[N]).
  // Results in vector[K] ar1_vec[N] array of all AR-parameters for variable 1, for each of the rows.
  ar2_vec = to_vector(beta[individual,K,K]); // same as above vut for AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual,1,K]);// same as above vut for cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual,K,1]);// same as above vut for cross-lagged effect of variable 1 on variable 2
  
  
  Xc[,1] = to_array_1d(to_vector(X[1:N,1]) - to_vector(alpha_vec[1:N,1]));
  Xc[,2] = to_array_1d(to_vector(X[1:N,2]) - to_vector(alpha_vec[1:N,2]));

  
  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean[,1] = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors.
  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:N,1]) + to_vector(ar1_vec .* to_vector(Xc[1:N,1])) + to_vector(cl1_vec .* to_vector(Xc[1:N,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:N,2]) + to_vector(ar2_vec .* to_vector(Xc[1:N,2])) + to_vector(cl2_vec .* to_vector(Xc[1:N,1])));
 

  // Don't use first observations for a participant, due to lag, no predictors there
  Y_Vec = Y[Use_obs];
  Mean_Vec = Mean[Use_obs];
  
  Chol_Y = diag_pre_multiply(tau, L_Omega);
  
  Y_Vec ~ multi_normal_cholesky(Mean_Vec, Chol_Y);
  
  }
 
}

Best,
Joran

Oddly enough, it seems slower when directly drawing from lkj_corr_cholesky. It takes 12 minutes more now.

May be I just got weird strting values, I’ll try again :)

Hi @spinkney!

I ran the model twice, but it appears to be slower than when using lkj_corr priors and the decomposition.

So, this model is faster

// Learn more about model development with Stan at:
//
//    http://mc-stan.org/users/interfaces/rstan.html
//    https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
data {
  int N; // number of observations (number of rows in the panel) this is equal to N*T due to long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // Dependent variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indicator that indicates which rows are NOT individual's first observations. 
  //matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  //cholesky_factor_corr[K] Omega;
  corr_matrix[K] Omega; // Correlation matrix for residuals
  vector[K*K] beta_hat_location; // means of the VAR(1) parameters
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's of the  AR and Cross-lagged products
  vector[K] alpha_hat_location; // means for the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd's of the means of predictors
  corr_matrix[K] Omega_alpha;// Correlations between means of predictors
  corr_matrix[K*K] Omega_beta;// Correlations between AR and Cross-lagged products
  
  // individual-level parameters
  matrix[I,K*K] z_beta;
  matrix[I,K] z_alpha;
}

transformed parameters {
  matrix[K, K] beta[I]; // individual VAR(1) coefficient matrix
  vector[K] alpha[I]; // individual means matrix
  matrix[K, K] L_Omega;
  matrix[K, K] L_alpha;
  matrix[(K*K), (K*K)] L_beta;
  L_Omega = cholesky_decompose(quad_form_diag(Omega, tau));
  L_alpha = cholesky_decompose(quad_form_diag(Omega_alpha, alpha_hat_scale));
  L_beta = cholesky_decompose(quad_form_diag(Omega_beta, beta_hat_scale));
  
  for(i in 1:I) { // Can't remove this loop, because I need I alpha and beta's in output
    alpha[i] = alpha_hat_location + L_alpha * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + L_beta * z_beta[i]'), K, K);
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  Omega_alpha ~ lkj_corr(3);
  Omega_beta ~ lkj_corr(5); // old value 3
  
  // hierarchical priors
  // non-centered parameterization
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
    
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  Omega ~ lkj_corr(3);
  //Omega ~ lkj_corr_cholesky(3);
  
  

  {
  vector[K] Xc[N];
  vector[K] Mean[N];
  vector[K] alpha_vec[N];
  vector[N] ar1_vec;
  vector[N] ar2_vec;
  vector[N] cl1_vec;
  vector[N] cl2_vec;
  //vector[K] cl_vec[N];
  
  vector[K] Mean_Vec[N-I];
  vector[K] Y_Vec[N-I];
  
  
  alpha_vec = alpha[individual]; // store individual means in a vector, with each pair of means appearing multiple times. Once for each row that
  // belong to an individual. Arguments are: individual means (vector[K] alpha[I]), indictor array listing which individual belong to each row 
  // (int individual[N]). Results in vector[K] alpha_vec[N] array of all mean pairs for the rows. Here  K = number of variables, I = number of 
  // distinct individuals, and N = number of rows in data (#people * #observations pp (T)). 
  ar1_vec = to_vector(beta[individual,1,1]);// separately store all the individual AR-parameters for variable 1 belonging to the rows. 
  // Since mulitple rows belong to the same person, values are repeated. Imput is the individual matrices holding the VAR-parameters (AR and 
  // cross-lagged) for each individual (matrix[K, K] beta[I];), indictor array listing which individual belong to each row (int individual[N]).
  // Results in vector[K] ar1_vec[N] array of all AR-parameters for variable 1, for each of the rows.
  ar2_vec = to_vector(beta[individual,K,K]); // same as above vut for AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual,1,K]);// same as above vut for cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual,K,1]);// same as above vut for cross-lagged effect of variable 1 on variable 2
  
  
  Xc[,1] = to_array_1d(to_vector(X[1:N,1]) - to_vector(alpha_vec[1:N,1]));
  Xc[,2] = to_array_1d(to_vector(X[1:N,2]) - to_vector(alpha_vec[1:N,2]));

  
  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean[,1] = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors.
  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:N,1]) + to_vector(ar1_vec .* to_vector(Xc[1:N,1])) + to_vector(cl1_vec .* to_vector(Xc[1:N,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:N,2]) + to_vector(ar2_vec .* to_vector(Xc[1:N,2])) + to_vector(cl2_vec .* to_vector(Xc[1:N,1])));
 

  // Don't use first observations for a participant, due to lag, no predictors there
  Y_Vec = Y[Use_obs];
  Mean_Vec = Mean[Use_obs];
  
  Y_Vec ~ multi_normal_cholesky(Mean_Vec, L_Omega);
  
  }
 
}

than this model.

// Learn more about model development with Stan at:
//
//    http://mc-stan.org/users/interfaces/rstan.html
//    https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
data {
  int N; // number of observations (number of rows in the panel) this is equal to N*T due to long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // Dependent variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indicator that indicates which rows are NOT individual's first observations. 
  //matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  //cholesky_factor_corr[K] Omega;
  cholesky_factor_corr[K] L_Omega; // Correlation matrix for residuals
  vector[K*K] beta_hat_location; // means of the VAR(1) parameters
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's of the  AR and Cross-lagged products
  vector[K] alpha_hat_location; // means for the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd's of the means of predictors
  cholesky_factor_corr[K] L_alpha;// Correlations between means of predictors
  cholesky_factor_corr[K*K] L_beta;// Correlations between AR and Cross-lagged products
  
  // individual-level parameters
  matrix[I,K*K] z_beta;
  matrix[I,K] z_alpha;
}

transformed parameters {
  matrix[K, K] beta[I]; // individual VAR(1) coefficient matrix
  vector[K] alpha[I]; // individual means matrix

  for(i in 1:I) { // Can't remove this loop, because I need I alpha and beta's in output
    alpha[i] = alpha_hat_location + L_alpha * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + L_beta * z_beta[i]'), K, K);
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  L_alpha ~ lkj_corr_cholesky(3);
  L_beta ~ lkj_corr_cholesky(5); // old value 3
  
  // hierarchical priors
  // non-centered parameterization
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
    
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  L_Omega ~ lkj_corr_cholesky(3);
  //Omega ~ lkj_corr_cholesky(3);
  
  

  {
  vector[K] Xc[N];
  vector[K] Mean[N];
  vector[K] alpha_vec[N];
  vector[N] ar1_vec;
  vector[N] ar2_vec;
  vector[N] cl1_vec;
  vector[N] cl2_vec;
  //vector[K] cl_vec[N];
  
  vector[K] Mean_Vec[N-I];
  vector[K] Y_Vec[N-I];
  
  matrix[K,K] Chol_Y;
  
  
  alpha_vec = alpha[individual]; // store individual means in a vector, with each pair of means appearing multiple times. Once for each row that
  // belong to an individual. Arguments are: individual means (vector[K] alpha[I]), indictor array listing which individual belong to each row 
  // (int individual[N]). Results in vector[K] alpha_vec[N] array of all mean pairs for the rows. Here  K = number of variables, I = number of 
  // distinct individuals, and N = number of rows in data (#people * #observations pp (T)). 
  ar1_vec = to_vector(beta[individual,1,1]);// separately store all the individual AR-parameters for variable 1 belonging to the rows. 
  // Since mulitple rows belong to the same person, values are repeated. Imput is the individual matrices holding the VAR-parameters (AR and 
  // cross-lagged) for each individual (matrix[K, K] beta[I];), indictor array listing which individual belong to each row (int individual[N]).
  // Results in vector[K] ar1_vec[N] array of all AR-parameters for variable 1, for each of the rows.
  ar2_vec = to_vector(beta[individual,K,K]); // same as above vut for AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual,1,K]);// same as above vut for cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual,K,1]);// same as above vut for cross-lagged effect of variable 1 on variable 2
  
  
  Xc[,1] = to_array_1d(to_vector(X[1:N,1]) - to_vector(alpha_vec[1:N,1]));
  Xc[,2] = to_array_1d(to_vector(X[1:N,2]) - to_vector(alpha_vec[1:N,2]));

  
  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean[,1] = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors.
  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:N,1]) + to_vector(ar1_vec .* to_vector(Xc[1:N,1])) + to_vector(cl1_vec .* to_vector(Xc[1:N,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:N,2]) + to_vector(ar2_vec .* to_vector(Xc[1:N,2])) + to_vector(cl2_vec .* to_vector(Xc[1:N,1])));
 

  // Don't use first observations for a participant, due to lag, no predictors there
  Y_Vec = Y[Use_obs];
  Mean_Vec = Mean[Use_obs];
  
  Chol_Y = diag_pre_multiply(tau, L_Omega);
  
  Y_Vec ~ multi_normal_cholesky(Mean_Vec, Chol_Y);
  
  }
 
}

The first model takes about 41 minutes and the second one 53 with 14000 lines (70 measures on 200 individuals).

Did I make a mistake? Or misunderstand your suggestion?

Best,
Joran

Wait! I think I found a stupid mistake! ;).

It should be:

 for(i in 1:I) { // Can't remove this loop, because I need I alpha and beta's in output
    alpha[i] = alpha_hat_location + (diag_pre_multiply(alpha_hat_scale, L_alpha)) * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + (diag_pre_multiply(beta_hat_scale, L_beta)) * z_beta[i]'), K, K);
    }

Yeah! That fixed it! :). It’s 40 minutes with the lkj_corr_cholesky priors :).
Nothing like appropriate priors to speed things up :)

1 Like

Hi Joran,
I am dealing with a model that included AR and lagged variables. I have 22 ordinal predictor variables, ordinal response for 820 individuals and 4 time-points.


I posted on Stan today to ask for help that I came across your model.
This is the picture of the model. The data is in wide format.
as a data frame. Can I get your advice on the analyses I am doing? I appreciate your guidance.

Hi @ssalimi,

How can I help :). If you want to to use my code as a basis, my data is in long-format instead of wide. Having it as a data.frame in R should be fine :).

Best,
Joran

Ok, so it appears reduce_sum doesn’t really speed up my model on windows 10.

I have this multilevel VAR(1) model that I want to speed up:

// Learn more about model development with Stan at:
//
//    http://mc-stan.org/users/interfaces/rstan.html
//    https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
data {
  int N; // number of observations (number of rows in the panel) this is equal to N*T due to long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // Dependent variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indicator that indicates which rows are NOT individual's first observations. 
  //matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  //cholesky_factor_corr[K] Omega;
  cholesky_factor_corr[K] L_Omega; // Correlation matrix for residuals
  vector[K*K] beta_hat_location; // means of the VAR(1) parameters
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's of the  AR and Cross-lagged products
  vector[K] alpha_hat_location; // means for the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd's of the means of predictors
  cholesky_factor_corr[K] L_alpha;// Correlations between means of predictors
  cholesky_factor_corr[K*K] L_beta;// Correlations between AR and Cross-lagged products
  
  // individual-level parameters
  matrix[I,K*K] z_beta;
  matrix[I,K] z_alpha;
}

transformed parameters {
  matrix[K, K] beta[I]; // individual VAR(1) coefficient matrix
  vector[K] alpha[I]; // individual means matrix
  
 
  for(i in 1:I) { // Can't remove this loop, because I need I alpha and beta's in output
    alpha[i] = alpha_hat_location + (diag_pre_multiply(alpha_hat_scale, L_alpha)) * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + (diag_pre_multiply(beta_hat_scale, L_beta)) * z_beta[i]'), K, K);
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  L_alpha ~ lkj_corr_cholesky(3);
  L_beta ~ lkj_corr_cholesky(5); // old value 3
  
  // hierarchical priors
  // non-centered parameterization
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
    
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  L_Omega ~ lkj_corr_cholesky(3);
  //Omega ~ lkj_corr_cholesky(3);
  
  

  {
  vector[K] Xc[N];
  vector[K] Mean[N];
  vector[K] alpha_vec[N];
  vector[N] ar1_vec;
  vector[N] ar2_vec;
  vector[N] cl1_vec;
  vector[N] cl2_vec;
  //vector[K] cl_vec[N];
  
  vector[K] Mean_Vec[N-I];
  vector[K] Y_Vec[N-I];
  
  matrix[K,K] Chol_Y;
  
  
  alpha_vec = alpha[individual]; // store individual means in a vector, with each pair of means appearing multiple times. Once for each row that
  // belong to an individual. Arguments are: individual means (vector[K] alpha[I]), indictor array listing which individual belong to each row 
  // (int individual[N]). Results in vector[K] alpha_vec[N] array of all mean pairs for the rows. Here  K = number of variables, I = number of 
  // distinct individuals, and N = number of rows in data (#people * #observations pp (T)). 
  ar1_vec = to_vector(beta[individual,1,1]);// separately store all the individual AR-parameters for variable 1 belonging to the rows. 
  // Since mulitple rows belong to the same person, values are repeated. Imput is the individual matrices holding the VAR-parameters (AR and 
  // cross-lagged) for each individual (matrix[K, K] beta[I];), indictor array listing which individual belong to each row (int individual[N]).
  // Results in vector[K] ar1_vec[N] array of all AR-parameters for variable 1, for each of the rows.
  ar2_vec = to_vector(beta[individual,K,K]); // same as above vut for AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual,1,K]);// same as above vut for cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual,K,1]);// same as above vut for cross-lagged effect of variable 1 on variable 2
  
  
  Xc[,1] = to_array_1d(to_vector(X[1:N,1]) - to_vector(alpha_vec[1:N,1]));
  Xc[,2] = to_array_1d(to_vector(X[1:N,2]) - to_vector(alpha_vec[1:N,2]));

  
  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean[,1] = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors.
  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:N,1]) + to_vector(ar1_vec .* to_vector(Xc[1:N,1])) + to_vector(cl1_vec .* to_vector(Xc[1:N,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:N,2]) + to_vector(ar2_vec .* to_vector(Xc[1:N,2])) + to_vector(cl2_vec .* to_vector(Xc[1:N,1])));
 

  // Don't use first observations for a participant, due to lag, no predictors there
  Y_Vec = Y[Use_obs];
  Mean_Vec = Mean[Use_obs];
  
  Chol_Y = diag_pre_multiply(tau, L_Omega);
  
  Y_Vec ~ multi_normal_cholesky(Mean_Vec, Chol_Y);
  
  }
 
}

I tried only moving the final sampling from the multivariate normal to the reduce_sum function:

// Learn more about model development with Stan at:
//
//    http://mc-stan.org/users/interfaces/rstan.html
//    https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
functions {
  real part_sum(vector[] y_slice,
                int start, 
                int end,
                vector[] mu,
                matrix Chol_Y
                ) {
  return multi_normal_cholesky_lpdf(y_slice | mu[start:end], Chol_Y);
  }
}
data {
  int N; // number of observations (number of rows in the panel) this is equal to N*T due to long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // Dependent variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indicator that indicates which rows are NOT individual's first observations. 
  //matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
  int grainsize;
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  //cholesky_factor_corr[K] Omega;
  cholesky_factor_corr[K] L_Omega; // Correlation matrix for residuals
  vector[K*K] beta_hat_location; // means of the VAR(1) parameters
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's of the  AR and Cross-lagged products
  vector[K] alpha_hat_location; // means for the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd's of the means of predictors
  cholesky_factor_corr[K] L_alpha;// Correlations between means of predictors
  cholesky_factor_corr[K*K] L_beta;// Correlations between AR and Cross-lagged products
  
  // individual-level parameters
  matrix[I,K*K] z_beta;
  matrix[I,K] z_alpha;
}

transformed parameters {
  matrix[K, K] beta[I]; // individual VAR(1) coefficient matrix
  vector[K] alpha[I]; // individual means matrix
  
 
  for(i in 1:I) { // Can't remove this loop, because I need I alpha and beta's in output
    alpha[i] = alpha_hat_location + (diag_pre_multiply(alpha_hat_scale, L_alpha)) * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + (diag_pre_multiply(beta_hat_scale, L_beta)) * z_beta[i]'), K, K);
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  L_alpha ~ lkj_corr_cholesky(3);
  L_beta ~ lkj_corr_cholesky(5); // old value 3
  
  // hierarchical priors
  // non-centered parameterization
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
    
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  L_Omega ~ lkj_corr_cholesky(3);
  //Omega ~ lkj_corr_cholesky(3);
  
  

  {
  vector[K] Xc[N];
  vector[K] Mean[N];
  vector[K] alpha_vec[N];
  vector[N] ar1_vec;
  vector[N] ar2_vec;
  vector[N] cl1_vec;
  vector[N] cl2_vec;
  //vector[K] cl_vec[N];
  
  vector[K] Mean_Vec[N-I];
  vector[K] Y_Vec[N-I];
  
  matrix[K,K] Chol_Y;
  
  
  alpha_vec = alpha[individual]; // store individual means in a vector, with each pair of means appearing multiple times. Once for each row that
  // belong to an individual. Arguments are: individual means (vector[K] alpha[I]), indictor array listing which individual belong to each row 
  // (int individual[N]). Results in vector[K] alpha_vec[N] array of all mean pairs for the rows. Here  K = number of variables, I = number of 
  // distinct individuals, and N = number of rows in data (#people * #observations pp (T)). 
  ar1_vec = to_vector(beta[individual,1,1]);// separately store all the individual AR-parameters for variable 1 belonging to the rows. 
  // Since mulitple rows belong to the same person, values are repeated. Imput is the individual matrices holding the VAR-parameters (AR and 
  // cross-lagged) for each individual (matrix[K, K] beta[I];), indictor array listing which individual belong to each row (int individual[N]).
  // Results in vector[K] ar1_vec[N] array of all AR-parameters for variable 1, for each of the rows.
  ar2_vec = to_vector(beta[individual,K,K]); // same as above vut for AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual,1,K]);// same as above vut for cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual,K,1]);// same as above vut for cross-lagged effect of variable 1 on variable 2
  
  
  Xc[,1] = to_array_1d(to_vector(X[1:N,1]) - to_vector(alpha_vec[1:N,1]));
  Xc[,2] = to_array_1d(to_vector(X[1:N,2]) - to_vector(alpha_vec[1:N,2]));

  
  // Here I want to determine the predicted mean for each row (I'm doing so per column). This is done by taking the individual means for that row,
  // and adding 1) variable 1 multiplied with its AR-parameter, and 2) variable 2 multiplied with the cross-lagged effect of var 2 on var1.
  // so I want vector[K] Mean[,1] = vector[K] alpha_vec[,1] + (vector[K] ar1_vec multiplied elementwise with vector[K] Xc[,1]) + 
  // vector[K] cl1_vec multiplied elementwise with vector[K] Xc[,2]). This is then adding a bunch of similar sized vectors to get a vector that
  // is the sum of those vectors.
  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:N,1]) + to_vector(ar1_vec .* to_vector(Xc[1:N,1])) + to_vector(cl1_vec .* to_vector(Xc[1:N,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:N,2]) + to_vector(ar2_vec .* to_vector(Xc[1:N,2])) + to_vector(cl2_vec .* to_vector(Xc[1:N,1])));
 

  // Don't use first observations for a participant, due to lag, no predictors there
  Y_Vec = Y[Use_obs];
  Mean_Vec = Mean[Use_obs];
  
  Chol_Y = diag_pre_multiply(tau, L_Omega);
  
  target += reduce_sum(part_sum,Y_Vec,grainsize,Mean_Vec,Chol_Y);
  
  }
 
}

And I tried moving the data modifications that I do in the model section as well:

// The reduce_sum function
functions {
  real part_sum(vector[] y_slice,
                int start, 
                int end,
                matrix Chol_Y,
                //matrix L_Omega, //matrix Chol_Y,
                //vector tau,
                vector[] X, // add lagged predictors instead of making them in function. Saves an if-statement
                vector[] alpha,
                matrix[] beta,
                int[] individual,
                int[] time,
                int K
                ) {
                  
  // ad mu creation to function
  // first: indexes
  int length = end-start + 1;
  int n0 = start- 1;
  
  vector[K] Xc[length];
  vector[K] Mean[length];
  vector[K] alpha_vec[length];
  //matrix[N,K] alpha_vec;
  vector[length] ar1_vec;
  vector[length] ar2_vec;
  vector[length] cl1_vec;
  vector[length] cl2_vec;
  vector[K] cl_vec[length];
  //matrix[K,K] Chol_Y;
  
  //Chol_Y = diag_pre_multiply(tau, L_Omega);
 

  alpha_vec = alpha[individual[start:end]]; 
  ar1_vec = to_vector(beta[individual[start:end],1,1]);
  ar2_vec = to_vector(beta[individual[start:end],K,K]); // same as above vut for AR-parameter of variable 2
  cl1_vec = to_vector(beta[individual[start:end],1,K]);// same as above vut for cross-lagged effect of variable 2 on variable 1
  cl2_vec = to_vector(beta[individual[start:end],K,1]);// same as above vut for cross-lagged effect of variable 1 on variable 2
  
  
  Xc[,1] = to_array_1d(to_vector(X[individual[start:end],1])- to_vector(alpha_vec[1:length,1]));
  Xc[,2] = to_array_1d(to_vector(X[individual[start:end],2])- to_vector(alpha_vec[1:length,2]));

  Mean[,1] = to_array_1d(to_vector(alpha_vec[1:length,1]) + to_vector(ar1_vec .* to_vector(Xc[1:length,1])) + to_vector(cl1_vec .* to_vector(Xc[1:length,2])));
  Mean[,2] = to_array_1d(to_vector(alpha_vec[1:length,2]) + to_vector(ar2_vec .* to_vector(Xc[1:length,2])) + to_vector(cl2_vec .* to_vector(Xc[1:length,1])));
 

  
  return  multi_normal_cholesky_lpdf(y_slice | Mean, Chol_Y);
  }
}
data {
  int N; // number of observations (number of rows in the panel) this is equal to N*T due to long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // The data with I*T (=N) observations on K variables
  vector[K] X[N]; // lagged predictors
  int Use_obs[N-I]; // indicator that indicates which rows are NOT individual's first observations. I want to
  // remove the first observations sinc ethose don't have alagged predictor value (nothing came before)
  int grainsize;
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  cholesky_factor_corr[K] L_Omega; // Correlation matrix for correlations between residuals
  vector[K*K] beta_hat_location; // a K*K matrix of grand-mean (across-individual) AR-parameters (on the diagonal)  
  // and cross-lagged parameters.
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's for AR and Cross-lagged effects
  vector[K] alpha_hat_location; // grand-mean (across-individuals) on the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd of the means 
  cholesky_factor_corr[K] L_alpha; // correlations between means
  cholesky_factor_corr[K*K] L_beta;// correlations between AR- and cross-lagged effects
  
   
  // individual-level parameters. 
  matrix[I,K*K] z_beta;// use for the non-centered parametrization
  matrix[I,K] z_alpha; // use for the non-centered parametrization
}

transformed parameters {
  matrix[K, K] beta[I]; // Individual VAR(1) coefficient matrices (with individual AR- and cross-lagged effects)
  vector[K] alpha[I]; // Individual means on the K DV's

  for(i in 1:I) {
    alpha[i] = alpha_hat_location + (diag_pre_multiply(alpha_hat_scale, L_alpha)) * z_alpha[i]';// non=centered parametrization
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    beta[i] = to_matrix((beta_hat_location + (diag_pre_multiply(beta_hat_scale, L_beta)) * z_beta[i]'), K, K); // non=centered parametrization
    }
  
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  L_alpha ~ lkj_corr_cholesky(3);
  L_beta ~ lkj_corr_cholesky(5); 
 
  
  // hierarchical priors
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  L_Omega ~ lkj_corr_cholesky(3);
  
  {
  matrix[K,K] Chol_Y;
  Chol_Y = diag_pre_multiply(tau, L_Omega);
  
  //target += reduce_sum(part_sum,Y[Use_obs],grainsize,L_Omega,tau, X[Use_obs],alpha,beta,individual,time,K); // Feed DV's, Mean, Correlation matrix and
  
  target += reduce_sum(part_sum,Y[Use_obs],grainsize,Chol_Y, X[Use_obs],alpha,beta,individual,time,K); // Feed DV's, Mean, Correlation matrix and
  }
  
 
}

Neither model is faster than the model without reduce_sum and only get slower if I make the grainsize smaller. Does anyone know why this might be the case? :).

I’m trying to get it working with map_rect now because I can use rstan than, and figure out whether it has to do with cmdstan, But there is an error in the way I call map_rect I believe, but I can’t find it. Could someone point me to the error?

// Learn more about model development with Stan at:
//
//    http://mc-stan.org/users/interfaces/rstan.html
//    https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
functions {
  vector lp_reduce( vector beta , vector theta , real[] xr , int[] xi ) {
    int n = 70; // size(xr); // Number of cases per shard
    int K = 2;
    vector[K] Y[n];
    vector[K] Xc[n];
    vector[K] Mean[n];
    real lp;
    
    real y[n] = xr[1:n]; // Scores on DV 1
    real y2[n] = xr[(n+1):(2*n)]; // Scores on DV 2
    real X[n] = xr[((2*n)+1):(3*n)]; // Centered Scores on DV 1, used as predictor
    real X2[n] = xr[((3*n)+1):(4*n)];// Centered Scores on DV 2, used as predictor
    
    vector[n] alpha = theta[1:n]; // individual means on DV 1
    vector[n] alpha2 = theta[(n+1):(2*n)]; // individual means on DV 2
    vector[n] beta1 = theta[((2*n)+1):(3*n)]; // AR  parameter for DV 1
    vector[n] beta2 = theta[((3*n)+1):(4*n)]; // Cross-lagged effects from DV 1 to Dv 2
    vector[n] beta3 = theta[((4*n)+1):(5*n)]; // Cross-lagged effects from DV 2 to Dv 1
    vector[n] beta4 = theta[((5*n)+1):(6*n)]; // AR parameter for DV 2
    
    matrix[2,2] L_Omega = to_matrix(beta[1:4],2,2); 
    
    Y[n,1] = y[n];
    Y[n,2] = y2[n];
    
    Xc[1:n,1] = to_array_1d(to_vector(X[1:n]) - to_vector(alpha[1:n]));
    Xc[1:n,2] = to_array_1d(to_vector(X2[1:n]) - to_vector(alpha2[1:n]));
    
    Mean[,1] = to_array_1d(to_vector(alpha[1:n]) + to_vector(beta1 .* to_vector(Xc[1:n,1])) + to_vector(beta3 .* to_vector(Xc[1:n,2])));
    Mean[,2] = to_array_1d(to_vector(alpha2[1:n]) + to_vector(beta4 .* to_vector(Xc[1:n,2])) + to_vector(beta2 .* to_vector(Xc[1:n,1])));
 
    lp = multi_normal_cholesky_lpdf(Y|Mean, L_Omega);
    return [lp]';
  }
} 
data {
  int N; // number of observations (number of rows in the panel) this is equal to N*T due to long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // Dependent variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indicator that indicates which rows are NOT individual's first observations. 
  //matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
  //int grainsize;
}
transformed data {
  // 200 shards. Run per individual and combine
  // M = N/200 = 14000/200 = 70
  int n_shards = 200;
  int M = 70;// N/n_shards;
  int xi[n_shards,M];  
  real xr[n_shards, 4*M];// 4M because two variables, and 2 lagged predicted means get stacked in array
  // split into shards
  for ( i in 1:n_shards ) {
    int j = 1 + (i-1)*M;
    int k = i*M;
    xi[i,1:M] = individual[ j:k ];
    
    xr[i,1:M] = Y[j:k, 1 ];
    xr[i,(M+1):(2*M)] = Y[j:k, 2 ];
    xr[i,((2*M)+1):(3*M)] = X[j:k, 1 ];
    xr[i,((3*M)+1):(4*M)] = X[j:k, 2 ];
    
  }
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  cholesky_factor_corr[K] L_Omega; // Correlation matrix for residuals
  vector[K*K] beta_hat_location; // means of the VAR(1) parameters
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's of the  AR and Cross-lagged products
  vector[K] alpha_hat_location; // means for the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd's of the means of predictors
  cholesky_factor_corr[K] L_alpha;// Correlations between means of predictors
  cholesky_factor_corr[K*K] L_beta;// Correlations between AR and Cross-lagged products
  
  // individual-level parameters
  matrix[I,K*K] z_beta;
  matrix[I,K] z_alpha;
}

transformed parameters {
  matrix[K, K] betas[I]; // individual VAR(1) coefficient matrix
  vector[K] alpha[I]; // individual means matrix
  vector[6*M] theta[n_shards]; // 8M because two means, two AR-prs, 2 cl-pars, Omega and tau, 6 if Omega and tau fixed
 
  for(i in 1:I) { // Can't remove this loop, because I need I alpha and beta's in output
    alpha[i] = alpha_hat_location + (diag_pre_multiply(alpha_hat_scale, L_alpha)) * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    betas[i] = to_matrix((beta_hat_location + (diag_pre_multiply(beta_hat_scale, L_beta)) * z_beta[i]'), K, K);
    }
  
   for ( i in 1:n_shards ) {
    int j = 1 + (i-1)*M;
    int k = i*M;
    
    theta[i,1:M] = to_vector(alpha[individual[j:k],1]);
    theta[i,(M+1):(2*M)] = to_vector(alpha[individual[j:k],2]);
    theta[i,((2*M)+1):(3*M)] = to_vector(betas[individual[j:k],1,1]);
    theta[i,((3*M)+1):(4*M)] = to_vector(betas[individual[j:k],K,K]);
    theta[i,((4*M)+1):(5*M)] = to_vector(betas[individual[j:k],1,K]);
    theta[i,((5*M)+1):(6*M)] = to_vector(betas[individual[j:k],K,1]);
  }
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  L_alpha ~ lkj_corr_cholesky(3);
  L_beta ~ lkj_corr_cholesky(5); // old value 3
  
  // hierarchical priors
  // non-centered parameterization
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
    
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  L_Omega ~ lkj_corr_cholesky(3);
  

  {
  
  vector[K*K] beta;
  
  
  beta = to_vector(diag_pre_multiply(tau, L_Omega));
  
   target += sum(map_rect(lp_reduce, beta, theta, xr, xi));
  
  }
 
}

I get the error: Exception: Exception: multi_normal_cholesky_lpdf: Random variable[1] is nan, but must not be nan!

Thanks again guys!! :)

Best,
Joran

I get the error: Exception: Exception: multi_normal_cholesky_lpdf: Random variable[1] is nan, but must not be nan!

If you look at the construction of Y in lp_reduce:

    Y[n,1] = y[n];
    Y[n,2] = y2[n];

Because n is the integer 70, you’re just copying the 70th value of y and y2 into the 70th array of Y. This means that all other values of Y are left uninitialised as nan, which is why you get the error

Hahahaha! I was so convinced I typed 1:n there, that I completely read over it!

Thanks!!! :)

Haha no worries, it’s always the indexing that gets me as well

Argh! You know it ;)

Ok! Thnx to @andrjohns the code runs :). final version is below:

// Learn more about model development with Stan at:
//
//    http://mc-stan.org/users/interfaces/rstan.html
//    https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
functions {
  vector lp_reduce( vector beta , vector theta , real[] xr , int[] xi ) {
    int n = 70; // size(xr); // Number of cases per shard
    int K = 2;
    vector[K] Y[n];
    vector[K] Xc[n];
    vector[K] Mean[n];
    real lp;
    
    real y[n] = xr[1:n]; // Scores on DV 1
    real y2[n] = xr[(n+1):(2*n)]; // Scores on DV 2
    real X[n] = xr[((2*n)+1):(3*n)]; // Centered Scores on DV 1, used as predictor
    real X2[n] = xr[((3*n)+1):(4*n)];// Centered Scores on DV 2, used as predictor
    
    vector[n] alpha = theta[1:n]; // individual means on DV 1
    vector[n] alpha2 = theta[(n+1):(2*n)]; // individual means on DV 2
    vector[n] beta1 = theta[((2*n)+1):(3*n)]; // AR  parameter for DV 1
    vector[n] beta2 = theta[((3*n)+1):(4*n)]; // AR parameter for DV 2
    vector[n] beta3 = theta[((4*n)+1):(5*n)]; // Cross-lagged effects from DV 2 to Dv 1
    vector[n] beta4 = theta[((5*n)+1):(6*n)]; // Cross-lagged effects from DV 1 to Dv 2
    
    matrix[2,2] L_Omega = to_matrix(beta[1:4],2,2); 
    
    Y[1:n,1] = y[1:n];
    Y[1:n,2] = y2[1:n];
    
    Xc[1:n,1] = to_array_1d(to_vector(X[1:n]) - alpha[1:n]);
    Xc[1:n,2] = to_array_1d(to_vector(X2[1:n]) - alpha2[1:n]);
    
    Mean[,1] = to_array_1d(alpha[1:n] + to_vector(beta1 .* to_vector(Xc[1:n,1])) + to_vector(beta3 .* to_vector(Xc[1:n,2])));
    Mean[,2] = to_array_1d(alpha2[1:n] + to_vector(beta2 .* to_vector(Xc[1:n,2])) + to_vector(beta4 .* to_vector(Xc[1:n,1])));
 
    lp = multi_normal_cholesky_lpdf(Y|Mean, L_Omega);
    return [lp]';
  }
} 
data {
  int N; // number of observations (number of rows in the panel) this is equal to N*T due to long format
  int K; // number of dimensions of the outcome variable
  int I; // number of individuals
  int T; // The greatest number of time periods for any individual
  int<lower = 1, upper = I> individual[N]; // integer vector the same length 
  // as the panel, indicating which individual each row corresponds to
  int<lower = 1, upper = T> time[N]; // integer vector with individual-time 
  // (not absolute time). Each individual starts at 1
  vector[K] Y[N]; // Dependent variables
  vector[K] X[N]; // lagged predictor
  int Use_obs[N-I]; // indicator that indicates which rows are NOT individual's first observations. 
  //matrix[N, K] Y; // the outcome matrix, each individual stacked on each 
  // other, observations ordered earliest to latest
  //int grainsize;
}
transformed data {
  // 200 shards. Run per individual and combine
  // M = N/200 = 14000/200 = 70
  int n_shards = 200;
  int M = 70;// N/n_shards;
  int xi[n_shards,M];  
  real xr[n_shards, 4*M];// 4M because two variables, and 2 lagged predicted means get stacked in array
  // split into shards
  for ( i in 1:n_shards ) {
    int j = 1 + (i-1)*M;
    int k = i*M;
    xi[i,1:M] = individual[ j:k ];
    
    xr[i,1:M] = Y[j:k, 1 ];
    xr[i,(M+1):(2*M)] = Y[j:k, 2 ];
    xr[i,((2*M)+1):(3*M)] = X[j:k, 1 ];
    xr[i,((3*M)+1):(4*M)] = X[j:k, 2 ];
    
  }
}

// The parameters accepted by the model. 
parameters {
  vector<lower = 0>[K] tau; // scale vector for residuals
  cholesky_factor_corr[K] L_Omega; // Correlation matrix for residuals
  vector[K*K] beta_hat_location; // means of the VAR(1) parameters
  vector<lower = 0>[K*K] beta_hat_scale; // Sd's of the  AR and Cross-lagged products
  vector[K] alpha_hat_location; // means for the K DV's
  vector<lower = 0>[K] alpha_hat_scale;// sd's of the means of predictors
  cholesky_factor_corr[K] L_alpha;// Correlations between means of predictors
  cholesky_factor_corr[K*K] L_beta;// Correlations between AR and Cross-lagged products
  
  // individual-level parameters
  matrix[I,K*K] z_beta;
  matrix[I,K] z_alpha;
}

transformed parameters {
  matrix[K, K] betas[I]; // individual VAR(1) coefficient matrix
  vector[K] alpha[I]; // individual means matrix
  vector[6*M] theta[n_shards]; // 8M because two means, two AR-prs, 2 cl-pars, Omega and tau, 6 if Omega and tau fixed
 
  for(i in 1:I) { // Can't remove this loop, because I need I alpha and beta's in output
    alpha[i] = alpha_hat_location + (diag_pre_multiply(alpha_hat_scale, L_alpha)) * z_alpha[i]';
    // implies: alpha[i] ~ multi_normal(alpha_hat_location, alpha_hat_scale)
    betas[i] = to_matrix((beta_hat_location + (diag_pre_multiply(beta_hat_scale, L_beta)) * z_beta[i]'), K, K);
    }
  
   for ( i in 1:n_shards ) {
    int j = 1 + (i-1)*M;
    int k = i*M;
    
    theta[i,1:M] = to_vector(alpha[individual[j:k],1]);
    theta[i,(M+1):(2*M)] = to_vector(alpha[individual[j:k],2]);
    theta[i,((2*M)+1):(3*M)] = to_vector(betas[individual[j:k],1,1]);
    theta[i,((3*M)+1):(4*M)] = to_vector(betas[individual[j:k],K,K]);
    theta[i,((4*M)+1):(5*M)] = to_vector(betas[individual[j:k],1,K]);
    theta[i,((5*M)+1):(6*M)] = to_vector(betas[individual[j:k],K,1]);
  }
}

// The model to be estimated.
model {
  // hyperpriors
  alpha_hat_location ~ normal(4, 4); 
  alpha_hat_scale ~ cauchy(0, 2); 
  beta_hat_location ~ normal(0, .5);
  beta_hat_scale ~ cauchy(0, .5);
  L_alpha ~ lkj_corr_cholesky(3);
  L_beta ~ lkj_corr_cholesky(5); // old value 3
  
  // hierarchical priors
  // non-centered parameterization
  to_vector(z_alpha) ~ std_normal();
  to_vector(z_beta) ~ std_normal();
    
 
 // Priors for error variance; tau is in sd's  
  tau ~ normal(1, 2);
  L_Omega ~ lkj_corr_cholesky(3);
  

  {
  
  vector[K*K] beta;
  
  
  beta = to_vector(diag_pre_multiply(tau, L_Omega));
  
   target += sum(map_rect(lp_reduce, beta, theta, xr, xi));
  
  }
 
}

However, I don’t think rstan is multithreading correctly, eventhough I called Sys.setenv("STAN_NUM_THREADS"=5) before running the model.

I ran 1 chain with 20 threads, and 4 chains (on 4 cores) with 5 threads but both times only a small number of cores are actually active (4 or 5 at most when I check the 4 chain run). Although the model is already faster :). It now runs in approx. 36 minutes. Also, when I ask for results using print(estimated_model, pars = c("alpha_hat_location", "beta_hat_location")), it takes R a long time show the results.

My Makevars.win files looks like this


CXX14FLAGS=-O3 -march=native -mtune=native
CXX11FLAGS=-O3 -march=corei7 -mtune=corei7
CXX14FLAGS += -DSTAN_THREADS

Does anyone know what the issue might be?

Best,
Joran

1 Like

I also ran the model with cmdstanr and there it definitely uses all my cores and finishes in 20 minutes :).

Only problem there is, after finishing and showing the total execution time it takes a long time for the focus to return to R (as in R being ready to run new commands etc.). I need to wait for 30 minutes untill results are stored in the variable I assigned to them and I can run commands again.

So, my questions are:

  1. 6 posts back, can anyone figure out why those models using reduce_sum don’t give any speed gains? especially since map_rect does.
  2. How do I get rstan to use multithreading properly? Is there something wrong with my Makevars.win file (posted in the previous post).
  3. Any ideas about why cmdstanr takes such a long time to store the results and make R “available” again?

As always! Thanks for the help ;).

Best,
Joran

cmdstanr currently reads all the results in the memory after the sampling completes. you can turn this off if you start sampling with validate_csv = FALSE in $sample().

We are also currently still using a slow way of reading in samples. This will be improved in the near future with a faster CSV reader.

I am however not sure why it takes 30 minutes. The CSV files must be really really huge. A few thousand parameters with 2000 samples normally takes maybe 20-40 seconds on my computer. How many parameters do you have, and how many sampling iterations do you run?

Ahhhh, That explains a lot ;). Thnx! :).
And yeah….the files are huge for some reason 360 mb per chain.

Hmmmm, I think it is due to the theta array (see code above) that I send to map_rect. I define that in the transformed parameter block and so alot of extra parameter are sampled. Can I also specify those as local parameters in the model block and send them to map_rect? Or do they have to be in the transformed model block?

Best,
Joran

Hi @rok_cesnovar!

Update: If I specify theta as a local parameter, the csv-files are much smaller and everything works fine :).

Many thanks! :).
Joran

1 Like

I was wondering if anybody could tell me why rstan won’t run my map_rect model using multiple cores, while the same code does work using cmdstan.

My Makevars.win file in the /.R/ directory looks like:

CXX14FLAGS=-O3 -march=native -mtune=native
CXX11FLAGS=-O3 -march=corei7 -mtune=corei7
CXX14FLAGS += -DSTAN_THREADS

Can’t rstan find the file, or is it using a different one for example?

Best!
Joran