Help speeding up an autoregressive multivariate measurement-error model in brms

I am working with a large data set (160 million observations of 6 variables across 400 subjects) in which conditions change over intervals of varying length.

The raw data seemed too large to model directly, so I collapsed each variable to its mean and standard deviation within each condition interval (the raw observations are sampled at 120 Hz), which reduces the data to about 68k rows. I have attached an example subset.

sample_data.csv (699.5 KB)
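The collapsing step itself is simple; roughly the following (a sketch using dplyr, where the column names `subject`, `interval`, and `x` through `w` are placeholders for the actual variables):

```r
library(dplyr)

# One row per subject x condition interval, keeping each signal's
# within-interval mean and SD from the 120 Hz samples.
# Column names here are placeholders, not the real ones.
collapsed <- raw %>%
  group_by(subject, interval) %>%
  summarise(across(c(x, y, z, p, r, w), list(mean = mean, sd = sd)),
            .groups = "drop")
```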

An AR term over "interval" seemed to make sense, so I specified the model in brms and extracted the generated Stan code below:
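For reference, the brms specification was along these lines (a simplified sketch; `condition`, `subject`, and `trial` are placeholders for the actual predictor and grouping variables):

```r
library(brms)

# Each response carries known measurement noise (its within-interval SD)
# via mi(), plus an AR term over intervals within subject; the |p| and |q|
# tags correlate the group-level effects across the six responses.
# Variable names are placeholders for the actual columns.
bf_x <- bf(meanX | mi(sd_meanX) ~ condition +
             ar(time = interval, gr = subject) +
             (1 | p | subject) + (1 | q | trial))
# ... analogous bf() formulas bf_y through bf_w for the other responses ...
mv <- bf_x + bf_y + bf_z + bf_p + bf_r + bf_w + set_rescor(TRUE)
```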

functions {
 /* compute correlated group-level effects
  * Args:
  *   z: matrix of unscaled group-level effects
  *   SD: vector of standard deviation parameters
  *   L: cholesky factor correlation matrix
  * Returns:
  *   matrix of scaled group-level effects
  */
  matrix scale_r_cor(matrix z, vector SD, matrix L) {
    // r is stored in another dimension order than z
    return transpose(diag_pre_multiply(SD, L) * z);
  }
}
data {
  int<lower=1> N;  // total number of observations
  int<lower=1> N_meanX;  // number of observations
  vector[N_meanX] Y_meanX;  // response variable
  // data for measurement-error in the response
  vector<lower=0>[N_meanX] noise_meanX;
  // information about non-missings
  int<lower=0> Nme_meanX;
  int<lower=1> Jme_meanX[Nme_meanX];
  int<lower=1> K_meanX;  // number of population-level effects
  matrix[N_meanX, K_meanX] X_meanX;  // population-level design matrix
  // data needed for ARMA correlations
  int<lower=0> Kar_meanX;  // AR order
  int<lower=0> Kma_meanX;  // MA order
  // number of lags per observation
  int<lower=0> J_lag_meanX[N_meanX];
  int<lower=1> N_meanY;  // number of observations
  vector[N_meanY] Y_meanY;  // response variable
  // data for measurement-error in the response
  vector<lower=0>[N_meanY] noise_meanY;
  // information about non-missings
  int<lower=0> Nme_meanY;
  int<lower=1> Jme_meanY[Nme_meanY];
  int<lower=1> K_meanY;  // number of population-level effects
  matrix[N_meanY, K_meanY] X_meanY;  // population-level design matrix
  // data needed for ARMA correlations
  int<lower=0> Kar_meanY;  // AR order
  int<lower=0> Kma_meanY;  // MA order
  // number of lags per observation
  int<lower=0> J_lag_meanY[N_meanY];
  int<lower=1> N_meanZ;  // number of observations
  vector[N_meanZ] Y_meanZ;  // response variable
  // data for measurement-error in the response
  vector<lower=0>[N_meanZ] noise_meanZ;
  // information about non-missings
  int<lower=0> Nme_meanZ;
  int<lower=1> Jme_meanZ[Nme_meanZ];
  int<lower=1> K_meanZ;  // number of population-level effects
  matrix[N_meanZ, K_meanZ] X_meanZ;  // population-level design matrix
  // data needed for ARMA correlations
  int<lower=0> Kar_meanZ;  // AR order
  int<lower=0> Kma_meanZ;  // MA order
  // number of lags per observation
  int<lower=0> J_lag_meanZ[N_meanZ];
  int<lower=1> N_meanP;  // number of observations
  vector[N_meanP] Y_meanP;  // response variable
  // data for measurement-error in the response
  vector<lower=0>[N_meanP] noise_meanP;
  // information about non-missings
  int<lower=0> Nme_meanP;
  int<lower=1> Jme_meanP[Nme_meanP];
  int<lower=1> K_meanP;  // number of population-level effects
  matrix[N_meanP, K_meanP] X_meanP;  // population-level design matrix
  // data needed for ARMA correlations
  int<lower=0> Kar_meanP;  // AR order
  int<lower=0> Kma_meanP;  // MA order
  // number of lags per observation
  int<lower=0> J_lag_meanP[N_meanP];
  int<lower=1> N_meanR;  // number of observations
  vector[N_meanR] Y_meanR;  // response variable
  // data for measurement-error in the response
  vector<lower=0>[N_meanR] noise_meanR;
  // information about non-missings
  int<lower=0> Nme_meanR;
  int<lower=1> Jme_meanR[Nme_meanR];
  int<lower=1> K_meanR;  // number of population-level effects
  matrix[N_meanR, K_meanR] X_meanR;  // population-level design matrix
  // data needed for ARMA correlations
  int<lower=0> Kar_meanR;  // AR order
  int<lower=0> Kma_meanR;  // MA order
  // number of lags per observation
  int<lower=0> J_lag_meanR[N_meanR];
  int<lower=1> N_meanW;  // number of observations
  vector[N_meanW] Y_meanW;  // response variable
  // data for measurement-error in the response
  vector<lower=0>[N_meanW] noise_meanW;
  // information about non-missings
  int<lower=0> Nme_meanW;
  int<lower=1> Jme_meanW[Nme_meanW];
  int<lower=1> K_meanW;  // number of population-level effects
  matrix[N_meanW, K_meanW] X_meanW;  // population-level design matrix
  // data needed for ARMA correlations
  int<lower=0> Kar_meanW;  // AR order
  int<lower=0> Kma_meanW;  // MA order
  // number of lags per observation
  int<lower=0> J_lag_meanW[N_meanW];
  int<lower=1> nresp;  // number of responses
  int nrescor;  // number of residual correlations
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1_meanX[N_meanX];  // grouping indicator per observation
  int<lower=1> J_1_meanY[N_meanY];  // grouping indicator per observation
  int<lower=1> J_1_meanZ[N_meanZ];  // grouping indicator per observation
  int<lower=1> J_1_meanP[N_meanP];  // grouping indicator per observation
  int<lower=1> J_1_meanR[N_meanR];  // grouping indicator per observation
  int<lower=1> J_1_meanW[N_meanW];  // grouping indicator per observation
  // group-level predictor values
  vector[N_meanX] Z_1_meanX_1;
  vector[N_meanY] Z_1_meanY_2;
  vector[N_meanZ] Z_1_meanZ_3;
  vector[N_meanP] Z_1_meanP_4;
  vector[N_meanR] Z_1_meanR_5;
  vector[N_meanW] Z_1_meanW_6;
  int<lower=1> NC_1;  // number of group-level correlations
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  int<lower=1> J_2_meanX[N_meanX];  // grouping indicator per observation
  int<lower=1> J_2_meanY[N_meanY];  // grouping indicator per observation
  int<lower=1> J_2_meanZ[N_meanZ];  // grouping indicator per observation
  int<lower=1> J_2_meanP[N_meanP];  // grouping indicator per observation
  int<lower=1> J_2_meanR[N_meanR];  // grouping indicator per observation
  int<lower=1> J_2_meanW[N_meanW];  // grouping indicator per observation
  // group-level predictor values
  vector[N_meanX] Z_2_meanX_1;
  vector[N_meanY] Z_2_meanY_2;
  vector[N_meanZ] Z_2_meanZ_3;
  vector[N_meanP] Z_2_meanP_4;
  vector[N_meanR] Z_2_meanR_5;
  vector[N_meanW] Z_2_meanW_6;
  int<lower=1> NC_2;  // number of group-level correlations
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc_meanX = K_meanX - 1;
  matrix[N_meanX, Kc_meanX] Xc_meanX;  // centered version of X_meanX without an intercept
  vector[Kc_meanX] means_X_meanX;  // column means of X_meanX before centering
  int max_lag_meanX = max(Kar_meanX, Kma_meanX);
  int Kc_meanY = K_meanY - 1;
  matrix[N_meanY, Kc_meanY] Xc_meanY;  // centered version of X_meanY without an intercept
  vector[Kc_meanY] means_X_meanY;  // column means of X_meanY before centering
  int max_lag_meanY = max(Kar_meanY, Kma_meanY);
  int Kc_meanZ = K_meanZ - 1;
  matrix[N_meanZ, Kc_meanZ] Xc_meanZ;  // centered version of X_meanZ without an intercept
  vector[Kc_meanZ] means_X_meanZ;  // column means of X_meanZ before centering
  int max_lag_meanZ = max(Kar_meanZ, Kma_meanZ);
  int Kc_meanP = K_meanP - 1;
  matrix[N_meanP, Kc_meanP] Xc_meanP;  // centered version of X_meanP without an intercept
  vector[Kc_meanP] means_X_meanP;  // column means of X_meanP before centering
  int max_lag_meanP = max(Kar_meanP, Kma_meanP);
  int Kc_meanR = K_meanR - 1;
  matrix[N_meanR, Kc_meanR] Xc_meanR;  // centered version of X_meanR without an intercept
  vector[Kc_meanR] means_X_meanR;  // column means of X_meanR before centering
  int max_lag_meanR = max(Kar_meanR, Kma_meanR);
  int Kc_meanW = K_meanW - 1;
  matrix[N_meanW, Kc_meanW] Xc_meanW;  // centered version of X_meanW without an intercept
  vector[Kc_meanW] means_X_meanW;  // column means of X_meanW before centering
  int max_lag_meanW = max(Kar_meanW, Kma_meanW);
  vector[nresp] Y[N];  // response array
  for (i in 2:K_meanX) {
    means_X_meanX[i - 1] = mean(X_meanX[, i]);
    Xc_meanX[, i - 1] = X_meanX[, i] - means_X_meanX[i - 1];
  }
  for (i in 2:K_meanY) {
    means_X_meanY[i - 1] = mean(X_meanY[, i]);
    Xc_meanY[, i - 1] = X_meanY[, i] - means_X_meanY[i - 1];
  }
  for (i in 2:K_meanZ) {
    means_X_meanZ[i - 1] = mean(X_meanZ[, i]);
    Xc_meanZ[, i - 1] = X_meanZ[, i] - means_X_meanZ[i - 1];
  }
  for (i in 2:K_meanP) {
    means_X_meanP[i - 1] = mean(X_meanP[, i]);
    Xc_meanP[, i - 1] = X_meanP[, i] - means_X_meanP[i - 1];
  }
  for (i in 2:K_meanR) {
    means_X_meanR[i - 1] = mean(X_meanR[, i]);
    Xc_meanR[, i - 1] = X_meanR[, i] - means_X_meanR[i - 1];
  }
  for (i in 2:K_meanW) {
    means_X_meanW[i - 1] = mean(X_meanW[, i]);
    Xc_meanW[, i - 1] = X_meanW[, i] - means_X_meanW[i - 1];
  }
  for (n in 1:N) {
    Y[n] = transpose([Y_meanX[n], Y_meanY[n], Y_meanZ[n], Y_meanP[n], Y_meanR[n], Y_meanW[n]]);
  }
}
parameters {
  vector[N_meanX] Yl_meanX;  // latent variable
  vector[Kc_meanX] b_meanX;  // population-level effects
  real Intercept_meanX;  // temporary intercept for centered predictors
  vector[Kar_meanX] ar_meanX;  // autoregressive coefficients
  real<lower=0> sigma_meanX;  // dispersion parameter
  vector[N_meanY] Yl_meanY;  // latent variable
  vector[Kc_meanY] b_meanY;  // population-level effects
  real Intercept_meanY;  // temporary intercept for centered predictors
  vector[Kar_meanY] ar_meanY;  // autoregressive coefficients
  real<lower=0> sigma_meanY;  // dispersion parameter
  vector[N_meanZ] Yl_meanZ;  // latent variable
  vector[Kc_meanZ] b_meanZ;  // population-level effects
  real Intercept_meanZ;  // temporary intercept for centered predictors
  vector[Kar_meanZ] ar_meanZ;  // autoregressive coefficients
  real<lower=0> sigma_meanZ;  // dispersion parameter
  vector[N_meanP] Yl_meanP;  // latent variable
  vector[Kc_meanP] b_meanP;  // population-level effects
  real Intercept_meanP;  // temporary intercept for centered predictors
  vector[Kar_meanP] ar_meanP;  // autoregressive coefficients
  real<lower=0> sigma_meanP;  // dispersion parameter
  vector[N_meanR] Yl_meanR;  // latent variable
  vector[Kc_meanR] b_meanR;  // population-level effects
  real Intercept_meanR;  // temporary intercept for centered predictors
  vector[Kar_meanR] ar_meanR;  // autoregressive coefficients
  real<lower=0> sigma_meanR;  // dispersion parameter
  vector[N_meanW] Yl_meanW;  // latent variable
  vector[Kc_meanW] b_meanW;  // population-level effects
  real Intercept_meanW;  // temporary intercept for centered predictors
  vector[Kar_meanW] ar_meanW;  // autoregressive coefficients
  real<lower=0> sigma_meanW;  // dispersion parameter
  cholesky_factor_corr[nresp] Lrescor;  // parameters for multivariate linear models
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  matrix[M_1, N_1] z_1;  // standardized group-level effects
  cholesky_factor_corr[M_1] L_1;  // cholesky factor of correlation matrix
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  matrix[M_2, N_2] z_2;  // standardized group-level effects
  cholesky_factor_corr[M_2] L_2;  // cholesky factor of correlation matrix
}
transformed parameters {
  matrix[N_1, M_1] r_1;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_1] r_1_meanX_1;
  vector[N_1] r_1_meanY_2;
  vector[N_1] r_1_meanZ_3;
  vector[N_1] r_1_meanP_4;
  vector[N_1] r_1_meanR_5;
  vector[N_1] r_1_meanW_6;
  matrix[N_2, M_2] r_2;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_2] r_2_meanX_1;
  vector[N_2] r_2_meanY_2;
  vector[N_2] r_2_meanZ_3;
  vector[N_2] r_2_meanP_4;
  vector[N_2] r_2_meanR_5;
  vector[N_2] r_2_meanW_6;
  real lprior = 0;  // prior contributions to the log posterior
  // compute actual group-level effects
  r_1 = scale_r_cor(z_1, sd_1, L_1);
  r_1_meanX_1 = r_1[, 1];
  r_1_meanY_2 = r_1[, 2];
  r_1_meanZ_3 = r_1[, 3];
  r_1_meanP_4 = r_1[, 4];
  r_1_meanR_5 = r_1[, 5];
  r_1_meanW_6 = r_1[, 6];
  // compute actual group-level effects
  r_2 = scale_r_cor(z_2, sd_2, L_2);
  r_2_meanX_1 = r_2[, 1];
  r_2_meanY_2 = r_2[, 2];
  r_2_meanZ_3 = r_2[, 3];
  r_2_meanP_4 = r_2[, 4];
  r_2_meanR_5 = r_2[, 5];
  r_2_meanW_6 = r_2[, 6];
  lprior += normal_lpdf(b_meanX | 0,2);
  lprior += normal_lpdf(Intercept_meanX | 0,0.5);
  lprior += normal_lpdf(ar_meanX | 0,1);
  lprior += normal_lpdf(sigma_meanX | 0,2)
    - 1 * normal_lccdf(0 | 0,2);
  lprior += normal_lpdf(b_meanY | 0,2);
  lprior += normal_lpdf(Intercept_meanY | 0,0.5);
  lprior += normal_lpdf(ar_meanY | 0,1);
  lprior += normal_lpdf(sigma_meanY | 0,2)
    - 1 * normal_lccdf(0 | 0,2);
  lprior += normal_lpdf(b_meanZ | 0,2);
  lprior += normal_lpdf(Intercept_meanZ | 0,0.5);
  lprior += normal_lpdf(ar_meanZ | 0,1);
  lprior += normal_lpdf(sigma_meanZ | 0,2)
    - 1 * normal_lccdf(0 | 0,2);
  lprior += normal_lpdf(b_meanP | 0,2);
  lprior += normal_lpdf(Intercept_meanP | 0,0.5);
  lprior += normal_lpdf(ar_meanP | 0,1);
  lprior += normal_lpdf(sigma_meanP | 0,2)
    - 1 * normal_lccdf(0 | 0,2);
  lprior += normal_lpdf(b_meanR | 0,2);
  lprior += normal_lpdf(Intercept_meanR | 0,0.5);
  lprior += normal_lpdf(ar_meanR | 0,1);
  lprior += normal_lpdf(sigma_meanR | 0,2)
    - 1 * normal_lccdf(0 | 0,2);
  lprior += normal_lpdf(b_meanW | 0,2);
  lprior += normal_lpdf(Intercept_meanW | 0,0.5);
  lprior += normal_lpdf(ar_meanW | 0,1);
  lprior += normal_lpdf(sigma_meanW | 0,2)
    - 1 * normal_lccdf(0 | 0,2);
  lprior += lkj_corr_cholesky_lpdf(Lrescor | 1);
  lprior += normal_lpdf(sd_1 | 0,1)
    - 6 * normal_lccdf(0 | 0,1);
  lprior += lkj_corr_cholesky_lpdf(L_1 | 1);
  lprior += normal_lpdf(sd_2 | 0,1)
    - 6 * normal_lccdf(0 | 0,1);
  lprior += lkj_corr_cholesky_lpdf(L_2 | 1);
}
model {
  // likelihood including constants
  if (!prior_only) {
    vector[nresp] Yl[N] = Y;
    // matrix storing lagged residuals
    matrix[N_meanX, max_lag_meanX] Err_meanX = rep_matrix(0, N_meanX, max_lag_meanX);
    vector[N_meanX] err_meanX;  // actual residuals
    // initialize linear predictor term
    vector[N_meanX] mu_meanX = Intercept_meanX + Xc_meanX * b_meanX;
    // matrix storing lagged residuals
    matrix[N_meanY, max_lag_meanY] Err_meanY = rep_matrix(0, N_meanY, max_lag_meanY);
    vector[N_meanY] err_meanY;  // actual residuals
    // initialize linear predictor term
    vector[N_meanY] mu_meanY = Intercept_meanY + Xc_meanY * b_meanY;
    // matrix storing lagged residuals
    matrix[N_meanZ, max_lag_meanZ] Err_meanZ = rep_matrix(0, N_meanZ, max_lag_meanZ);
    vector[N_meanZ] err_meanZ;  // actual residuals
    // initialize linear predictor term
    vector[N_meanZ] mu_meanZ = Intercept_meanZ + Xc_meanZ * b_meanZ;
    // matrix storing lagged residuals
    matrix[N_meanP, max_lag_meanP] Err_meanP = rep_matrix(0, N_meanP, max_lag_meanP);
    vector[N_meanP] err_meanP;  // actual residuals
    // initialize linear predictor term
    vector[N_meanP] mu_meanP = Intercept_meanP + Xc_meanP * b_meanP;
    // matrix storing lagged residuals
    matrix[N_meanR, max_lag_meanR] Err_meanR = rep_matrix(0, N_meanR, max_lag_meanR);
    vector[N_meanR] err_meanR;  // actual residuals
    // initialize linear predictor term
    vector[N_meanR] mu_meanR = Intercept_meanR + Xc_meanR * b_meanR;
    // matrix storing lagged residuals
    matrix[N_meanW, max_lag_meanW] Err_meanW = rep_matrix(0, N_meanW, max_lag_meanW);
    vector[N_meanW] err_meanW;  // actual residuals
    // initialize linear predictor term
    vector[N_meanW] mu_meanW = Intercept_meanW + Xc_meanW * b_meanW;
    // multivariate predictor array
    vector[nresp] Mu[N];
    vector[nresp] sigma = transpose([sigma_meanX, sigma_meanY, sigma_meanZ, sigma_meanP, sigma_meanR, sigma_meanW]);
    // cholesky factor of residual covariance matrix
    matrix[nresp, nresp] LSigma = diag_pre_multiply(sigma, Lrescor);
    for (n in 1:N_meanX) {
      // add more terms to the linear predictor
      mu_meanX[n] += r_1_meanX_1[J_1_meanX[n]] * Z_1_meanX_1[n] + r_2_meanX_1[J_2_meanX[n]] * Z_2_meanX_1[n];
    }
    for (n in 1:N_meanY) {
      // add more terms to the linear predictor
      mu_meanY[n] += r_1_meanY_2[J_1_meanY[n]] * Z_1_meanY_2[n] + r_2_meanY_2[J_2_meanY[n]] * Z_2_meanY_2[n];
    }
    for (n in 1:N_meanZ) {
      // add more terms to the linear predictor
      mu_meanZ[n] += r_1_meanZ_3[J_1_meanZ[n]] * Z_1_meanZ_3[n] + r_2_meanZ_3[J_2_meanZ[n]] * Z_2_meanZ_3[n];
    }
    for (n in 1:N_meanP) {
      // add more terms to the linear predictor
      mu_meanP[n] += r_1_meanP_4[J_1_meanP[n]] * Z_1_meanP_4[n] + r_2_meanP_4[J_2_meanP[n]] * Z_2_meanP_4[n];
    }
    for (n in 1:N_meanR) {
      // add more terms to the linear predictor
      mu_meanR[n] += r_1_meanR_5[J_1_meanR[n]] * Z_1_meanR_5[n] + r_2_meanR_5[J_2_meanR[n]] * Z_2_meanR_5[n];
    }
    for (n in 1:N_meanW) {
      // add more terms to the linear predictor
      mu_meanW[n] += r_1_meanW_6[J_1_meanW[n]] * Z_1_meanW_6[n] + r_2_meanW_6[J_2_meanW[n]] * Z_2_meanW_6[n];
    }
    // include ARMA terms
    for (n in 1:N_meanX) {
      err_meanX[n] = Yl_meanX[n] - mu_meanX[n];
      for (i in 1:J_lag_meanX[n]) {
        Err_meanX[n + 1, i] = err_meanX[n + 1 - i];
      }
      mu_meanX[n] += Err_meanX[n, 1:Kar_meanX] * ar_meanX;
    }
    // include ARMA terms
    for (n in 1:N_meanY) {
      err_meanY[n] = Yl_meanY[n] - mu_meanY[n];
      for (i in 1:J_lag_meanY[n]) {
        Err_meanY[n + 1, i] = err_meanY[n + 1 - i];
      }
      mu_meanY[n] += Err_meanY[n, 1:Kar_meanY] * ar_meanY;
    }
    // include ARMA terms
    for (n in 1:N_meanZ) {
      err_meanZ[n] = Yl_meanZ[n] - mu_meanZ[n];
      for (i in 1:J_lag_meanZ[n]) {
        Err_meanZ[n + 1, i] = err_meanZ[n + 1 - i];
      }
      mu_meanZ[n] += Err_meanZ[n, 1:Kar_meanZ] * ar_meanZ;
    }
    // include ARMA terms
    for (n in 1:N_meanP) {
      err_meanP[n] = Yl_meanP[n] - mu_meanP[n];
      for (i in 1:J_lag_meanP[n]) {
        Err_meanP[n + 1, i] = err_meanP[n + 1 - i];
      }
      mu_meanP[n] += Err_meanP[n, 1:Kar_meanP] * ar_meanP;
    }
    // include ARMA terms
    for (n in 1:N_meanR) {
      err_meanR[n] = Yl_meanR[n] - mu_meanR[n];
      for (i in 1:J_lag_meanR[n]) {
        Err_meanR[n + 1, i] = err_meanR[n + 1 - i];
      }
      mu_meanR[n] += Err_meanR[n, 1:Kar_meanR] * ar_meanR;
    }
    // include ARMA terms
    for (n in 1:N_meanW) {
      err_meanW[n] = Yl_meanW[n] - mu_meanW[n];
      for (i in 1:J_lag_meanW[n]) {
        Err_meanW[n + 1, i] = err_meanW[n + 1 - i];
      }
      mu_meanW[n] += Err_meanW[n, 1:Kar_meanW] * ar_meanW;
    }
    // combine univariate parameters
    for (n in 1:N) {
      Yl[n][1] = Yl_meanX[n];
      Yl[n][2] = Yl_meanY[n];
      Yl[n][3] = Yl_meanZ[n];
      Yl[n][4] = Yl_meanP[n];
      Yl[n][5] = Yl_meanR[n];
      Yl[n][6] = Yl_meanW[n];
    }
    // combine univariate parameters
    for (n in 1:N) {
      Mu[n] = transpose([mu_meanX[n], mu_meanY[n], mu_meanZ[n], mu_meanP[n], mu_meanR[n], mu_meanW[n]]);
    }
    target += multi_normal_cholesky_lpdf(Yl | Mu, LSigma);
  }
  // priors including constants
  target += lprior;
  target += normal_lpdf(Y_meanX[Jme_meanX] | Yl_meanX[Jme_meanX], noise_meanX[Jme_meanX]);
  target += normal_lpdf(Y_meanY[Jme_meanY] | Yl_meanY[Jme_meanY], noise_meanY[Jme_meanY]);
  target += normal_lpdf(Y_meanZ[Jme_meanZ] | Yl_meanZ[Jme_meanZ], noise_meanZ[Jme_meanZ]);
  target += normal_lpdf(Y_meanP[Jme_meanP] | Yl_meanP[Jme_meanP], noise_meanP[Jme_meanP]);
  target += normal_lpdf(Y_meanR[Jme_meanR] | Yl_meanR[Jme_meanR], noise_meanR[Jme_meanR]);
  target += normal_lpdf(Y_meanW[Jme_meanW] | Yl_meanW[Jme_meanW], noise_meanW[Jme_meanW]);
  target += std_normal_lpdf(to_vector(z_1));
  target += std_normal_lpdf(to_vector(z_2));
}
generated quantities {
  // actual population-level intercept
  real b_meanX_Intercept = Intercept_meanX - dot_product(means_X_meanX, b_meanX);
  // actual population-level intercept
  real b_meanY_Intercept = Intercept_meanY - dot_product(means_X_meanY, b_meanY);
  // actual population-level intercept
  real b_meanZ_Intercept = Intercept_meanZ - dot_product(means_X_meanZ, b_meanZ);
  // actual population-level intercept
  real b_meanP_Intercept = Intercept_meanP - dot_product(means_X_meanP, b_meanP);
  // actual population-level intercept
  real b_meanR_Intercept = Intercept_meanR - dot_product(means_X_meanR, b_meanR);
  // actual population-level intercept
  real b_meanW_Intercept = Intercept_meanW - dot_product(means_X_meanW, b_meanW);
  // residual correlations
  corr_matrix[nresp] Rescor = multiply_lower_tri_self_transpose(Lrescor);
  vector<lower=-1,upper=1>[nrescor] rescor;
  // compute group-level correlations
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1,upper=1>[NC_1] cor_1;
  // compute group-level correlations
  corr_matrix[M_2] Cor_2 = multiply_lower_tri_self_transpose(L_2);
  vector<lower=-1,upper=1>[NC_2] cor_2;
  // extract upper diagonal of correlation matrix
  for (k in 1:nresp) {
    for (j in 1:(k - 1)) {
      rescor[choose(k - 1, 2) + j] = Rescor[j, k];
    }
  }
  // extract upper diagonal of correlation matrix
  for (k in 1:M_1) {
    for (j in 1:(k - 1)) {
      cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
    }
  }
  // extract upper diagonal of correlation matrix
  for (k in 1:M_2) {
    for (j in 1:(k - 1)) {
      cor_2[choose(k - 1, 2) + j] = Cor_2[j, k];
    }
  }
}



Even a small sample of the data takes hours to run, so I'm wondering where I might gain efficiency, e.g. through further vectorization, and whether there are changes that would let the model take advantage of a fairly capable GPU.
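What I have found so far is that brms can pass within-chain threading and OpenCL options through the cmdstanr backend; something like the following is what I had in mind, though I don't know how much either helps for a model of this form (this assumes cmdstanr is installed and, for the GPU, a working OpenCL driver; `mv` and `collapsed` are the formula and data from above):

```r
# Within-chain threading and GPU offloading via the cmdstanr backend.
# threading() parallelizes the likelihood with reduce_sum where supported;
# opencl(c(0, 0)) selects OpenCL platform 0, device 0.
fit <- brm(mv, data = collapsed,
           backend = "cmdstanr",
           chains = 4, cores = 4,
           threads = threading(4),
           opencl = opencl(c(0, 0)))
```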