Speeding up a Student-t model with 3 correlation matrices

I have a relatively large data set (~9000 observations) and am fitting a hierarchical Student-t model with several correlation matrices. The model initially took a little over a week to run 4 chains of 6500 iterations each, so I'm looking for ways to make it more efficient. I used brms to generate initial Stan code and have done some vectorization.

brms syntax:

Mod_fmla <- bf(RECD ~ 0 + Intercept + 
                                     AvRECD.c +
                                     Age_m.c +
                                     Pressure.c +
                                     Imped.c +
                                     Volume.c +
                                     factor(Frequency) +
                                     factor(Frequency):Pressure.c +
                                     factor(Frequency):Imped.c +
                                     factor(Frequency):Volume.c +
                                     (1 + factor(Frequency)|ME/Ear/SubjID) +
                                     (1|test_year),
                             center = TRUE) +
        lf(sigma ~ 0 + (1|ME), cmc = FALSE) +
        lf(nu ~ 0 + (1|ME), cmc = FALSE)

Stan model:

"functions {
 /* compute correlated group-level effects
  * Args: 
  *   z: matrix of unscaled group-level effects
  *   SD: vector of standard deviation parameters
  *   L: cholesky factor correlation matrix
  * Returns: 
  *   matrix of scaled group-level effects
  */ 
  matrix scale_r_cor(matrix z, vector SD, matrix L) {
    // r is stored in another dimension order than z
    return transpose(diag_pre_multiply(SD, L) * z);
  }
  /* compute the logm1 link 
   * Args: 
   *   p: a positive scalar
   * Returns: 
   *   a scalar in (-Inf, Inf)
   */ 
   real logm1(real y) { 
     return log(y - 1);
   }
  /* compute the inverse of the logm1 link 
   * Args: 
   *   y: a scalar in (-Inf, Inf)
   * Returns: 
   *   a positive scalar
   */ 
   real expp1(real y) { 
     return exp(y) + 1;
   }
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  vector[N] Z_1_2;
  vector[N] Z_1_3;
  vector[N] Z_1_4;
  vector[N] Z_1_5;
  vector[N] Z_1_6;
  vector[N] Z_1_7;
  vector[N] Z_1_8;
  vector[N] Z_1_9;
  int<lower=1> NC_1;  // number of group-level correlations
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  int<lower=1> J_2[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_2_1;
  vector[N] Z_2_2;
  vector[N] Z_2_3;
  vector[N] Z_2_4;
  vector[N] Z_2_5;
  vector[N] Z_2_6;
  vector[N] Z_2_7;
  vector[N] Z_2_8;
  vector[N] Z_2_9;
  int<lower=1> NC_2;  // number of group-level correlations
  // data for group-level effects of ID 3
  int<lower=1> N_3;  // number of grouping levels
  int<lower=1> M_3;  // number of coefficients per level
  int<lower=1> J_3[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_3_1;
  vector[N] Z_3_2;
  vector[N] Z_3_3;
  vector[N] Z_3_4;
  vector[N] Z_3_5;
  vector[N] Z_3_6;
  vector[N] Z_3_7;
  vector[N] Z_3_8;
  vector[N] Z_3_9;
  int<lower=1> NC_3;  // number of group-level correlations
  // data for group-level effects of ID 4
  int<lower=1> N_4;  // number of grouping levels
  int<lower=1> M_4;  // number of coefficients per level
  int<lower=1> J_4[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_4_1;
  // data for group-level effects of ID 5
  int<lower=1> N_5;  // number of grouping levels
  int<lower=1> M_5;  // number of coefficients per level
  int<lower=1> J_5[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_5_sigma_1;
  // data for group-level effects of ID 6
  int<lower=1> N_6;  // number of grouping levels
  int<lower=1> M_6;  // number of coefficients per level
  int<lower=1> J_6[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_6_nu_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  // normalising-constant corrections for the half-normal sd priors
  real norm18 = 18 * normal_lccdf(0 | 1, 1);
  real norm10 = 10 * normal_lccdf(0 | 0, 0.5);
  real norm0_1 = 1 * normal_lccdf(0 | 0, 0.1);
  real norm1_5 = 1 * normal_lccdf(0 | 0, 1.5);
}
parameters {
  vector[K] b;  // population-level effects
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  matrix[M_1, N_1] z_1;  // standardized group-level effects
  cholesky_factor_corr[M_1] L_1;  // cholesky factor of correlation matrix
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  matrix[M_2, N_2] z_2;  // standardized group-level effects
  cholesky_factor_corr[M_2] L_2;  // cholesky factor of correlation matrix
  vector<lower=0>[M_3] sd_3;  // group-level standard deviations
  matrix[M_3, N_3] z_3;  // standardized group-level effects
  cholesky_factor_corr[M_3] L_3;  // cholesky factor of correlation matrix
  vector<lower=0>[M_4] sd_4;  // group-level standard deviations
  vector[N_4] z_4[M_4];  // standardized group-level effects
  vector<lower=0>[M_5] sd_5;  // group-level standard deviations
  vector[N_5] z_5[M_5];  // standardized group-level effects
  vector<lower=0>[M_6] sd_6;  // group-level standard deviations
  vector[N_6] z_6[M_6];  // standardized group-level effects
}
transformed parameters {
  matrix[N_1, M_1] r_1;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_1] r_1_1;
  vector[N_1] r_1_2;
  vector[N_1] r_1_3;
  vector[N_1] r_1_4;
  vector[N_1] r_1_5;
  vector[N_1] r_1_6;
  vector[N_1] r_1_7;
  vector[N_1] r_1_8;
  vector[N_1] r_1_9;
  matrix[N_2, M_2] r_2;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_2] r_2_1;
  vector[N_2] r_2_2;
  vector[N_2] r_2_3;
  vector[N_2] r_2_4;
  vector[N_2] r_2_5;
  vector[N_2] r_2_6;
  vector[N_2] r_2_7;
  vector[N_2] r_2_8;
  vector[N_2] r_2_9;
  matrix[N_3, M_3] r_3;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_3] r_3_1;
  vector[N_3] r_3_2;
  vector[N_3] r_3_3;
  vector[N_3] r_3_4;
  vector[N_3] r_3_5;
  vector[N_3] r_3_6;
  vector[N_3] r_3_7;
  vector[N_3] r_3_8;
  vector[N_3] r_3_9;
  vector[N_4] r_4_1;  // actual group-level effects
  vector[N_5] r_5_sigma_1;  // actual group-level effects
  vector[N_6] r_6_nu_1;  // actual group-level effects
  // compute actual group-level effects
  r_1 = scale_r_cor(z_1, sd_1, L_1);
  r_1_1 = r_1[, 1];
  r_1_2 = r_1[, 2];
  r_1_3 = r_1[, 3];
  r_1_4 = r_1[, 4];
  r_1_5 = r_1[, 5];
  r_1_6 = r_1[, 6];
  r_1_7 = r_1[, 7];
  r_1_8 = r_1[, 8];
  r_1_9 = r_1[, 9];
  // compute actual group-level effects
  r_2 = scale_r_cor(z_2, sd_2, L_2);
  r_2_1 = r_2[, 1];
  r_2_2 = r_2[, 2];
  r_2_3 = r_2[, 3];
  r_2_4 = r_2[, 4];
  r_2_5 = r_2[, 5];
  r_2_6 = r_2[, 6];
  r_2_7 = r_2[, 7];
  r_2_8 = r_2[, 8];
  r_2_9 = r_2[, 9];
  // compute actual group-level effects
  r_3 = scale_r_cor(z_3, sd_3, L_3);
  r_3_1 = r_3[, 1];
  r_3_2 = r_3[, 2];
  r_3_3 = r_3[, 3];
  r_3_4 = r_3[, 4];
  r_3_5 = r_3[, 5];
  r_3_6 = r_3[, 6];
  r_3_7 = r_3[, 7];
  r_3_8 = r_3[, 8];
  r_3_9 = r_3[, 9];
  r_4_1 = (sd_4[1] * (z_4[1]));
  r_5_sigma_1 = (sd_5[1] * (z_5[1]));
  r_6_nu_1 = (sd_6[1] * (z_6[1]));
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = X * b;
    // initialize linear predictor term
    vector[N] sigma = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nu = rep_vector(0.0, N);
    //for (n in 1:N) {
      // add more terms to the linear predictor
      mu += r_1_1[J_1] .* Z_1_1 + r_1_2[J_1] .* Z_1_2 + r_1_3[J_1] .* Z_1_3 + r_1_4[J_1] .* Z_1_4 + r_1_5[J_1] .* Z_1_5 
      + r_1_6[J_1] .* Z_1_6 + r_1_7[J_1] .* Z_1_7 + r_1_8[J_1] .* Z_1_8 + r_1_9[J_1] .* Z_1_9 
      + r_2_1[J_2] .* Z_2_1 + r_2_2[J_2] .* Z_2_2 + r_2_3[J_2] .* Z_2_3 + r_2_4[J_2] .* Z_2_4 
      + r_2_5[J_2] .* Z_2_5 + r_2_6[J_2] .* Z_2_6 + r_2_7[J_2] .* Z_2_7 + r_2_8[J_2] .* Z_2_8 
      + r_2_9[J_2] .* Z_2_9 + r_3_1[J_3] .* Z_3_1 + r_3_2[J_3] .* Z_3_2 + r_3_3[J_3] .* Z_3_3 
      + r_3_4[J_3] .* Z_3_4 + r_3_5[J_3] .* Z_3_5 + r_3_6[J_3] .* Z_3_6 + r_3_7[J_3] .* Z_3_7 
      + r_3_8[J_3] .* Z_3_8 + r_3_9[J_3] .* Z_3_9 + r_4_1[J_4] .* Z_4_1;
   // }
    //for (n in 1:N) {
      // add more terms to the linear predictor
      sigma += r_5_sigma_1[J_5] .* Z_5_sigma_1;
    //}
    //for (n in 1:N) {
      // add more terms to the linear predictor
      nu += r_6_nu_1[J_6] .* Z_6_nu_1;
    //}
    //for (n in 1:N) {
      // apply the inverse link function
      sigma = exp(sigma);
   // }
     for (n in 1:N) {
      // apply the inverse link function
      nu[n] = expp1(nu[n]);
    }
    target += -norm18;
    target += -norm10;
    target += -norm0_1;
    target += -norm1_5;
    target += student_t_lpdf(Y | nu, mu, sigma);
  }

Can nu be vectorized here or can I create a vector when calling the distribution?
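
(A minimal sketch of one option, anticipating the advice given later in the thread: since expp1(y) is just exp(y) + 1, the per-element loop over nu can likely be collapsed into a single vectorised statement.)

      // sketch: replaces the expp1 loop above (same result, vectorised)
      nu = exp(nu) + 1;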

        // priors including constants
        b[1] ~ normal(5,5);
        b[2] ~ normal(0,3);
        b[3] ~ normal(0,3);
        b[4] ~ normal(0,3);
        b[5] ~ normal(0,3);
        b[6] ~ normal(0,3);
        b[7] ~ normal(0,3);
        b[8] ~ normal(0,3);
        b[9] ~ normal(0,3);
        b[10] ~ normal(0,3);
        b[11] ~ normal(0,3);
        b[12] ~ normal(0,3);
        b[13] ~ normal(0,3);
        b[14] ~ normal(0,3);
        b[15] ~ normal(0,3);
        b[16] ~ normal(0,3);
        b[17] ~ normal(0,3);
        b[18] ~ normal(0,3);
        b[19] ~ normal(0,3);
        b[20] ~ normal(0,3);
        b[21] ~ normal(0,3);
        b[22] ~ normal(0,3);
        b[23] ~ normal(0,3);
        b[24] ~ normal(0,3);
        b[25] ~ normal(0,3);
        b[26] ~ normal(0,3);
        b[27] ~ normal(0,3);
        b[28] ~ normal(0,3);
        b[29] ~ normal(0,3);
        b[30] ~ normal(0,3);
        b[31] ~ normal(0,3);
        b[32] ~ normal(0,3);
        b[33] ~ normal(0,3);
        b[34] ~ normal(0,3);
        b[35] ~ normal(0,3);
        b[36] ~ normal(0,3);
        b[37] ~ normal(0,3);
        b[38] ~ normal(0,3);
        //target += normal_lpdf(sd_1 | 1,1)
        //- 9 * normal_lccdf(0 | 1,1);
        sd_1 ~ normal(1,1);
        z_1[1] ~ std_normal();
        z_1[2] ~ std_normal();
        z_1[3] ~ std_normal();
        z_1[4] ~ std_normal();
        z_1[5] ~ std_normal();
        z_1[6] ~ std_normal();
        z_1[7] ~ std_normal();
        z_1[8] ~ std_normal();
        z_1[9] ~ std_normal();
        L_1 ~ lkj_corr_cholesky(1);
        sd_2 ~ normal(1,1);
        z_2[1] ~ std_normal();
        z_2[2] ~ std_normal();
        z_2[3] ~ std_normal();
        z_2[4] ~ std_normal();
        z_2[5] ~ std_normal();
        z_2[6] ~ std_normal();
        z_2[7] ~ std_normal();
        z_2[8] ~ std_normal();
        z_2[9] ~ std_normal();
        L_2 ~ lkj_corr_cholesky(1);
        sd_3 ~ normal(0,0.5);
        z_3[1] ~ std_normal();
        z_3[2] ~ std_normal();
        z_3[3] ~ std_normal();
        z_3[4] ~ std_normal();
        z_3[5] ~ std_normal();
        z_3[6] ~ std_normal();
        z_3[7] ~ std_normal();
        z_3[8] ~ std_normal();
        z_3[9] ~ std_normal();
        L_3 ~ lkj_corr_cholesky(1);
        sd_4[1] ~ normal(0,0.5);
        z_4[1] ~ std_normal();
        sd_5[1] ~ normal(0,0.1);
        z_5[1] ~ std_normal();
        sd_6[1] ~ normal(0,1.5);
        z_6[1] ~ std_normal();
}

I think the ~ syntax for priors helps.
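
As a minimal illustration of the difference (the two statements below target the same posterior; the ~ form drops the additive normalising constant, while the explicit target += form keeps it, which matters for Bayes factors, as noted later in the thread):

  sd_1 ~ normal(1, 1);                  // drops the normalising constant
  target += normal_lpdf(sd_1 | 1, 1);   // keeps the constant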

generated quantities {
  // compute group-level correlations
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1,upper=1>[NC_1] cor_1;
  // compute group-level correlations
  corr_matrix[M_2] Cor_2 = multiply_lower_tri_self_transpose(L_2);
  vector<lower=-1,upper=1>[NC_2] cor_2;
  // compute group-level correlations
  corr_matrix[M_3] Cor_3 = multiply_lower_tri_self_transpose(L_3);
  vector<lower=-1,upper=1>[NC_3] cor_3;
  // extract upper diagonal of correlation matrix
  for (k in 1:M_1) {
    for (j in 1:(k - 1)) {
      cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
    }
  }
  // extract upper diagonal of correlation matrix
  for (k in 1:M_2) {
    for (j in 1:(k - 1)) {
      cor_2[choose(k - 1, 2) + j] = Cor_2[j, k];
    }
  }
  // extract upper diagonal of correlation matrix
  for (k in 1:M_3) {
    for (j in 1:(k - 1)) {
      cor_3[choose(k - 1, 2) + j] = Cor_3[j, k];
    }
  }
}

Are there any additional places where some vectorization might help? Can the correlation matrices in the generated quantities be vectorized?

I’d actually go back to the brms models first. If you have cmdstanr installed and are using the GitHub version of brms, then you can run your brms models with both multi-threading (reduce_sum) and dropped normalising constants (~ syntax). Note that if you only have four CPU cores and are running four chains, multithreading will likely provide little benefit.

More information on multithreading is in this vignette: Running brms models with within-chain parallelization

And to drop normalising constants, you just need to specify normalize = FALSE in the brm call.
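
For example, a sketch of such a call (Mod_fmla and MyData are placeholders for your own formula and data; argument values are illustrative):

fit <- brm(Mod_fmla,
           data = MyData,
           family = student(link_nu = "logm1"),
           backend = "cmdstanr",
           normalize = FALSE,        # drop normalising constants
           threads = threading(2),   # within-chain parallelisation (reduce_sum)
           chains = 4, cores = 4)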


Thank you @andrjohns!

I had not yet experimented with cmdstan, but it definitely sped things up by days!

I am trying to take it one step further by vectorizing the cmdstan model:

functions {
  /* compute correlated group-level effects
   * Args:
   *   z: matrix of unscaled group-level effects
   *   SD: vector of standard deviation parameters
   *   L: cholesky factor correlation matrix
   * Returns:
   *   matrix of scaled group-level effects
   */
  matrix scale_r_cor(matrix z, vector SD, matrix L) {
    // r is stored in another dimension order than z
    return transpose(diag_pre_multiply(SD, L) * z);
  }
  /* compute the logm1 link
   * Args:
   *   p: a positive scalar
   * Returns:
   *   a scalar in (-Inf, Inf)
   */
  real logm1(real y) {
    return log(y - 1);
  }
  /* compute the inverse of the logm1 link
   * Args:
   *   y: a scalar in (-Inf, Inf)
   * Returns:
   *   a positive scalar
   */
  real expp1(real y) {
    return exp(y) + 1;
  }
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  vector[N] Z_1_2;
  vector[N] Z_1_3;
  vector[N] Z_1_4;
  vector[N] Z_1_5;
  vector[N] Z_1_6;
  vector[N] Z_1_7;
  vector[N] Z_1_8;
  vector[N] Z_1_9;
  int<lower=1> NC_1;  // number of group-level correlations
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  int<lower=1> J_2[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_2_1;
  vector[N] Z_2_2;
  vector[N] Z_2_3;
  vector[N] Z_2_4;
  vector[N] Z_2_5;
  vector[N] Z_2_6;
  vector[N] Z_2_7;
  vector[N] Z_2_8;
  vector[N] Z_2_9;
  int<lower=1> NC_2;  // number of group-level correlations
  // data for group-level effects of ID 3
  int<lower=1> N_3;  // number of grouping levels
  int<lower=1> M_3;  // number of coefficients per level
  int<lower=1> J_3[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_3_1;
  vector[N] Z_3_2;
  vector[N] Z_3_3;
  vector[N] Z_3_4;
  vector[N] Z_3_5;
  vector[N] Z_3_6;
  vector[N] Z_3_7;
  vector[N] Z_3_8;
  vector[N] Z_3_9;
  int<lower=1> NC_3;  // number of group-level correlations
  // data for group-level effects of ID 4
  int<lower=1> N_4;  // number of grouping levels
  int<lower=1> M_4;  // number of coefficients per level
  int<lower=1> J_4[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_4_1;
  // data for group-level effects of ID 5
  int<lower=1> N_5;  // number of grouping levels
  int<lower=1> M_5;  // number of coefficients per level
  int<lower=1> J_5[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_5_sigma_1;
  // data for group-level effects of ID 6
  int<lower=1> N_6;  // number of grouping levels
  int<lower=1> M_6;  // number of coefficients per level
  int<lower=1> J_6[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_6_nu_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  // normalising-constant corrections for the half-normal sd priors
  real norm18 = 18 * normal_lccdf(0 | 1, 1);
  real norm10 = 10 * normal_lccdf(0 | 0, 0.5);
  real norm0_1 = 1 * normal_lccdf(0 | 0, 0.1);
  real norm1_5 = 1 * normal_lccdf(0 | 0, 1.5);
}
parameters {
  vector[K] b;  // population-level effects
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  matrix[M_1, N_1] z_1;  // standardized group-level effects
  cholesky_factor_corr[M_1] L_1;  // cholesky factor of correlation matrix
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  matrix[M_2, N_2] z_2;  // standardized group-level effects
  cholesky_factor_corr[M_2] L_2;  // cholesky factor of correlation matrix
  vector<lower=0>[M_3] sd_3;  // group-level standard deviations
  matrix[M_3, N_3] z_3;  // standardized group-level effects
  cholesky_factor_corr[M_3] L_3;  // cholesky factor of correlation matrix
  vector<lower=0>[M_4] sd_4;  // group-level standard deviations
  vector[N_4] z_4[M_4];  // standardized group-level effects
  vector<lower=0>[M_5] sd_5;  // group-level standard deviations
  vector[N_5] z_5[M_5];  // standardized group-level effects
  vector<lower=0>[M_6] sd_6;  // group-level standard deviations
  vector[N_6] z_6[M_6];  // standardized group-level effects
}
transformed parameters {
  matrix[N_1, M_1] r_1;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_1] r_1_1;
  vector[N_1] r_1_2;
  vector[N_1] r_1_3;
  vector[N_1] r_1_4;
  vector[N_1] r_1_5;
  vector[N_1] r_1_6;
  vector[N_1] r_1_7;
  vector[N_1] r_1_8;
  vector[N_1] r_1_9;
  matrix[N_2, M_2] r_2;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_2] r_2_1;
  vector[N_2] r_2_2;
  vector[N_2] r_2_3;
  vector[N_2] r_2_4;
  vector[N_2] r_2_5;
  vector[N_2] r_2_6;
  vector[N_2] r_2_7;
  vector[N_2] r_2_8;
  vector[N_2] r_2_9;
  matrix[N_3, M_3] r_3;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_3] r_3_1;
  vector[N_3] r_3_2;
  vector[N_3] r_3_3;
  vector[N_3] r_3_4;
  vector[N_3] r_3_5;
  vector[N_3] r_3_6;
  vector[N_3] r_3_7;
  vector[N_3] r_3_8;
  vector[N_3] r_3_9;
  vector[N_4] r_4_1;  // actual group-level effects
  vector[N_5] r_5_sigma_1;  // actual group-level effects
  vector[N_6] r_6_nu_1;  // actual group-level effects
  // compute actual group-level effects
  r_1 = scale_r_cor(z_1, sd_1, L_1);
  r_1_1 = r_1[, 1];
  r_1_2 = r_1[, 2];
  r_1_3 = r_1[, 3];
  r_1_4 = r_1[, 4];
  r_1_5 = r_1[, 5];
  r_1_6 = r_1[, 6];
  r_1_7 = r_1[, 7];
  r_1_8 = r_1[, 8];
  r_1_9 = r_1[, 9];
  // compute actual group-level effects
  r_2 = scale_r_cor(z_2, sd_2, L_2);
  r_2_1 = r_2[, 1];
  r_2_2 = r_2[, 2];
  r_2_3 = r_2[, 3];
  r_2_4 = r_2[, 4];
  r_2_5 = r_2[, 5];
  r_2_6 = r_2[, 6];
  r_2_7 = r_2[, 7];
  r_2_8 = r_2[, 8];
  r_2_9 = r_2[, 9];
  // compute actual group-level effects
  r_3 = scale_r_cor(z_3, sd_3, L_3);
  r_3_1 = r_3[, 1];
  r_3_2 = r_3[, 2];
  r_3_3 = r_3[, 3];
  r_3_4 = r_3[, 4];
  r_3_5 = r_3[, 5];
  r_3_6 = r_3[, 6];
  r_3_7 = r_3[, 7];
  r_3_8 = r_3[, 8];
  r_3_9 = r_3[, 9];
  r_4_1 = (sd_4[1] * (z_4[1]));
  r_5_sigma_1 = (sd_5[1] * (z_5[1]));
  r_6_nu_1 = (sd_6[1] * (z_6[1]));
}
model {
  // likelihood not including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = X * b;
    // initialize linear predictor term
    vector[N] sigma = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nu = rep_vector(0.0, N);
    //for (n in 1:N) {
      // add more terms to the linear predictor
      mu += r_1_1[J_1] .* Z_1_1 + r_1_2[J_1] .* Z_1_2 + r_1_3[J_1] .* Z_1_3 + r_1_4[J_1] .* Z_1_4 + r_1_5[J_1] .* Z_1_5 
      + r_1_6[J_1] .* Z_1_6 + r_1_7[J_1] .* Z_1_7 + r_1_8[J_1] .* Z_1_8 + r_1_9[J_1] .* Z_1_9 
      + r_2_1[J_2] .* Z_2_1 + r_2_2[J_2] .* Z_2_2 + r_2_3[J_2] .* Z_2_3 + r_2_4[J_2] .* Z_2_4 
      + r_2_5[J_2] .* Z_2_5 + r_2_6[J_2] .* Z_2_6 + r_2_7[J_2] .* Z_2_7 + r_2_8[J_2] .* Z_2_8 
      + r_2_9[J_2] .* Z_2_9 + r_3_1[J_3] .* Z_3_1 + r_3_2[J_3] .* Z_3_2 + r_3_3[J_3] .* Z_3_3 
      + r_3_4[J_3] .* Z_3_4 + r_3_5[J_3] .* Z_3_5 + r_3_6[J_3] .* Z_3_6 + r_3_7[J_3] .* Z_3_7 
      + r_3_8[J_3] .* Z_3_8 + r_3_9[J_3] .* Z_3_9 + r_4_1[J_4] .* Z_4_1;
    //}
    //for (n in 1:N) {
      // add more terms to the linear predictor
      sigma += r_5_sigma_1[J_5] .* Z_5_sigma_1;
    //}
    //for (n in 1:N) {
      // add more terms to the linear predictor
      nu += r_6_nu_1[J_6] .* Z_6_nu_1;
    //}
    //for (n in 1:N) {
      // apply the inverse link function
      sigma = exp(sigma);
   // }
     for (n in 1:N) {
      // apply the inverse link function
      nu[n] = expp1(nu[n]);
     }
    target += -norm18;
    target += -norm10;
    target += -norm0_1;
    target += -norm1_5;
    target += student_t_lpdf(Y | nu, mu, sigma);
  }
  // priors not including constants
  b[1] ~ normal(5,5);
  b[2] ~ normal(0,3);
  b[3] ~ normal(0,3);
  b[4] ~ normal(0,3);
  b[5] ~ normal(0,3);
  b[6] ~ normal(0,3);
  b[7] ~ normal(0,3);
  b[8] ~ normal(0,3);
  b[9] ~ normal(0,3);
  b[10] ~ normal(0,3);
  b[11] ~ normal(0,3);
  b[12] ~ normal(0,3);
  b[13] ~ normal(0,3);
  b[14] ~ normal(0,3);
  b[15] ~ normal(0,3);
  b[16] ~ normal(0,3);
  b[17] ~ normal(0,3);
  b[18] ~ normal(0,3);
  b[19] ~ normal(0,3);
  b[20] ~ normal(0,3);
  b[21] ~ normal(0,3);
  b[22] ~ normal(0,3);
  b[23] ~ normal(0,3);
  b[24] ~ normal(0,3);
  b[25] ~ normal(0,3);
  b[26] ~ normal(0,3);
  b[27] ~ normal(0,3);
  b[28] ~ normal(0,3);
  b[29] ~ normal(0,3);
  b[30] ~ normal(0,3);
  b[31] ~ normal(0,3);
  b[32] ~ normal(0,3);
  b[33] ~ normal(0,3);
  b[34] ~ normal(0,3);
  b[35] ~ normal(0,3);
  b[36] ~ normal(0,3);
  b[37] ~ normal(0,3);
  b[38] ~ normal(0,3);
  //target += normal_lpdf(sd_1 | 1,1)
  //- 9 * normal_lccdf(0 | 1,1);
  sd_1 ~ normal(1,1);
  z_1[1] ~ std_normal();
  z_1[2] ~ std_normal();
  z_1[3] ~ std_normal();
  z_1[4] ~ std_normal();
  z_1[5] ~ std_normal();
  z_1[6] ~ std_normal();
  z_1[7] ~ std_normal();
  z_1[8] ~ std_normal();
  z_1[9] ~ std_normal();
  L_1 ~ lkj_corr_cholesky(1);
  sd_2 ~ normal(1,1);
  z_2[1] ~ std_normal();
  z_2[2] ~ std_normal();
  z_2[3] ~ std_normal();
  z_2[4] ~ std_normal();
  z_2[5] ~ std_normal();
  z_2[6] ~ std_normal();
  z_2[7] ~ std_normal();
  z_2[8] ~ std_normal();
  z_2[9] ~ std_normal();
  L_2 ~ lkj_corr_cholesky(1);
  sd_3 ~ normal(0,0.5);
  z_3[1] ~ std_normal();
  z_3[2] ~ std_normal();
  z_3[3] ~ std_normal();
  z_3[4] ~ std_normal();
  z_3[5] ~ std_normal();
  z_3[6] ~ std_normal();
  z_3[7] ~ std_normal();
  z_3[8] ~ std_normal();
  z_3[9] ~ std_normal();
  L_3 ~ lkj_corr_cholesky(1);
  sd_4[1] ~ normal(0,0.5);
  z_4[1] ~ std_normal();
  sd_5[1] ~ normal(0,0.1);
  z_5[1] ~ std_normal();
  sd_6[1] ~ normal(0,1.5);
  z_6[1] ~ std_normal();
}
generated quantities {
  // compute group-level correlations
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1,upper=1>[NC_1] cor_1;
  // compute group-level correlations
  corr_matrix[M_2] Cor_2 = multiply_lower_tri_self_transpose(L_2);
  vector<lower=-1,upper=1>[NC_2] cor_2;
  // compute group-level correlations
  corr_matrix[M_3] Cor_3 = multiply_lower_tri_self_transpose(L_3);
  vector<lower=-1,upper=1>[NC_3] cor_3;
  // extract upper diagonal of correlation matrix
  for (k in 1:M_1) {
    for (j in 1:(k - 1)) {
      cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
    }
  }
  // extract upper diagonal of correlation matrix
  for (k in 1:M_2) {
    for (j in 1:(k - 1)) {
      cor_2[choose(k - 1, 2) + j] = Cor_2[j, k];
    }
  }
  // extract upper diagonal of correlation matrix
  for (k in 1:M_3) {
    for (j in 1:(k - 1)) {
      cor_3[choose(k - 1, 2) + j] = Cor_3[j, k];
    }
  }
}

The model compiles okay, but I get several warnings/errors about the Student-t parameters when I sample:

CmdStan_Fit <- CmdStan_Mod$sample(
                          data = CmdStan_Data,      # Named list of data
                          iter_warmup = 2000,       # Number of warmup iterations per chain
                          iter_sampling = 2000,     # Number of sampling iterations per chain
                          init = "0",               # Set initial values to 0
                          chains = 4,
                          threads_per_chain = 2,
                          parallel_chains = ncores/2,
                          max_treedepth = 14,
                          adapt_delta = 0.9)
Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 3 Exception: student_t_lpdf: Degrees of freedom parameter[109] is inf, but must be positive finite! (in '/tmp/Rtmp8hhSxn/model-6eeed5e958d55.stan', line 241, column 4 to column 48)
Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 4 Exception: student_t_lpdf: Location parameter[1] is -nan, but must be finite! (in '/tmp/Rtmp8hhSxn/model-6eeed5e958d55.stan', line 241, column 4 to column 48)
Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 1 Exception: student_t_lpdf: Scale parameter[1] is inf, but must be positive finite! (in '/tmp/Rtmp8hhSxn/model-6eeed5e958d55.stan', line 241, column 4 to column 48)
Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Do you have any suggestions for how I might initialize the three Student-t parameters more effectively?

If that warning only pops up at the start of warmup, and not consistently throughout sampling, then it’s safe for you to ignore.

Additionally, even though you’ve set threads_per_chain = 2 in your cmdstanr call, you haven’t generated Stan code that will take advantage of it. You need to specify in brms that you want within-chain parallelisation by adding threads = threading(number_threads) to the call.

Also, I realise that I should have clarified earlier that using normalize=FALSE, or the ~ notation, is only appropriate if you’re not planning on computing Bayes factors afterwards, as these need all normalising constants included.

Thanks again @andrjohns!

I definitely missed the reduce_sum in my call!

I revised the automatic code generation with:

RECD_CmdStan_Model <- make_stancode(RECD_Grp_Cor_Dist_fmla,
                                    RECD_Data2,
                                    family = student(link_nu = "logm1"),
                                    prior = RECD_Grp_Cor_Dist_priors,
                                    backend = "cmdstan",
                                    normalize = FALSE,
                                    threads = threading(2),
                                    parallel_chains = ncores/2)
functions {
 /* compute correlated group-level effects
  * Args: 
  *   z: matrix of unscaled group-level effects
  *   SD: vector of standard deviation parameters
  *   L: cholesky factor correlation matrix
  * Returns: 
  *   matrix of scaled group-level effects
  */ 
  matrix scale_r_cor(matrix z, vector SD, matrix L) {
    // r is stored in another dimension order than z
    return transpose(diag_pre_multiply(SD, L) * z);
  }
  /* compute the logm1 link 
   * Args: 
   *   p: a positive scalar
   * Returns: 
   *   a scalar in (-Inf, Inf)
   */ 
   real logm1(real y) { 
     return log(y - 1);
   }
  /* compute the inverse of the logm1 link 
   * Args: 
   *   y: a scalar in (-Inf, Inf)
   * Returns: 
   *   a positive scalar
   */ 
   real expp1(real y) { 
     return exp(y) + 1;
   }
  /* integer sequence of values
   * Args: 
   *   start: starting integer
   *   end: ending integer
   * Returns: 
   *   an integer sequence from start to end
   */ 
  int[] sequence(int start, int end) { 
    int seq[end - start + 1];
    for (n in 1:num_elements(seq)) {
      seq[n] = n + start - 1;
    }
    return seq; 
  } 
  // compute partial sums of the log-likelihood
  real partial_log_lik_lpmf(int[] seq, int start, int end, vector Y, matrix X, vector b, int[] J_1, vector Z_1_1, vector Z_1_2, vector Z_1_3, vector Z_1_4, vector Z_1_5, vector Z_1_6, vector Z_1_7, vector Z_1_8, vector Z_1_9, vector r_1_1, vector r_1_2, vector r_1_3, vector r_1_4, vector r_1_5, vector r_1_6, vector r_1_7, vector r_1_8, vector r_1_9, int[] J_2, vector Z_2_1, vector Z_2_2, vector Z_2_3, vector Z_2_4, vector Z_2_5, vector Z_2_6, vector Z_2_7, vector Z_2_8, vector Z_2_9, vector r_2_1, vector r_2_2, vector r_2_3, vector r_2_4, vector r_2_5, vector r_2_6, vector r_2_7, vector r_2_8, vector r_2_9, int[] J_3, vector Z_3_1, vector Z_3_2, vector Z_3_3, vector Z_3_4, vector Z_3_5, vector Z_3_6, vector Z_3_7, vector Z_3_8, vector Z_3_9, vector r_3_1, vector r_3_2, vector r_3_3, vector r_3_4, vector r_3_5, vector r_3_6, vector r_3_7, vector r_3_8, vector r_3_9, int[] J_4, vector Z_4_1, vector r_4_1, int[] J_5, vector Z_5_sigma_1, vector r_5_sigma_1, int[] J_6, vector Z_6_nu_1, vector r_6_nu_1) {
    real ptarget = 0;
    int N = end - start + 1;
    // initialize linear predictor term
    vector[N] mu = X[start:end] * b;
    // initialize linear predictor term
    vector[N] sigma = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nu = rep_vector(0.0, N);
    for (n in 1:N) {
      // add more terms to the linear predictor
      int nn = n + start - 1;
      mu[n] += r_1_1[J_1[nn]] * Z_1_1[nn] + r_1_2[J_1[nn]] * Z_1_2[nn] + r_1_3[J_1[nn]] * Z_1_3[nn] + r_1_4[J_1[nn]] * Z_1_4[nn] + r_1_5[J_1[nn]] * Z_1_5[nn] + r_1_6[J_1[nn]] * Z_1_6[nn] + r_1_7[J_1[nn]] * Z_1_7[nn] + r_1_8[J_1[nn]] * Z_1_8[nn] + r_1_9[J_1[nn]] * Z_1_9[nn] + r_2_1[J_2[nn]] * Z_2_1[nn] + r_2_2[J_2[nn]] * Z_2_2[nn] + r_2_3[J_2[nn]] * Z_2_3[nn] + r_2_4[J_2[nn]] * Z_2_4[nn] + r_2_5[J_2[nn]] * Z_2_5[nn] + r_2_6[J_2[nn]] * Z_2_6[nn] + r_2_7[J_2[nn]] * Z_2_7[nn] + r_2_8[J_2[nn]] * Z_2_8[nn] + r_2_9[J_2[nn]] * Z_2_9[nn] + r_3_1[J_3[nn]] * Z_3_1[nn] + r_3_2[J_3[nn]] * Z_3_2[nn] + r_3_3[J_3[nn]] * Z_3_3[nn] + r_3_4[J_3[nn]] * Z_3_4[nn] + r_3_5[J_3[nn]] * Z_3_5[nn] + r_3_6[J_3[nn]] * Z_3_6[nn] + r_3_7[J_3[nn]] * Z_3_7[nn] + r_3_8[J_3[nn]] * Z_3_8[nn] + r_3_9[J_3[nn]] * Z_3_9[nn] + r_4_1[J_4[nn]] * Z_4_1[nn];
    }
    for (n in 1:N) {
      // add more terms to the linear predictor
      int nn = n + start - 1;
      sigma[n] += r_5_sigma_1[J_5[nn]] * Z_5_sigma_1[nn];
    }
    for (n in 1:N) {
      // add more terms to the linear predictor
      int nn = n + start - 1;
      nu[n] += r_6_nu_1[J_6[nn]] * Z_6_nu_1[nn];
    }
    for (n in 1:N) {
      // apply the inverse link function
      sigma[n] = exp(sigma[n]);
    }
    for (n in 1:N) {
      // apply the inverse link function
      nu[n] = expp1(nu[n]);
    }
    ptarget += student_t_lupdf(Y[start:end] | nu, mu, sigma);
    return ptarget;
  }
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int grainsize;  // grainsize for threading
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  vector[N] Z_1_2;
  vector[N] Z_1_3;
  vector[N] Z_1_4;
  vector[N] Z_1_5;
  vector[N] Z_1_6;
  vector[N] Z_1_7;
  vector[N] Z_1_8;
  vector[N] Z_1_9;
  int<lower=1> NC_1;  // number of group-level correlations
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  int<lower=1> J_2[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_2_1;
  vector[N] Z_2_2;
  vector[N] Z_2_3;
  vector[N] Z_2_4;
  vector[N] Z_2_5;
  vector[N] Z_2_6;
  vector[N] Z_2_7;
  vector[N] Z_2_8;
  vector[N] Z_2_9;
  int<lower=1> NC_2;  // number of group-level correlations
  // data for group-level effects of ID 3
  int<lower=1> N_3;  // number of grouping levels
  int<lower=1> M_3;  // number of coefficients per level
  int<lower=1> J_3[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_3_1;
  vector[N] Z_3_2;
  vector[N] Z_3_3;
  vector[N] Z_3_4;
  vector[N] Z_3_5;
  vector[N] Z_3_6;
  vector[N] Z_3_7;
  vector[N] Z_3_8;
  vector[N] Z_3_9;
  int<lower=1> NC_3;  // number of group-level correlations
  // data for group-level effects of ID 4
  int<lower=1> N_4;  // number of grouping levels
  int<lower=1> M_4;  // number of coefficients per level
  int<lower=1> J_4[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_4_1;
  // data for group-level effects of ID 5
  int<lower=1> N_5;  // number of grouping levels
  int<lower=1> M_5;  // number of coefficients per level
  int<lower=1> J_5[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_5_sigma_1;
  // data for group-level effects of ID 6
  int<lower=1> N_6;  // number of grouping levels
  int<lower=1> M_6;  // number of coefficients per level
  int<lower=1> J_6[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_6_nu_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int seq[N] = sequence(1, N);
}
parameters {
  vector[K] b;  // population-level effects
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  matrix[M_1, N_1] z_1;  // standardized group-level effects
  cholesky_factor_corr[M_1] L_1;  // cholesky factor of correlation matrix
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  matrix[M_2, N_2] z_2;  // standardized group-level effects
  cholesky_factor_corr[M_2] L_2;  // cholesky factor of correlation matrix
  vector<lower=0>[M_3] sd_3;  // group-level standard deviations
  matrix[M_3, N_3] z_3;  // standardized group-level effects
  cholesky_factor_corr[M_3] L_3;  // cholesky factor of correlation matrix
  vector<lower=0>[M_4] sd_4;  // group-level standard deviations
  vector[N_4] z_4[M_4];  // standardized group-level effects
  vector<lower=0>[M_5] sd_5;  // group-level standard deviations
  vector[N_5] z_5[M_5];  // standardized group-level effects
  vector<lower=0>[M_6] sd_6;  // group-level standard deviations
  vector[N_6] z_6[M_6];  // standardized group-level effects
}
transformed parameters {
  matrix[N_1, M_1] r_1;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_1] r_1_1;
  vector[N_1] r_1_2;
  vector[N_1] r_1_3;
  vector[N_1] r_1_4;
  vector[N_1] r_1_5;
  vector[N_1] r_1_6;
  vector[N_1] r_1_7;
  vector[N_1] r_1_8;
  vector[N_1] r_1_9;
  matrix[N_2, M_2] r_2;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_2] r_2_1;
  vector[N_2] r_2_2;
  vector[N_2] r_2_3;
  vector[N_2] r_2_4;
  vector[N_2] r_2_5;
  vector[N_2] r_2_6;
  vector[N_2] r_2_7;
  vector[N_2] r_2_8;
  vector[N_2] r_2_9;
  matrix[N_3, M_3] r_3;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_3] r_3_1;
  vector[N_3] r_3_2;
  vector[N_3] r_3_3;
  vector[N_3] r_3_4;
  vector[N_3] r_3_5;
  vector[N_3] r_3_6;
  vector[N_3] r_3_7;
  vector[N_3] r_3_8;
  vector[N_3] r_3_9;
  vector[N_4] r_4_1;  // actual group-level effects
  vector[N_5] r_5_sigma_1;  // actual group-level effects
  vector[N_6] r_6_nu_1;  // actual group-level effects
  // compute actual group-level effects
  r_1 = scale_r_cor(z_1, sd_1, L_1);
  r_1_1 = r_1[, 1];
  r_1_2 = r_1[, 2];
  r_1_3 = r_1[, 3];
  r_1_4 = r_1[, 4];
  r_1_5 = r_1[, 5];
  r_1_6 = r_1[, 6];
  r_1_7 = r_1[, 7];
  r_1_8 = r_1[, 8];
  r_1_9 = r_1[, 9];
  // compute actual group-level effects
  r_2 = scale_r_cor(z_2, sd_2, L_2);
  r_2_1 = r_2[, 1];
  r_2_2 = r_2[, 2];
  r_2_3 = r_2[, 3];
  r_2_4 = r_2[, 4];
  r_2_5 = r_2[, 5];
  r_2_6 = r_2[, 6];
  r_2_7 = r_2[, 7];
  r_2_8 = r_2[, 8];
  r_2_9 = r_2[, 9];
  // compute actual group-level effects
  r_3 = scale_r_cor(z_3, sd_3, L_3);
  r_3_1 = r_3[, 1];
  r_3_2 = r_3[, 2];
  r_3_3 = r_3[, 3];
  r_3_4 = r_3[, 4];
  r_3_5 = r_3[, 5];
  r_3_6 = r_3[, 6];
  r_3_7 = r_3[, 7];
  r_3_8 = r_3[, 8];
  r_3_9 = r_3[, 9];
  r_4_1 = (sd_4[1] * (z_4[1]));
  r_5_sigma_1 = (sd_5[1] * (z_5[1]));
  r_6_nu_1 = (sd_6[1] * (z_6[1]));
}
model {
  // likelihood not including constants
  if (!prior_only) {
    target += reduce_sum(partial_log_lik_lpmf, seq, grainsize, Y, X, b, J_1, Z_1_1, Z_1_2, Z_1_3, Z_1_4, Z_1_5, Z_1_6, Z_1_7, Z_1_8, Z_1_9, r_1_1, r_1_2, r_1_3, r_1_4, r_1_5, r_1_6, r_1_7, r_1_8, r_1_9, J_2, Z_2_1, Z_2_2, Z_2_3, Z_2_4, Z_2_5, Z_2_6, Z_2_7, Z_2_8, Z_2_9, r_2_1, r_2_2, r_2_3, r_2_4, r_2_5, r_2_6, r_2_7, r_2_8, r_2_9, J_3, Z_3_1, Z_3_2, Z_3_3, Z_3_4, Z_3_5, Z_3_6, Z_3_7, Z_3_8, Z_3_9, r_3_1, r_3_2, r_3_3, r_3_4, r_3_5, r_3_6, r_3_7, r_3_8, r_3_9, J_4, Z_4_1, r_4_1, J_5, Z_5_sigma_1, r_5_sigma_1, J_6, Z_6_nu_1, r_6_nu_1);
  }
  // priors not including constants
        b[1] ~ normal(5,5);
        b[2] ~ normal(0,3);
        b[3] ~ normal(0,3);
        b[4] ~ normal(0,3);
        b[5] ~ normal(0,3);
        b[6] ~ normal(0,3);
        b[7] ~ normal(0,3);
        b[8] ~ normal(0,3);
        b[9] ~ normal(0,3);
        b[10] ~ normal(0,3);
        b[11] ~ normal(0,3);
        b[12] ~ normal(0,3);
        b[13] ~ normal(0,3);
        b[14] ~ normal(0,3);
        b[15] ~ normal(0,3);
        b[16] ~ normal(0,3);
        b[17] ~ normal(0,3);
        b[18] ~ normal(0,3);
        b[19] ~ normal(0,3);
        b[20] ~ normal(0,3);
        b[21] ~ normal(0,3);
        b[22] ~ normal(0,3);
        b[23] ~ normal(0,3);
        b[24] ~ normal(0,3);
        b[25] ~ normal(0,3);
        b[26] ~ normal(0,3);
        b[27] ~ normal(0,3);
        b[28] ~ normal(0,3);
        b[29] ~ normal(0,3);
        b[30] ~ normal(0,3);
        b[31] ~ normal(0,3);
        b[32] ~ normal(0,3);
        b[33] ~ normal(0,3);
        b[34] ~ normal(0,3);
        b[35] ~ normal(0,3);
        b[36] ~ normal(0,3);
        b[37] ~ normal(0,3);
        b[38] ~ normal(0,3);
        //target += normal_lpdf(sd_1 | 1,1)
        //- 9 * normal_lccdf(0 | 1,1);
        sd_1 ~ normal(1,1);
        z_1[1] ~ std_normal();
        z_1[2] ~ std_normal();
        z_1[3] ~ std_normal();
        z_1[4] ~ std_normal();
        z_1[5] ~ std_normal();
        z_1[6] ~ std_normal();
        z_1[7] ~ std_normal();
        z_1[8] ~ std_normal();
        z_1[9] ~ std_normal();
        L_1 ~ lkj_corr_cholesky(1);
        sd_2 ~ normal(1,1);
        z_2[1] ~ std_normal();
        z_2[2] ~ std_normal();
        z_2[3] ~ std_normal();
        z_2[4] ~ std_normal();
        z_2[5] ~ std_normal();
        z_2[6] ~ std_normal();
        z_2[7] ~ std_normal();
        z_2[8] ~ std_normal();
        z_2[9] ~ std_normal();
        L_2 ~ lkj_corr_cholesky(1);
        sd_3 ~ normal(0,0.5);
        z_3[1] ~ std_normal();
        z_3[2] ~ std_normal();
        z_3[3] ~ std_normal();
        z_3[4] ~ std_normal();
        z_3[5] ~ std_normal();
        z_3[6] ~ std_normal();
        z_3[7] ~ std_normal();
        z_3[8] ~ std_normal();
        z_3[9] ~ std_normal();
        L_3 ~ lkj_corr_cholesky(1);
        sd_4[1] ~ normal(0,0.5);
        z_4[1] ~ std_normal();
        sd_5[1] ~ normal(0,0.1);
        z_5[1] ~ std_normal();
        sd_6[1] ~ normal(0,1.5);
        z_6[1] ~ std_normal();
}
generated quantities {
  // compute group-level correlations
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1,upper=1>[NC_1] cor_1;
  // compute group-level correlations
  corr_matrix[M_2] Cor_2 = multiply_lower_tri_self_transpose(L_2);
  vector<lower=-1,upper=1>[NC_2] cor_2;
  // compute group-level correlations
  corr_matrix[M_3] Cor_3 = multiply_lower_tri_self_transpose(L_3);
  vector<lower=-1,upper=1>[NC_3] cor_3;
  // extract upper diagonal of correlation matrix
  for (k in 1:M_1) {
    for (j in 1:(k - 1)) {
      cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
    }
  }
  // extract upper diagonal of correlation matrix
  for (k in 1:M_2) {
    for (j in 1:(k - 1)) {
      cor_2[choose(k - 1, 2) + j] = Cor_2[j, k];
    }
  }
  // extract upper diagonal of correlation matrix
  for (k in 1:M_3) {
    for (j in 1:(k - 1)) {
      cor_3[choose(k - 1, 2) + j] = Cor_3[j, k];
    }
  }
}

Is it possible to combine the declaration and accumulation of the linear predictors when using partial sums (vectorize mu and/or the other distribution parameters)?

Looking at the model, I think you’ll get a better speed-up by first optimising the original model and then introducing threading. I’ll break my recommendations into a couple of sections.

Group-level predictors

Your biggest computational cost at the moment is actually caused by your vectorisation of mu:

      mu += r_1_1[J_1] .* Z_1_1 + ... + r_4_1[J_4] .* Z_4_1;

This is because each elementwise product is computed sequentially: r_1_1[J_1] .* Z_1_1 is calculated, then r_1_2[J_1] .* Z_1_2, and so forth, which means you’re traversing vectors of size N many times over. While each vectorised operation is efficient on its own, the repeated passes introduce additional overhead. You can drastically cut down on this by using matrices rather than vectors.

So at the moment, the code is taking a matrix of effects (r_1) and copying its contents to 9 vectors (r_1_1-r_1_9) to be individually multiplied with the covariates (Z_1_1-Z_1_9). You can instead pack the covariates into a single matrix (i.e., Z_1), and use rows_dot_product to perform the elementwise multiplication and addition.

So you’d pack them in the transformed data block:

transformed data {
  matrix[N,9] Z_1 = [Z_1_1', Z_1_2', Z_1_3', Z_1_4', Z_1_5',
                     Z_1_6', Z_1_7', Z_1_8', Z_1_9'];
  matrix[N,9] Z_2 = [Z_2_1', Z_2_2', Z_2_3', Z_2_4', Z_2_5',
                     Z_2_6', Z_2_7', Z_2_8', Z_2_9'];
  matrix[N,9] Z_3 = [Z_3_1', Z_3_2', Z_3_3', Z_3_4', Z_3_5',
                     Z_3_6', Z_3_7', Z_3_8', Z_3_9'];
}

And then use rows_dot_product in the model block:

    vector[N] mu = X * b + rows_dot_product(r_1[J_1], Z_1)
                         + rows_dot_product(r_2[J_2], Z_2)
                         + rows_dot_product(r_3[J_3], Z_3)
                         + r_4_1[J_4] .* Z_4_1;

nu and sigma

You can also simplify the construction of nu and sigma to one-liners:

    // initialize linear predictor term
    vector[N] sigma = exp(r_5_sigma_1[J_5] .* Z_5_sigma_1);
    // initialize linear predictor term
    vector[N] nu = exp(r_6_nu_1[J_6] .* Z_6_nu_1) + 1;

Normalising constants

Also, given that you’re dropping normalising constants, you don’t need to add the norm18-norm1_5 quantities (as those are constants): the four target += -norm... statements, along with the corresponding declarations in the transformed data block, can simply be deleted.

Vectorised priors

You can vectorise your priors by using to_vector with the z_1-z_3 matrices:

  to_vector(z_1) ~ std_normal();
  to_vector(z_2) ~ std_normal();
  to_vector(z_3) ~ std_normal();

You can also vectorise the priors for b by first constructing an integer sequence from 2 to 38 (i.e., the same as calling 2:38 in R) in the transformed data block:

  int b_index[37] = linspaced_int_array(37, 2, 38);

And using that to apply a vectorised prior to b:

b[b_index] ~ normal(0,3);

reduce_sum

Firstly, you’ll need to change the data type of Y from vector[N] to real Y[N], as reduce_sum can only slice over arrays:

  real Y[N];  // response variable

Next, you’ll need a partial_sum function that will calculate the likelihood:

  real partial_log_lik_lpdf(data real[] y_slice, int start, int end, vector nu,
                            vector mu, vector sigma) {
    return student_t_lupdf(to_vector(y_slice) | nu[start:end], mu[start:end], sigma[start:end]);
  }

And then call this in the model block:

    target += reduce_sum(partial_log_lik_lpdf, Y, grainsize, nu, mu, sigma);

Putting all those pieces together, your model would then look like this (completely untested):

functions {
  /* compute correlated group-level effects
   * Args:
   *   z: matrix of unscaled group-level effects
   *   SD: vector of standard deviation parameters
   *   L: cholesky factor correlation matrix
   * Returns:
   *   matrix of scaled group-level effects
   */
  matrix scale_r_cor(matrix z, vector SD, matrix L) {
    // r is stored in another dimension order than z
    return transpose(diag_pre_multiply(SD, L) * z);
  }

  real partial_log_lik_lpdf(data real[] y_slice, int start, int end, vector nu,
                            vector mu, vector sigma) {
    return student_t_lupdf(to_vector(y_slice) | nu[start:end], mu[start:end],
                                                sigma[start:end]);
  }
}
data {
  int<lower=1> N;  // total number of observations
  real Y[N];  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  vector[N] Z_1_2;
  vector[N] Z_1_3;
  vector[N] Z_1_4;
  vector[N] Z_1_5;
  vector[N] Z_1_6;
  vector[N] Z_1_7;
  vector[N] Z_1_8;
  vector[N] Z_1_9;
  int<lower=1> NC_1;  // number of group-level correlations
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  int<lower=1> J_2[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_2_1;
  vector[N] Z_2_2;
  vector[N] Z_2_3;
  vector[N] Z_2_4;
  vector[N] Z_2_5;
  vector[N] Z_2_6;
  vector[N] Z_2_7;
  vector[N] Z_2_8;
  vector[N] Z_2_9;
  int<lower=1> NC_2;  // number of group-level correlations
  // data for group-level effects of ID 3
  int<lower=1> N_3;  // number of grouping levels
  int<lower=1> M_3;  // number of coefficients per level
  int<lower=1> J_3[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_3_1;
  vector[N] Z_3_2;
  vector[N] Z_3_3;
  vector[N] Z_3_4;
  vector[N] Z_3_5;
  vector[N] Z_3_6;
  vector[N] Z_3_7;
  vector[N] Z_3_8;
  vector[N] Z_3_9;
  int<lower=1> NC_3;  // number of group-level correlations
  // data for group-level effects of ID 4
  int<lower=1> N_4;  // number of grouping levels
  int<lower=1> M_4;  // number of coefficients per level
  int<lower=1> J_4[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_4_1;
  // data for group-level effects of ID 5
  int<lower=1> N_5;  // number of grouping levels
  int<lower=1> M_5;  // number of coefficients per level
  int<lower=1> J_5[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_5_sigma_1;
  // data for group-level effects of ID 6
  int<lower=1> N_6;  // number of grouping levels
  int<lower=1> M_6;  // number of coefficients per level
  int<lower=1> J_6[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_6_nu_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  matrix[N,9] Z_1 = [Z_1_1', Z_1_2', Z_1_3', Z_1_4', Z_1_5',
                     Z_1_6', Z_1_7', Z_1_8', Z_1_9'];
  matrix[N,9] Z_2 = [Z_2_1', Z_2_2', Z_2_3', Z_2_4', Z_2_5',
                     Z_2_6', Z_2_7', Z_2_8', Z_2_9'];
  matrix[N,9] Z_3 = [Z_3_1', Z_3_2', Z_3_3', Z_3_4', Z_3_5',
                     Z_3_6', Z_3_7', Z_3_8', Z_3_9'];
  int b_index[37] = linspaced_int_array(37, 2, 38);
}
parameters {
  vector[K] b;  // population-level effects
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  matrix[M_1, N_1] z_1;  // standardized group-level effects
  cholesky_factor_corr[M_1] L_1;  // cholesky factor of correlation matrix
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  matrix[M_2, N_2] z_2;  // standardized group-level effects
  cholesky_factor_corr[M_2] L_2;  // cholesky factor of correlation matrix
  vector<lower=0>[M_3] sd_3;  // group-level standard deviations
  matrix[M_3, N_3] z_3;  // standardized group-level effects
  cholesky_factor_corr[M_3] L_3;  // cholesky factor of correlation matrix
  vector<lower=0>[M_4] sd_4;  // group-level standard deviations
  vector[N_4] z_4[M_4];  // standardized group-level effects
  vector<lower=0>[M_5] sd_5;  // group-level standard deviations
  vector[N_5] z_5[M_5];  // standardized group-level effects
  vector<lower=0>[M_6] sd_6;  // group-level standard deviations
  vector[N_6] z_6[M_6];  // standardized group-level effects
}
transformed parameters {
  // actual group-level effects
  matrix[N_1, M_1] r_1 = scale_r_cor(z_1, sd_1, L_1);
  matrix[N_2, M_2] r_2 = scale_r_cor(z_2, sd_2, L_2);
  matrix[N_3, M_3] r_3 = scale_r_cor(z_3, sd_3, L_3); 

  vector[N_4] r_4_1 = (sd_4[1] * (z_4[1]));
  vector[N_5] r_5_sigma_1 = (sd_5[1] * (z_5[1]));
  vector[N_6] r_6_nu_1 = (sd_6[1] * (z_6[1]));
}
model {
  int grainsize = 1;
  // likelihood not including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = X * b + rows_dot_product(r_1[J_1], Z_1)
                         + rows_dot_product(r_2[J_2], Z_2)
                         + rows_dot_product(r_3[J_3], Z_3)
                         + r_4_1[J_4] .* Z_4_1;
    // initialize linear predictor term
    vector[N] sigma = exp(r_5_sigma_1[J_5] .* Z_5_sigma_1);
    // initialize linear predictor term
    vector[N] nu = exp(r_6_nu_1[J_6] .* Z_6_nu_1) + 1;

    target += reduce_sum(partial_log_lik_lpdf, Y, grainsize, nu, mu, sigma);
  }

  // priors not including constants
  b[1] ~ normal(5,5);
  b[b_index] ~ normal(0,3);

  sd_1 ~ normal(1,1);
  sd_2 ~ normal(1,1);
  sd_3 ~ normal(0,0.5);
  sd_4[1] ~ normal(0,0.5);
  sd_5[1] ~ normal(0,0.1);
  sd_6[1] ~ normal(0,1.5);

  to_vector(z_1) ~ std_normal();
  to_vector(z_2) ~ std_normal();
  to_vector(z_3) ~ std_normal();
  z_4[1] ~ std_normal();
  z_5[1] ~ std_normal();
  z_6[1] ~ std_normal();

  L_1 ~ lkj_corr_cholesky(1);
  L_2 ~ lkj_corr_cholesky(1);
  L_3 ~ lkj_corr_cholesky(1);
}
generated quantities {
  // compute group-level correlations
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1,upper=1>[NC_1] cor_1;
  // compute group-level correlations
  corr_matrix[M_2] Cor_2 = multiply_lower_tri_self_transpose(L_2);
  vector<lower=-1,upper=1>[NC_2] cor_2;
  // compute group-level correlations
  corr_matrix[M_3] Cor_3 = multiply_lower_tri_self_transpose(L_3);
  vector<lower=-1,upper=1>[NC_3] cor_3;
  // extract upper diagonal of correlation matrix
  for (k in 1:M_1) {
    for (j in 1:(k - 1)) {
      cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
    }
  }
  // extract upper diagonal of correlation matrix
  for (k in 1:M_2) {
    for (j in 1:(k - 1)) {
      cor_2[choose(k - 1, 2) + j] = Cor_2[j, k];
    }
  }
  // extract upper diagonal of correlation matrix
  for (k in 1:M_3) {
    for (j in 1:(k - 1)) {
      cor_3[choose(k - 1, 2) + j] = Cor_3[j, k];
    }
  }
}

Also, if you have a discrete GPU in your system I’d highly recommend setting up OpenCL in cmdstanr, as this will accelerate the rows_dot_product calls, among others.

The cmdstan documentation here covers how to install the appropriate drivers and identify your relevant platform and device ID.

You would then use these when compiling your model in cmdstanr:

acc_options = list(
  stan_threads = TRUE,
  stan_opencl = TRUE,
  opencl_platform_id = your_platform_id, 
  opencl_device_id = your_device_id
)

mod_cl <- cmdstan_model("your_model.stan", cpp_options = acc_options)

Amazing! Thanks @andrjohns!

I hadn’t even considered offloading to the GPU.

Everything has compiled fine with OpenCL (for a GeForce RTX 2060). When running on a GPU, should the number of threads per chain be based on the number of available GPU cores (1920 CUDA cores in my case)?

No, threads_per_chain refers to the number of CPU threads utilised by reduce_sum within each chain.
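
For example (a sketch of how the two settings relate):

CmdStan_Fit <- CmdStan_Mod$sample(
        data = CmdStan_Data,
        chains = 4,
        parallel_chains = 2,     # chains running simultaneously
        threads_per_chain = 2)   # CPU threads per chain for reduce_sum
# roughly parallel_chains * threads_per_chain = 4 CPU threads busy at a time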

Thanks again @andrjohns

The model compiles without error, but sampling fails to start, with the following error reported for each chain:

Chain 1 file3f9673221bc6a_threads: 
stan/lib/stan_math/lib/eigen_3.3.9/Eigen/src/Core/CwiseBinaryOp.h:110: 
Eigen::CwiseBinaryOp<BinaryOp, Lhs, Rhs>::CwiseBinaryOp(const Lhs&, const Rhs&, const BinaryOp&) 
[with BinaryOp = Eigen::internal::scalar_product_op<double, double>; LhsType = const 
Eigen::Matrix<double, -1, -1>; RhsType = const Eigen::Matrix<double, -1, -1>; 
Eigen::CwiseBinaryOp<BinaryOp, Lhs, Rhs>::Lhs = Eigen::Matrix<double, -1, -1>; 
Eigen::CwiseBinaryOp<BinaryOp, Lhs, Rhs>::Rhs = Eigen::Matrix<double, -1, -1>]: 
Assertion `aLhs.rows() == aRhs.rows() && aLhs.cols() == aRhs.cols()' failed.

Here is a subset of the data for the model:
Data_dput.txt (2.2 MB)

Reprex:

# Load data
CmdStan_Data <- dget(file = "Data_dput.txt")
# Define model
CmdStan_Model <- 
        "functions {
  /* compute correlated group-level effects
   * Args:
   *   z: matrix of unscaled group-level effects
   *   SD: vector of standard deviation parameters
   *   L: cholesky factor correlation matrix
   * Returns:
   *   matrix of scaled group-level effects
   */
  matrix scale_r_cor(matrix z, vector SD, matrix L) {
    // r is stored in another dimension order than z
    return transpose(diag_pre_multiply(SD, L) * z);
  }

  real partial_log_lik_lpdf(data real[] y_slice, int start, int end, vector nu,
                            vector mu, vector sigma) {
    return student_t_lupdf(to_vector(y_slice) | nu[start:end], mu[start:end],
                                                sigma[start:end]);
  }
}
data {
  int<lower=1> N;  // total number of observations
  real Y[N];  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  vector[N] Z_1_2;
  vector[N] Z_1_3;
  vector[N] Z_1_4;
  vector[N] Z_1_5;
  vector[N] Z_1_6;
  vector[N] Z_1_7;
  vector[N] Z_1_8;
  vector[N] Z_1_9;
  int<lower=1> NC_1;  // number of group-level correlations
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  int<lower=1> J_2[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_2_1;
  vector[N] Z_2_2;
  vector[N] Z_2_3;
  vector[N] Z_2_4;
  vector[N] Z_2_5;
  vector[N] Z_2_6;
  vector[N] Z_2_7;
  vector[N] Z_2_8;
  vector[N] Z_2_9;
  int<lower=1> NC_2;  // number of group-level correlations
  // data for group-level effects of ID 3
  int<lower=1> N_3;  // number of grouping levels
  int<lower=1> M_3;  // number of coefficients per level
  int<lower=1> J_3[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_3_1;
  vector[N] Z_3_2;
  vector[N] Z_3_3;
  vector[N] Z_3_4;
  vector[N] Z_3_5;
  vector[N] Z_3_6;
  vector[N] Z_3_7;
  vector[N] Z_3_8;
  vector[N] Z_3_9;
  int<lower=1> NC_3;  // number of group-level correlations
  // data for group-level effects of ID 4
  int<lower=1> N_4;  // number of grouping levels
  int<lower=1> M_4;  // number of coefficients per level
  int<lower=1> J_4[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_4_1;
  // data for group-level effects of ID 5
  int<lower=1> N_5;  // number of grouping levels
  int<lower=1> M_5;  // number of coefficients per level
  int<lower=1> J_5[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_5_sigma_1;
  // data for group-level effects of ID 6
  int<lower=1> N_6;  // number of grouping levels
  int<lower=1> M_6;  // number of coefficients per level
  int<lower=1> J_6[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_6_nu_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  matrix[N,9] Z_1 = [Z_1_1', Z_1_2', Z_1_3', Z_1_4', Z_1_5',
                     Z_1_6', Z_1_7', Z_1_8', Z_1_9'];
  matrix[N,9] Z_2 = [Z_2_1', Z_2_2', Z_2_3', Z_2_4', Z_2_5',
                     Z_2_6', Z_2_7', Z_2_8', Z_2_9'];
  matrix[N,9] Z_3 = [Z_3_1', Z_3_2', Z_3_3', Z_3_4', Z_3_5',
                     Z_3_6', Z_3_7', Z_3_8', Z_3_9'];
  int b_index[37] = linspaced_int_array(37, 2, 38);
}
parameters {
  vector[K] b;  // population-level effects
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  matrix[M_1, N_1] z_1;  // standardized group-level effects
  cholesky_factor_corr[M_1] L_1;  // cholesky factor of correlation matrix
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  matrix[M_2, N_2] z_2;  // standardized group-level effects
  cholesky_factor_corr[M_2] L_2;  // cholesky factor of correlation matrix
  vector<lower=0>[M_3] sd_3;  // group-level standard deviations
  matrix[M_3, N_3] z_3;  // standardized group-level effects
  cholesky_factor_corr[M_3] L_3;  // cholesky factor of correlation matrix
  vector<lower=0>[M_4] sd_4;  // group-level standard deviations
  vector[N_4] z_4[M_4];  // standardized group-level effects
  vector<lower=0>[M_5] sd_5;  // group-level standard deviations
  vector[N_5] z_5[M_5];  // standardized group-level effects
  vector<lower=0>[M_6] sd_6;  // group-level standard deviations
  vector[N_6] z_6[M_6];  // standardized group-level effects
}
transformed parameters {
  // actual group-level effects
  matrix[N_1, M_1] r_1 = scale_r_cor(z_1, sd_1, L_1);
  matrix[N_2, M_2] r_2 = scale_r_cor(z_2, sd_2, L_2);
  matrix[N_3, M_3] r_3 = scale_r_cor(z_3, sd_3, L_3); 

  vector[N_4] r_4_1 = (sd_4[1] * (z_4[1]));
  vector[N_5] r_5_sigma_1 = (sd_5[1] * (z_5[1]));
  vector[N_6] r_6_nu_1 = (sd_6[1] * (z_6[1]));
}
model {
  int grainsize = 1;
  // likelihood not including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = X * b + rows_dot_product(r_1[J_1], Z_1)
                         + rows_dot_product(r_2[J_2], Z_2)
                         + rows_dot_product(r_3[J_3], Z_3)
                         + r_4_1[J_4] .* Z_4_1;
    // initialize linear predictor term
    vector[N] sigma = exp(r_5_sigma_1[J_5] .* Z_5_sigma_1);
    // initialize linear predictor term
    vector[N] nu = exp(r_6_nu_1[J_6] .* Z_6_nu_1) + 1;

    target += reduce_sum(partial_log_lik_lpdf, Y, grainsize, nu, mu, sigma);
  }
  // priors not including constants
  b[1] ~ normal(5,5);
  b[b_index] ~ normal(0,3);
  sd_1 ~ normal(1,1);
  sd_2 ~ normal(1,1);
  sd_3 ~ normal(0,0.5);
  sd_4[1] ~ normal(0,0.5);
  sd_5[1] ~ normal(0,0.1);
  sd_6[1] ~ normal(0,1.5);
  to_vector(z_1) ~ std_normal();
  to_vector(z_2) ~ std_normal();
  to_vector(z_3) ~ std_normal();
  z_4[1] ~ std_normal();
  z_5[1] ~ std_normal();
  z_6[1] ~ std_normal();
  L_1 ~ lkj_corr_cholesky(1);
  L_2 ~ lkj_corr_cholesky(1);
  L_3 ~ lkj_corr_cholesky(1);
}
generated quantities {
  // compute group-level correlations
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1,upper=1>[NC_1] cor_1;
  // compute group-level correlations
  corr_matrix[M_2] Cor_2 = multiply_lower_tri_self_transpose(L_2);
  vector<lower=-1,upper=1>[NC_2] cor_2;
  // compute group-level correlations
  corr_matrix[M_3] Cor_3 = multiply_lower_tri_self_transpose(L_3);
  vector<lower=-1,upper=1>[NC_3] cor_3;
  // extract upper diagonal of correlation matrix
  for (k in 1:M_1) {
    for (j in 1:(k - 1)) {
      cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
    }
  }
  // extract upper diagonal of correlation matrix
  for (k in 1:M_2) {
    for (j in 1:(k - 1)) {
      cor_2[choose(k - 1, 2) + j] = Cor_2[j, k];
    }
  }
  // extract upper diagonal of correlation matrix
  for (k in 1:M_3) {
    for (j in 1:(k - 1)) {
      cor_3[choose(k - 1, 2) + j] = Cor_3[j, k];
    }
  }
}
"

Compile model

 # Set compile options
acc_options = list(
  stan_threads = TRUE #,
  #stan_opencl = TRUE,
  #opencl_platform_id = 0, 
  #opencl_device_id = 0
)

# Compile model
CmdStan_Mod <- cmdstan_model(write_stan_file(CmdStan_Model), 
                                  compile = TRUE,
                                  cpp_options = acc_options)

Sample

CmdStan_Fit <- CmdStan_Mod$sample(
        data = CmdStan_Data, 
        iter_warmup = 500,            
        iter_sampling = 1000,          
        #init = initf,               
        chains = 2,             
        threads_per_chain = 2,
        parallel_chains = 2,         
        max_treedepth=12, 
        adapt_delta=0.9)

Ah, there was a typo in the transformed data block that I gave you: there should be a transposition at the end of the Z_ matrix declarations:

  matrix[N,9] Z_1 = [Z_1_1', Z_1_2', Z_1_3', Z_1_4', Z_1_5',
                     Z_1_6', Z_1_7', Z_1_8', Z_1_9']';
  matrix[N,9] Z_2 = [Z_2_1', Z_2_2', Z_2_3', Z_2_4', Z_2_5',
                     Z_2_6', Z_2_7', Z_2_8', Z_2_9']';
  matrix[N,9] Z_3 = [Z_3_1', Z_3_2', Z_3_3', Z_3_4', Z_3_5',
                     Z_3_6', Z_3_7', Z_3_8', Z_3_9']';

But this should have thrown a different error earlier, so it looks like a bug here. Thanks for finding this!

I’ve also just realised that the student_t distribution is OpenCL-accelerated as of CmdStan 2.26, so it could be a good idea to try that directly, without using reduce_sum.
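
A sketch of what that could look like (CmdStan_Model_noRS is a hypothetical copy of the model string whose likelihood is a direct target += student_t_lpdf(Y | nu, mu, sigma); with no partial_log_lik function or reduce_sum call):

cl_options <- list(
  stan_opencl = TRUE,
  opencl_platform_id = 0,   # substitute your platform id
  opencl_device_id = 0      # substitute your device id
)
Mod_cl <- cmdstan_model(write_stan_file(CmdStan_Model_noRS),
                        cpp_options = cl_options)
# sample without threads_per_chain, since reduce_sum is no longer used
Fit_cl <- Mod_cl$sample(data = CmdStan_Data, chains = 4, parallel_chains = 2)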

Thanks, @andrjohns

Unfortunately, without reduce_sum, running the model crashes the whole system.