Speeding up a Student model with 3 correlation matrices

I have a relatively large data set (~9000 observations) and am fitting a hierarchical Student model with several correlations. The model initially took a little over a week for 4 chains of 6500 iterations. I’m attempting to speed things up with some efficiency if possible. I used brms to generate some initial stan code and have done some vectorization.

brms syntax:

``````Mod_fmla <- bf(RECD ~ 0 + Intercept +
AvRECD.c +
Age_m.c +
Pressure.c +
Imped.c +
Volume.c +
factor(Frequency) +
factor(Frequency):Pressure.c +
factor(Frequency):Imped.c +
factor(Frequency):Volume.c +
(1 + factor(Frequency)|ME/Ear/SubjID) +
(1|test_year),
center = TRUE) +
lf(sigma ~ 0 + (1|ME), cmc = FALSE) +
lf(nu ~ 0 + (1|ME), cmc = FALSE)

``````

Stan model:

``````"functions {
/* compute correlated group-level effects
* Args:
*   z: matrix of unscaled group-level effects
*   SD: vector of standard deviation parameters
*   L: cholesky factor correlation matrix
* Returns:
*   matrix of scaled group-level effects
*/
matrix scale_r_cor(matrix z, vector SD, matrix L) {
// r is stored in another dimension order than z
return transpose(diag_pre_multiply(SD, L) * z);
}
* Args:
*   p: a positive scalar
* Returns:
*   a scalar in (-Inf, Inf)
*/
real logm1(real y) {
return log(y - 1);
}
/* compute the inverse of the logm1 link
* Args:
*   y: a scalar in (-Inf, Inf)
* Returns:
*   a positive scalar
*/
real expp1(real y) {
return exp(y) + 1;
}
}
``````
``````data {
int<lower=1> N;  // total number of observations
vector[N] Y;  // response variable
int<lower=1> K;  // number of population-level effects
matrix[N, K] X;  // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> N_1;  // number of grouping levels
int<lower=1> M_1;  // number of coefficients per level
int<lower=1> J_1[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
vector[N] Z_1_2;
vector[N] Z_1_3;
vector[N] Z_1_4;
vector[N] Z_1_5;
vector[N] Z_1_6;
vector[N] Z_1_7;
vector[N] Z_1_8;
vector[N] Z_1_9;
int<lower=1> NC_1;  // number of group-level correlations
// data for group-level effects of ID 2
int<lower=1> N_2;  // number of grouping levels
int<lower=1> M_2;  // number of coefficients per level
int<lower=1> J_2[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_2_1;
vector[N] Z_2_2;
vector[N] Z_2_3;
vector[N] Z_2_4;
vector[N] Z_2_5;
vector[N] Z_2_6;
vector[N] Z_2_7;
vector[N] Z_2_8;
vector[N] Z_2_9;
int<lower=1> NC_2;  // number of group-level correlations
// data for group-level effects of ID 3
int<lower=1> N_3;  // number of grouping levels
int<lower=1> M_3;  // number of coefficients per level
int<lower=1> J_3[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_3_1;
vector[N] Z_3_2;
vector[N] Z_3_3;
vector[N] Z_3_4;
vector[N] Z_3_5;
vector[N] Z_3_6;
vector[N] Z_3_7;
vector[N] Z_3_8;
vector[N] Z_3_9;
int<lower=1> NC_3;  // number of group-level correlations
// data for group-level effects of ID 4
int<lower=1> N_4;  // number of grouping levels
int<lower=1> M_4;  // number of coefficients per level
int<lower=1> J_4[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_4_1;
// data for group-level effects of ID 5
int<lower=1> N_5;  // number of grouping levels
int<lower=1> M_5;  // number of coefficients per level
int<lower=1> J_5[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_5_sigma_1;
// data for group-level effects of ID 6
int<lower=1> N_6;  // number of grouping levels
int<lower=1> M_6;  // number of coefficients per level
int<lower=1> J_6[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_6_nu_1;
int prior_only;  // should the likelihood be ignored?
}
``````
``````transformed data {
real norm18 = 18 * normal_lccdf(0 | 1,1);
real norm10 = 10 * normal_lccdf(0 | 0,0.5);
real norm0_1 = 1 * normal_lccdf(0 | 0,0.1);
real norm1_5 = 1 * normal_lccdf(0 | 0,1.5);
}
``````
``````parameters {
vector[K] b;  // population-level effects
vector<lower=0>[M_1] sd_1;  // group-level standard deviations
matrix[M_1, N_1] z_1;  // standardized group-level effects
cholesky_factor_corr[M_1] L_1;  // cholesky factor of correlation matrix
vector<lower=0>[M_2] sd_2;  // group-level standard deviations
matrix[M_2, N_2] z_2;  // standardized group-level effects
cholesky_factor_corr[M_2] L_2;  // cholesky factor of correlation matrix
vector<lower=0>[M_3] sd_3;  // group-level standard deviations
matrix[M_3, N_3] z_3;  // standardized group-level effects
cholesky_factor_corr[M_3] L_3;  // cholesky factor of correlation matrix
vector<lower=0>[M_4] sd_4;  // group-level standard deviations
vector[N_4] z_4[M_4];  // standardized group-level effects
vector<lower=0>[M_5] sd_5;  // group-level standard deviations
vector[N_5] z_5[M_5];  // standardized group-level effects
vector<lower=0>[M_6] sd_6;  // group-level standard deviations
vector[N_6] z_6[M_6];  // standardized group-level effects
}
``````
``````transformed parameters {
matrix[N_1, M_1] r_1;  // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_1] r_1_1;
vector[N_1] r_1_2;
vector[N_1] r_1_3;
vector[N_1] r_1_4;
vector[N_1] r_1_5;
vector[N_1] r_1_6;
vector[N_1] r_1_7;
vector[N_1] r_1_8;
vector[N_1] r_1_9;
matrix[N_2, M_2] r_2;  // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_2] r_2_1;
vector[N_2] r_2_2;
vector[N_2] r_2_3;
vector[N_2] r_2_4;
vector[N_2] r_2_5;
vector[N_2] r_2_6;
vector[N_2] r_2_7;
vector[N_2] r_2_8;
vector[N_2] r_2_9;
matrix[N_3, M_3] r_3;  // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_3] r_3_1;
vector[N_3] r_3_2;
vector[N_3] r_3_3;
vector[N_3] r_3_4;
vector[N_3] r_3_5;
vector[N_3] r_3_6;
vector[N_3] r_3_7;
vector[N_3] r_3_8;
vector[N_3] r_3_9;
vector[N_4] r_4_1;  // actual group-level effects
vector[N_5] r_5_sigma_1;  // actual group-level effects
vector[N_6] r_6_nu_1;  // actual group-level effects
// compute actual group-level effects
r_1 = scale_r_cor(z_1, sd_1, L_1);
r_1_1 = r_1[, 1];
r_1_2 = r_1[, 2];
r_1_3 = r_1[, 3];
r_1_4 = r_1[, 4];
r_1_5 = r_1[, 5];
r_1_6 = r_1[, 6];
r_1_7 = r_1[, 7];
r_1_8 = r_1[, 8];
r_1_9 = r_1[, 9];
// compute actual group-level effects
r_2 = scale_r_cor(z_2, sd_2, L_2);
r_2_1 = r_2[, 1];
r_2_2 = r_2[, 2];
r_2_3 = r_2[, 3];
r_2_4 = r_2[, 4];
r_2_5 = r_2[, 5];
r_2_6 = r_2[, 6];
r_2_7 = r_2[, 7];
r_2_8 = r_2[, 8];
r_2_9 = r_2[, 9];
// compute actual group-level effects
r_3 = scale_r_cor(z_3, sd_3, L_3);
r_3_1 = r_3[, 1];
r_3_2 = r_3[, 2];
r_3_3 = r_3[, 3];
r_3_4 = r_3[, 4];
r_3_5 = r_3[, 5];
r_3_6 = r_3[, 6];
r_3_7 = r_3[, 7];
r_3_8 = r_3[, 8];
r_3_9 = r_3[, 9];
r_4_1 = (sd_4[1] * (z_4[1]));
r_5_sigma_1 = (sd_5[1] * (z_5[1]));
r_6_nu_1 = (sd_6[1] * (z_6[1]));
}
``````
``````model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = X * b;
// initialize linear predictor term
vector[N] sigma = rep_vector(0.0, N);
// initialize linear predictor term
vector[N] nu = rep_vector(0.0, N);
//for (n in 1:N) {
// add more terms to the linear predictor
mu += r_1_1[J_1] .* Z_1_1 + r_1_2[J_1] .* Z_1_2 + r_1_3[J_1] .* Z_1_3 + r_1_4[J_1] .* Z_1_4 + r_1_5[J_1] .* Z_1_5
+ r_1_6[J_1] .* Z_1_6 + r_1_7[J_1] .* Z_1_7 + r_1_8[J_1] .* Z_1_8 + r_1_9[J_1] .* Z_1_9
+ r_2_1[J_2] .* Z_2_1 + r_2_2[J_2] .* Z_2_2 + r_2_3[J_2] .* Z_2_3 + r_2_4[J_2] .* Z_2_4
+ r_2_5[J_2] .* Z_2_5 + r_2_6[J_2] .* Z_2_6 + r_2_7[J_2] .* Z_2_7 + r_2_8[J_2] .* Z_2_8
+ r_2_9[J_2] .* Z_2_9 + r_3_1[J_3] .* Z_3_1 + r_3_2[J_3] .* Z_3_2 + r_3_3[J_3] .* Z_3_3
+ r_3_4[J_3] .* Z_3_4 + r_3_5[J_3] .* Z_3_5 + r_3_6[J_3] .* Z_3_6 + r_3_7[J_3] .* Z_3_7
+ r_3_8[J_3] .* Z_3_8 + r_3_9[J_3] .* Z_3_9 + r_4_1[J_4] .* Z_4_1;
// }
//for (n in 1:N) {
// add more terms to the linear predictor
sigma += r_5_sigma_1[J_5] .* Z_5_sigma_1;
//}
//for (n in 1:N) {
// add more terms to the linear predictor
nu += r_6_nu_1[J_6] .* Z_6_nu_1;
//}
//for (n in 1:N) {
// apply the inverse link function
sigma = exp(sigma);
// }
for (n in 1:N) {
// apply the inverse link function
nu[n] = expp1(nu[n]);
}
target += -norm18;
target += -norm10;
target += -norm0_1;
target += -norm1_5;
target += student_t_lpdf(Y | nu, mu, sigma);
}
``````

Can nu be vectorized here or can I create a vector when calling the distribution?

``````        // priors including constants
b[1] ~ normal(5,5);
b[2] ~ normal(0,3);
b[3] ~ normal(0,3);
b[4] ~ normal(0,3);
b[5] ~ normal(0,3);
b[6] ~ normal(0,3);
b[7] ~ normal(0,3);
b[8] ~ normal(0,3);
b[9] ~ normal(0,3);
b[10] ~ normal(0,3);
b[11] ~ normal(0,3);
b[12] ~ normal(0,3);
b[13] ~ normal(0,3);
b[14] ~ normal(0,3);
b[15] ~ normal(0,3);
b[16] ~ normal(0,3);
b[17] ~ normal(0,3);
b[18] ~ normal(0,3);
b[19] ~ normal(0,3);
b[20] ~ normal(0,3);
b[21] ~ normal(0,3);
b[22] ~ normal(0,3);
b[23] ~ normal(0,3);
b[24] ~ normal(0,3);
b[25] ~ normal(0,3);
b[26] ~ normal(0,3);
b[27] ~ normal(0,3);
b[28] ~ normal(0,3);
b[29] ~ normal(0,3);
b[30] ~ normal(0,3);
b[31] ~ normal(0,3);
b[32] ~ normal(0,3);
b[33] ~ normal(0,3);
b[34] ~ normal(0,3);
b[35] ~ normal(0,3);
b[36] ~ normal(0,3);
b[37] ~ normal(0,3);
b[38] ~ normal(0,3);
//target += normal_lpdf(sd_1 | 1,1)
//- 9 * normal_lccdf(0 | 1,1);
sd_1 ~ normal(1,1);
z_1[1] ~ std_normal();
z_1[2] ~ std_normal();
z_1[3] ~ std_normal();
z_1[4] ~ std_normal();
z_1[5] ~ std_normal();
z_1[6] ~ std_normal();
z_1[7] ~ std_normal();
z_1[8] ~ std_normal();
z_1[9] ~ std_normal();
L_1 ~ lkj_corr_cholesky(1);
sd_2 ~ normal(1,1);
z_2[1] ~ std_normal();
z_2[2] ~ std_normal();
z_2[3] ~ std_normal();
z_2[4] ~ std_normal();
z_2[5] ~ std_normal();
z_2[6] ~ std_normal();
z_2[7] ~ std_normal();
z_2[8] ~ std_normal();
z_2[9] ~ std_normal();
L_2 ~ lkj_corr_cholesky(1);
sd_3 ~ normal(0,0.5);
z_3[1] ~ std_normal();
z_3[2] ~ std_normal();
z_3[3] ~ std_normal();
z_3[4] ~ std_normal();
z_3[5] ~ std_normal();
z_3[6] ~ std_normal();
z_3[7] ~ std_normal();
z_3[8] ~ std_normal();
z_3[9] ~ std_normal();
L_3 ~ lkj_corr_cholesky(1);
sd_4[1] ~ normal(0,0.5);
z_4[1] ~ std_normal();
sd_5[1] ~ normal(0,0.1);
z_5[1] ~ std_normal();
sd_5[1] ~ normal(0,1.5);
z_6[1] ~ std_normal();
}
``````

I think the ~ syntax for priors helps.

``````generated quantities {
// compute group-level correlations
corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
vector<lower=-1,upper=1>[NC_1] cor_1;
// compute group-level correlations
corr_matrix[M_2] Cor_2 = multiply_lower_tri_self_transpose(L_2);
vector<lower=-1,upper=1>[NC_2] cor_2;
// compute group-level correlations
corr_matrix[M_3] Cor_3 = multiply_lower_tri_self_transpose(L_3);
vector<lower=-1,upper=1>[NC_3] cor_3;
// extract upper diagonal of correlation matrix
for (k in 1:M_1) {
for (j in 1:(k - 1)) {
cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
}
}
// extract upper diagonal of correlation matrix
for (k in 1:M_2) {
for (j in 1:(k - 1)) {
cor_2[choose(k - 1, 2) + j] = Cor_2[j, k];
}
}
// extract upper diagonal of correlation matrix
for (k in 1:M_3) {
for (j in 1:(k - 1)) {
cor_3[choose(k - 1, 2) + j] = Cor_3[j, k];
}
}
}
``````

Are there any additional places some vectorization might help? Can the correlation matrices in the generated quantities be vectorized?

I’d actually go back to the `brms` models first. If you have cmdstanR installed, and are using the github version of `brms`, then you can run your `brms` models with both multi-threading (`reduce_sum`) and dropping normalising constants (`~` syntax). Note that if you only have four CPU cores and are running four chains, then multithreading will likely provide little benefit.

And to drop normalising constants, you just need to specify `normalize=FALSE` in the `brm` call

1 Like

Thank you @andrjohns !

I had not yet experimented with cmdstan, but it definitely sped things up by days!

I am trying to take it one step further by vectorizing the cmdstan model:

``````functions {
/* compute correlated group-level effects
* Args:
*   z: matrix of unscaled group-level effects
*   SD: vector of standard deviation parameters
*   L: cholesky factor correlation matrix
* Returns:
*   matrix of scaled group-level effects
*/
matrix scale_r_cor(matrix z, vector SD, matrix L) {
// r is stored in another dimension order than z
return transpose(diag_pre_multiply(SD, L) * z);
}
* Args:
*   p: a positive scalar
* Returns:
*   a scalar in (-Inf, Inf)
*/
real logm1(real y) {
return log(y - 1);
}
/* compute the inverse of the logm1 link
* Args:
*   y: a scalar in (-Inf, Inf)
* Returns:
*   a positive scalar
*/
real expp1(real y) {
return exp(y) + 1;
}
}
data {
int<lower=1> N;  // total number of observations
vector[N] Y;  // response variable
int<lower=1> K;  // number of population-level effects
matrix[N, K] X;  // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> N_1;  // number of grouping levels
int<lower=1> M_1;  // number of coefficients per level
int<lower=1> J_1[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
vector[N] Z_1_2;
vector[N] Z_1_3;
vector[N] Z_1_4;
vector[N] Z_1_5;
vector[N] Z_1_6;
vector[N] Z_1_7;
vector[N] Z_1_8;
vector[N] Z_1_9;
int<lower=1> NC_1;  // number of group-level correlations
// data for group-level effects of ID 2
int<lower=1> N_2;  // number of grouping levels
int<lower=1> M_2;  // number of coefficients per level
int<lower=1> J_2[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_2_1;
vector[N] Z_2_2;
vector[N] Z_2_3;
vector[N] Z_2_4;
vector[N] Z_2_5;
vector[N] Z_2_6;
vector[N] Z_2_7;
vector[N] Z_2_8;
vector[N] Z_2_9;
int<lower=1> NC_2;  // number of group-level correlations
// data for group-level effects of ID 3
int<lower=1> N_3;  // number of grouping levels
int<lower=1> M_3;  // number of coefficients per level
int<lower=1> J_3[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_3_1;
vector[N] Z_3_2;
vector[N] Z_3_3;
vector[N] Z_3_4;
vector[N] Z_3_5;
vector[N] Z_3_6;
vector[N] Z_3_7;
vector[N] Z_3_8;
vector[N] Z_3_9;
int<lower=1> NC_3;  // number of group-level correlations
// data for group-level effects of ID 4
int<lower=1> N_4;  // number of grouping levels
int<lower=1> M_4;  // number of coefficients per level
int<lower=1> J_4[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_4_1;
// data for group-level effects of ID 5
int<lower=1> N_5;  // number of grouping levels
int<lower=1> M_5;  // number of coefficients per level
int<lower=1> J_5[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_5_sigma_1;
// data for group-level effects of ID 6
int<lower=1> N_6;  // number of grouping levels
int<lower=1> M_6;  // number of coefficients per level
int<lower=1> J_6[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_6_nu_1;
int prior_only;  // should the likelihood be ignored?
}
transformed data {
real norm18 = 18 * normal_lccdf(0 | 1,1);
real norm10 = 10 * normal_lccdf(0 | 0,0.5);
real norm0_1 = 1 * normal_lccdf(0 | 0,0.1);
real norm1_5 = 1 * normal_lccdf(0 | 0,1.5);
}
parameters {
vector[K] b;  // population-level effects
vector<lower=0>[M_1] sd_1;  // group-level standard deviations
matrix[M_1, N_1] z_1;  // standardized group-level effects
cholesky_factor_corr[M_1] L_1;  // cholesky factor of correlation matrix
vector<lower=0>[M_2] sd_2;  // group-level standard deviations
matrix[M_2, N_2] z_2;  // standardized group-level effects
cholesky_factor_corr[M_2] L_2;  // cholesky factor of correlation matrix
vector<lower=0>[M_3] sd_3;  // group-level standard deviations
matrix[M_3, N_3] z_3;  // standardized group-level effects
cholesky_factor_corr[M_3] L_3;  // cholesky factor of correlation matrix
vector<lower=0>[M_4] sd_4;  // group-level standard deviations
vector[N_4] z_4[M_4];  // standardized group-level effects
vector<lower=0>[M_5] sd_5;  // group-level standard deviations
vector[N_5] z_5[M_5];  // standardized group-level effects
vector<lower=0>[M_6] sd_6;  // group-level standard deviations
vector[N_6] z_6[M_6];  // standardized group-level effects
}
transformed parameters {
matrix[N_1, M_1] r_1;  // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_1] r_1_1;
vector[N_1] r_1_2;
vector[N_1] r_1_3;
vector[N_1] r_1_4;
vector[N_1] r_1_5;
vector[N_1] r_1_6;
vector[N_1] r_1_7;
vector[N_1] r_1_8;
vector[N_1] r_1_9;
matrix[N_2, M_2] r_2;  // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_2] r_2_1;
vector[N_2] r_2_2;
vector[N_2] r_2_3;
vector[N_2] r_2_4;
vector[N_2] r_2_5;
vector[N_2] r_2_6;
vector[N_2] r_2_7;
vector[N_2] r_2_8;
vector[N_2] r_2_9;
matrix[N_3, M_3] r_3;  // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_3] r_3_1;
vector[N_3] r_3_2;
vector[N_3] r_3_3;
vector[N_3] r_3_4;
vector[N_3] r_3_5;
vector[N_3] r_3_6;
vector[N_3] r_3_7;
vector[N_3] r_3_8;
vector[N_3] r_3_9;
vector[N_4] r_4_1;  // actual group-level effects
vector[N_5] r_5_sigma_1;  // actual group-level effects
vector[N_6] r_6_nu_1;  // actual group-level effects
// compute actual group-level effects
r_1 = scale_r_cor(z_1, sd_1, L_1);
r_1_1 = r_1[, 1];
r_1_2 = r_1[, 2];
r_1_3 = r_1[, 3];
r_1_4 = r_1[, 4];
r_1_5 = r_1[, 5];
r_1_6 = r_1[, 6];
r_1_7 = r_1[, 7];
r_1_8 = r_1[, 8];
r_1_9 = r_1[, 9];
// compute actual group-level effects
r_2 = scale_r_cor(z_2, sd_2, L_2);
r_2_1 = r_2[, 1];
r_2_2 = r_2[, 2];
r_2_3 = r_2[, 3];
r_2_4 = r_2[, 4];
r_2_5 = r_2[, 5];
r_2_6 = r_2[, 6];
r_2_7 = r_2[, 7];
r_2_8 = r_2[, 8];
r_2_9 = r_2[, 9];
// compute actual group-level effects
r_3 = scale_r_cor(z_3, sd_3, L_3);
r_3_1 = r_3[, 1];
r_3_2 = r_3[, 2];
r_3_3 = r_3[, 3];
r_3_4 = r_3[, 4];
r_3_5 = r_3[, 5];
r_3_6 = r_3[, 6];
r_3_7 = r_3[, 7];
r_3_8 = r_3[, 8];
r_3_9 = r_3[, 9];
r_4_1 = (sd_4[1] * (z_4[1]));
r_5_sigma_1 = (sd_5[1] * (z_5[1]));
r_6_nu_1 = (sd_6[1] * (z_6[1]));
}
model {
// likelihood not including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = X * b;
// initialize linear predictor term
vector[N] sigma = rep_vector(0.0, N);
// initialize linear predictor term
vector[N] nu = rep_vector(0.0, N);
//for (n in 1:N) {
// add more terms to the linear predictor
mu += r_1_1[J_1] .* Z_1_1 + r_1_2[J_1] .* Z_1_2 + r_1_3[J_1] .* Z_1_3 + r_1_4[J_1] .* Z_1_4 + r_1_5[J_1] .* Z_1_5
+ r_1_6[J_1] .* Z_1_6 + r_1_7[J_1] .* Z_1_7 + r_1_8[J_1] .* Z_1_8 + r_1_9[J_1] .* Z_1_9
+ r_2_1[J_2] .* Z_2_1 + r_2_2[J_2] .* Z_2_2 + r_2_3[J_2] .* Z_2_3 + r_2_4[J_2] .* Z_2_4
+ r_2_5[J_2] .* Z_2_5 + r_2_6[J_2] .* Z_2_6 + r_2_7[J_2] .* Z_2_7 + r_2_8[J_2] .* Z_2_8
+ r_2_9[J_2] .* Z_2_9 + r_3_1[J_3] .* Z_3_1 + r_3_2[J_3] .* Z_3_2 + r_3_3[J_3] .* Z_3_3
+ r_3_4[J_3] .* Z_3_4 + r_3_5[J_3] .* Z_3_5 + r_3_6[J_3] .* Z_3_6 + r_3_7[J_3] .* Z_3_7
+ r_3_8[J_3] .* Z_3_8 + r_3_9[J_3] .* Z_3_9 + r_4_1[J_4] .* Z_4_1;
//}
//for (n in 1:N) {
// add more terms to the linear predictor
sigma += r_5_sigma_1[J_5] .* Z_5_sigma_1;
//}
//for (n in 1:N) {
// add more terms to the linear predictor
nu += r_6_nu_1[J_6] .* Z_6_nu_1;
//}
//for (n in 1:N) {
// apply the inverse link function
sigma = exp(sigma);
// }
for (n in 1:N) {
// apply the inverse link function
nu[n] = expp1(nu[n]);
}
target += -norm18;
target += -norm10;
target += -norm0_1;
target += -norm1_5;
target += student_t_lpdf(Y | nu, mu, sigma);
}
// priors not including constants
b[1] ~ normal(5,5);
b[2] ~ normal(0,3);
b[3] ~ normal(0,3);
b[4] ~ normal(0,3);
b[5] ~ normal(0,3);
b[6] ~ normal(0,3);
b[7] ~ normal(0,3);
b[8] ~ normal(0,3);
b[9] ~ normal(0,3);
b[10] ~ normal(0,3);
b[11] ~ normal(0,3);
b[12] ~ normal(0,3);
b[13] ~ normal(0,3);
b[14] ~ normal(0,3);
b[15] ~ normal(0,3);
b[16] ~ normal(0,3);
b[17] ~ normal(0,3);
b[18] ~ normal(0,3);
b[19] ~ normal(0,3);
b[20] ~ normal(0,3);
b[21] ~ normal(0,3);
b[22] ~ normal(0,3);
b[23] ~ normal(0,3);
b[24] ~ normal(0,3);
b[25] ~ normal(0,3);
b[26] ~ normal(0,3);
b[27] ~ normal(0,3);
b[28] ~ normal(0,3);
b[29] ~ normal(0,3);
b[30] ~ normal(0,3);
b[31] ~ normal(0,3);
b[32] ~ normal(0,3);
b[33] ~ normal(0,3);
b[34] ~ normal(0,3);
b[35] ~ normal(0,3);
b[36] ~ normal(0,3);
b[37] ~ normal(0,3);
b[38] ~ normal(0,3);
//target += normal_lpdf(sd_1 | 1,1)
//- 9 * normal_lccdf(0 | 1,1);
sd_1 ~ normal(1,1);
z_1[1] ~ std_normal();
z_1[2] ~ std_normal();
z_1[3] ~ std_normal();
z_1[4] ~ std_normal();
z_1[5] ~ std_normal();
z_1[6] ~ std_normal();
z_1[7] ~ std_normal();
z_1[8] ~ std_normal();
z_1[9] ~ std_normal();
L_1 ~ lkj_corr_cholesky(1);
sd_2 ~ normal(1,1);
z_2[1] ~ std_normal();
z_2[2] ~ std_normal();
z_2[3] ~ std_normal();
z_2[4] ~ std_normal();
z_2[5] ~ std_normal();
z_2[6] ~ std_normal();
z_2[7] ~ std_normal();
z_2[8] ~ std_normal();
z_2[9] ~ std_normal();
L_2 ~ lkj_corr_cholesky(1);
sd_3 ~ normal(0,0.5);
z_3[1] ~ std_normal();
z_3[2] ~ std_normal();
z_3[3] ~ std_normal();
z_3[4] ~ std_normal();
z_3[5] ~ std_normal();
z_3[6] ~ std_normal();
z_3[7] ~ std_normal();
z_3[8] ~ std_normal();
z_3[9] ~ std_normal();
L_3 ~ lkj_corr_cholesky(1);
sd_4[1] ~ normal(0,0.5);
z_4[1] ~ std_normal();
sd_5[1] ~ normal(0,0.1);
z_5[1] ~ std_normal();
sd_5[1] ~ normal(0,1.5);
z_6[1] ~ std_normal();
}
generated quantities {
// compute group-level correlations
corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
vector<lower=-1,upper=1>[NC_1] cor_1;
// compute group-level correlations
corr_matrix[M_2] Cor_2 = multiply_lower_tri_self_transpose(L_2);
vector<lower=-1,upper=1>[NC_2] cor_2;
// compute group-level correlations
corr_matrix[M_3] Cor_3 = multiply_lower_tri_self_transpose(L_3);
vector<lower=-1,upper=1>[NC_3] cor_3;
// extract upper diagonal of correlation matrix
for (k in 1:M_1) {
for (j in 1:(k - 1)) {
cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
}
}
// extract upper diagonal of correlation matrix
for (k in 1:M_2) {
for (j in 1:(k - 1)) {
cor_2[choose(k - 1, 2) + j] = Cor_2[j, k];
}
}
// extract upper diagonal of correlation matrix
for (k in 1:M_3) {
for (j in 1:(k - 1)) {
cor_3[choose(k - 1, 2) + j] = Cor_3[j, k];
}
}
}

``````

The model compiles okay, but I get several warnings/error about student t parameters when I sample:

``````CmdStan_Fit <- CmdStan_Mod\$sample(
data = CmdStan_Data,      # Named list of data
iter_warmup = 2000,            # Total number of iterations per chain
iter_sampling = 2000,          # Number of warmup iterations per chain
init = "0",               # Set intial values to 0
chains = 4,
parallel_chains = ncores/2,
max_treedepth=14,
``````
``````Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 3 Exception: student_t_lpdf: Degrees of freedom parameter[109] is inf, but must be positive finite! (in '/tmp/Rtmp8hhSxn/model-6eeed5e958d55.stan', line 241, column 4 to column 48)
Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 4 Exception: student_t_lpdf: Location parameter[1] is -nan, but must be finite! (in '/tmp/Rtmp8hhSxn/model-6eeed5e958d55.stan', line 241, column 4 to column 48)
Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 4 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 1 Exception: student_t_lpdf: Scale parameter[1] is inf, but must be positive finite! (in '/tmp/Rtmp8hhSxn/model-6eeed5e958d55.stan', line 241, column 4 to column 48)
Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
``````

Do you have any suggestions for how I might start the 3 student t parameters more effectively?

If that warning only pops up at the start of warmup, and not consistently throughout sampling, then it’s safe for you to ignore.

Additionally, even though you’ve set `threads_per_chain = 2` in your `cmdstanr` call you haven’t generated Stan code that will take advantage of it. You need to specify in `brms` that you want to use within-chain parallelisation by adding `threads = threading(number_threads)` to the call

Also, I realise that I should have clarified earlier that using `normalize=FALSE`, or the `~` notation, is only appropriate if you’re not planning on computing Bayes factors afterwards, as these need all normalising constants included.

Thanks again @andrjohns !

I definitely missed the `reduce_sum` in my call!

I revised the automatic code generation with:

``````RECD_CmdStan_Model <- make_stancode(RECD_Grp_Cor_Dist_fmla,
RECD_Data2,
prior = RECD_Grp_Cor_Dist_priors,
backend = "cmdstan",
normalize = FALSE,
parallel_chains = ncores/2)
``````
``````functions {
/* compute correlated group-level effects
* Args:
*   z: matrix of unscaled group-level effects
*   SD: vector of standard deviation parameters
*   L: cholesky factor correlation matrix
* Returns:
*   matrix of scaled group-level effects
*/
matrix scale_r_cor(matrix z, vector SD, matrix L) {
// r is stored in another dimension order than z
return transpose(diag_pre_multiply(SD, L) * z);
}
* Args:
*   p: a positive scalar
* Returns:
*   a scalar in (-Inf, Inf)
*/
real logm1(real y) {
return log(y - 1);
}
/* compute the inverse of the logm1 link
* Args:
*   y: a scalar in (-Inf, Inf)
* Returns:
*   a positive scalar
*/
real expp1(real y) {
return exp(y) + 1;
}
/* integer sequence of values
* Args:
*   start: starting integer
*   end: ending integer
* Returns:
*   an integer sequence from start to end
*/
int[] sequence(int start, int end) {
int seq[end - start + 1];
for (n in 1:num_elements(seq)) {
seq[n] = n + start - 1;
}
return seq;
}
// compute partial sums of the log-likelihood
real partial_log_lik_lpmf(int[] seq, int start, int end, vector Y, matrix X, vector b, int[] J_1, vector Z_1_1, vector Z_1_2, vector Z_1_3, vector Z_1_4, vector Z_1_5, vector Z_1_6, vector Z_1_7, vector Z_1_8, vector Z_1_9, vector r_1_1, vector r_1_2, vector r_1_3, vector r_1_4, vector r_1_5, vector r_1_6, vector r_1_7, vector r_1_8, vector r_1_9, int[] J_2, vector Z_2_1, vector Z_2_2, vector Z_2_3, vector Z_2_4, vector Z_2_5, vector Z_2_6, vector Z_2_7, vector Z_2_8, vector Z_2_9, vector r_2_1, vector r_2_2, vector r_2_3, vector r_2_4, vector r_2_5, vector r_2_6, vector r_2_7, vector r_2_8, vector r_2_9, int[] J_3, vector Z_3_1, vector Z_3_2, vector Z_3_3, vector Z_3_4, vector Z_3_5, vector Z_3_6, vector Z_3_7, vector Z_3_8, vector Z_3_9, vector r_3_1, vector r_3_2, vector r_3_3, vector r_3_4, vector r_3_5, vector r_3_6, vector r_3_7, vector r_3_8, vector r_3_9, int[] J_4, vector Z_4_1, vector r_4_1, int[] J_5, vector Z_5_sigma_1, vector r_5_sigma_1, int[] J_6, vector Z_6_nu_1, vector r_6_nu_1) {
real ptarget = 0;
int N = end - start + 1;
// initialize linear predictor term
vector[N] mu = X[start:end] * b;
// initialize linear predictor term
vector[N] sigma = rep_vector(0.0, N);
// initialize linear predictor term
vector[N] nu = rep_vector(0.0, N);
for (n in 1:N) {
// add more terms to the linear predictor
int nn = n + start - 1;
mu[n] += r_1_1[J_1[nn]] * Z_1_1[nn] + r_1_2[J_1[nn]] * Z_1_2[nn] + r_1_3[J_1[nn]] * Z_1_3[nn] + r_1_4[J_1[nn]] * Z_1_4[nn] + r_1_5[J_1[nn]] * Z_1_5[nn] + r_1_6[J_1[nn]] * Z_1_6[nn] + r_1_7[J_1[nn]] * Z_1_7[nn] + r_1_8[J_1[nn]] * Z_1_8[nn] + r_1_9[J_1[nn]] * Z_1_9[nn] + r_2_1[J_2[nn]] * Z_2_1[nn] + r_2_2[J_2[nn]] * Z_2_2[nn] + r_2_3[J_2[nn]] * Z_2_3[nn] + r_2_4[J_2[nn]] * Z_2_4[nn] + r_2_5[J_2[nn]] * Z_2_5[nn] + r_2_6[J_2[nn]] * Z_2_6[nn] + r_2_7[J_2[nn]] * Z_2_7[nn] + r_2_8[J_2[nn]] * Z_2_8[nn] + r_2_9[J_2[nn]] * Z_2_9[nn] + r_3_1[J_3[nn]] * Z_3_1[nn] + r_3_2[J_3[nn]] * Z_3_2[nn] + r_3_3[J_3[nn]] * Z_3_3[nn] + r_3_4[J_3[nn]] * Z_3_4[nn] + r_3_5[J_3[nn]] * Z_3_5[nn] + r_3_6[J_3[nn]] * Z_3_6[nn] + r_3_7[J_3[nn]] * Z_3_7[nn] + r_3_8[J_3[nn]] * Z_3_8[nn] + r_3_9[J_3[nn]] * Z_3_9[nn] + r_4_1[J_4[nn]] * Z_4_1[nn];
}
for (n in 1:N) {
// add more terms to the linear predictor
int nn = n + start - 1;
sigma[n] += r_5_sigma_1[J_5[nn]] * Z_5_sigma_1[nn];
}
for (n in 1:N) {
// add more terms to the linear predictor
int nn = n + start - 1;
nu[n] += r_6_nu_1[J_6[nn]] * Z_6_nu_1[nn];
}
for (n in 1:N) {
// apply the inverse link function
sigma[n] = exp(sigma[n]);
}
for (n in 1:N) {
// apply the inverse link function
nu[n] = expp1(nu[n]);
}
ptarget += student_t_lupdf(Y[start:end] | nu, mu, sigma);
return ptarget;
}
}
data {
int<lower=1> N;  // total number of observations
vector[N] Y;  // response variable
int<lower=1> K;  // number of population-level effects
matrix[N, K] X;  // population-level design matrix
int grainsize;  // grainsize for threading
// data for group-level effects of ID 1
int<lower=1> N_1;  // number of grouping levels
int<lower=1> M_1;  // number of coefficients per level
int<lower=1> J_1[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
vector[N] Z_1_2;
vector[N] Z_1_3;
vector[N] Z_1_4;
vector[N] Z_1_5;
vector[N] Z_1_6;
vector[N] Z_1_7;
vector[N] Z_1_8;
vector[N] Z_1_9;
int<lower=1> NC_1;  // number of group-level correlations
// data for group-level effects of ID 2
int<lower=1> N_2;  // number of grouping levels
int<lower=1> M_2;  // number of coefficients per level
int<lower=1> J_2[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_2_1;
vector[N] Z_2_2;
vector[N] Z_2_3;
vector[N] Z_2_4;
vector[N] Z_2_5;
vector[N] Z_2_6;
vector[N] Z_2_7;
vector[N] Z_2_8;
vector[N] Z_2_9;
int<lower=1> NC_2;  // number of group-level correlations
// data for group-level effects of ID 3
int<lower=1> N_3;  // number of grouping levels
int<lower=1> M_3;  // number of coefficients per level
int<lower=1> J_3[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_3_1;
vector[N] Z_3_2;
vector[N] Z_3_3;
vector[N] Z_3_4;
vector[N] Z_3_5;
vector[N] Z_3_6;
vector[N] Z_3_7;
vector[N] Z_3_8;
vector[N] Z_3_9;
int<lower=1> NC_3;  // number of group-level correlations
// data for group-level effects of ID 4
int<lower=1> N_4;  // number of grouping levels
int<lower=1> M_4;  // number of coefficients per level
int<lower=1> J_4[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_4_1;
// data for group-level effects of ID 5
int<lower=1> N_5;  // number of grouping levels
int<lower=1> M_5;  // number of coefficients per level
int<lower=1> J_5[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_5_sigma_1;
// data for group-level effects of ID 6
int<lower=1> N_6;  // number of grouping levels
int<lower=1> M_6;  // number of coefficients per level
int<lower=1> J_6[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_6_nu_1;
int prior_only;  // should the likelihood be ignored?
}
transformed data {
int seq[N] = sequence(1, N);
}
parameters {
vector[K] b;  // population-level effects
vector<lower=0>[M_1] sd_1;  // group-level standard deviations
matrix[M_1, N_1] z_1;  // standardized group-level effects
cholesky_factor_corr[M_1] L_1;  // cholesky factor of correlation matrix
vector<lower=0>[M_2] sd_2;  // group-level standard deviations
matrix[M_2, N_2] z_2;  // standardized group-level effects
cholesky_factor_corr[M_2] L_2;  // cholesky factor of correlation matrix
vector<lower=0>[M_3] sd_3;  // group-level standard deviations
matrix[M_3, N_3] z_3;  // standardized group-level effects
cholesky_factor_corr[M_3] L_3;  // cholesky factor of correlation matrix
vector<lower=0>[M_4] sd_4;  // group-level standard deviations
vector[N_4] z_4[M_4];  // standardized group-level effects
vector<lower=0>[M_5] sd_5;  // group-level standard deviations
vector[N_5] z_5[M_5];  // standardized group-level effects
vector<lower=0>[M_6] sd_6;  // group-level standard deviations
vector[N_6] z_6[M_6];  // standardized group-level effects
}
transformed parameters {
matrix[N_1, M_1] r_1;  // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_1] r_1_1;
vector[N_1] r_1_2;
vector[N_1] r_1_3;
vector[N_1] r_1_4;
vector[N_1] r_1_5;
vector[N_1] r_1_6;
vector[N_1] r_1_7;
vector[N_1] r_1_8;
vector[N_1] r_1_9;
matrix[N_2, M_2] r_2;  // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_2] r_2_1;
vector[N_2] r_2_2;
vector[N_2] r_2_3;
vector[N_2] r_2_4;
vector[N_2] r_2_5;
vector[N_2] r_2_6;
vector[N_2] r_2_7;
vector[N_2] r_2_8;
vector[N_2] r_2_9;
matrix[N_3, M_3] r_3;  // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_3] r_3_1;
vector[N_3] r_3_2;
vector[N_3] r_3_3;
vector[N_3] r_3_4;
vector[N_3] r_3_5;
vector[N_3] r_3_6;
vector[N_3] r_3_7;
vector[N_3] r_3_8;
vector[N_3] r_3_9;
vector[N_4] r_4_1;  // actual group-level effects
vector[N_5] r_5_sigma_1;  // actual group-level effects
vector[N_6] r_6_nu_1;  // actual group-level effects
// compute actual group-level effects
r_1 = scale_r_cor(z_1, sd_1, L_1);
r_1_1 = r_1[, 1];
r_1_2 = r_1[, 2];
r_1_3 = r_1[, 3];
r_1_4 = r_1[, 4];
r_1_5 = r_1[, 5];
r_1_6 = r_1[, 6];
r_1_7 = r_1[, 7];
r_1_8 = r_1[, 8];
r_1_9 = r_1[, 9];
// compute actual group-level effects
r_2 = scale_r_cor(z_2, sd_2, L_2);
r_2_1 = r_2[, 1];
r_2_2 = r_2[, 2];
r_2_3 = r_2[, 3];
r_2_4 = r_2[, 4];
r_2_5 = r_2[, 5];
r_2_6 = r_2[, 6];
r_2_7 = r_2[, 7];
r_2_8 = r_2[, 8];
r_2_9 = r_2[, 9];
// compute actual group-level effects
r_3 = scale_r_cor(z_3, sd_3, L_3);
r_3_1 = r_3[, 1];
r_3_2 = r_3[, 2];
r_3_3 = r_3[, 3];
r_3_4 = r_3[, 4];
r_3_5 = r_3[, 5];
r_3_6 = r_3[, 6];
r_3_7 = r_3[, 7];
r_3_8 = r_3[, 8];
r_3_9 = r_3[, 9];
r_4_1 = (sd_4[1] * (z_4[1]));
r_5_sigma_1 = (sd_5[1] * (z_5[1]));
r_6_nu_1 = (sd_6[1] * (z_6[1]));
}
model {
// likelihood not including constants
if (!prior_only) {
target += reduce_sum(partial_log_lik_lpmf, seq, grainsize, Y, X, b, J_1, Z_1_1, Z_1_2, Z_1_3, Z_1_4, Z_1_5, Z_1_6, Z_1_7, Z_1_8, Z_1_9, r_1_1, r_1_2, r_1_3, r_1_4, r_1_5, r_1_6, r_1_7, r_1_8, r_1_9, J_2, Z_2_1, Z_2_2, Z_2_3, Z_2_4, Z_2_5, Z_2_6, Z_2_7, Z_2_8, Z_2_9, r_2_1, r_2_2, r_2_3, r_2_4, r_2_5, r_2_6, r_2_7, r_2_8, r_2_9, J_3, Z_3_1, Z_3_2, Z_3_3, Z_3_4, Z_3_5, Z_3_6, Z_3_7, Z_3_8, Z_3_9, r_3_1, r_3_2, r_3_3, r_3_4, r_3_5, r_3_6, r_3_7, r_3_8, r_3_9, J_4, Z_4_1, r_4_1, J_5, Z_5_sigma_1, r_5_sigma_1, J_6, Z_6_nu_1, r_6_nu_1);
}
// priors not including constants
b[1] ~ normal(5,5);
b[2] ~ normal(0,3);
b[3] ~ normal(0,3);
b[4] ~ normal(0,3);
b[5] ~ normal(0,3);
b[6] ~ normal(0,3);
b[7] ~ normal(0,3);
b[8] ~ normal(0,3);
b[9] ~ normal(0,3);
b[10] ~ normal(0,3);
b[11] ~ normal(0,3);
b[12] ~ normal(0,3);
b[13] ~ normal(0,3);
b[14] ~ normal(0,3);
b[15] ~ normal(0,3);
b[16] ~ normal(0,3);
b[17] ~ normal(0,3);
b[18] ~ normal(0,3);
b[19] ~ normal(0,3);
b[20] ~ normal(0,3);
b[21] ~ normal(0,3);
b[22] ~ normal(0,3);
b[23] ~ normal(0,3);
b[24] ~ normal(0,3);
b[25] ~ normal(0,3);
b[26] ~ normal(0,3);
b[27] ~ normal(0,3);
b[28] ~ normal(0,3);
b[29] ~ normal(0,3);
b[30] ~ normal(0,3);
b[31] ~ normal(0,3);
b[32] ~ normal(0,3);
b[33] ~ normal(0,3);
b[34] ~ normal(0,3);
b[35] ~ normal(0,3);
b[36] ~ normal(0,3);
b[37] ~ normal(0,3);
b[38] ~ normal(0,3);
//target += normal_lpdf(sd_1 | 1,1)
//- 9 * normal_lccdf(0 | 1,1);
sd_1 ~ normal(1,1);
z_1[1] ~ std_normal();
z_1[2] ~ std_normal();
z_1[3] ~ std_normal();
z_1[4] ~ std_normal();
z_1[5] ~ std_normal();
z_1[6] ~ std_normal();
z_1[7] ~ std_normal();
z_1[8] ~ std_normal();
z_1[9] ~ std_normal();
L_1 ~ lkj_corr_cholesky(1);
sd_2 ~ normal(1,1);
z_2[1] ~ std_normal();
z_2[2] ~ std_normal();
z_2[3] ~ std_normal();
z_2[4] ~ std_normal();
z_2[5] ~ std_normal();
z_2[6] ~ std_normal();
z_2[7] ~ std_normal();
z_2[8] ~ std_normal();
z_2[9] ~ std_normal();
L_2 ~ lkj_corr_cholesky(1);
sd_3 ~ normal(0,0.5);
z_3[1] ~ std_normal();
z_3[2] ~ std_normal();
z_3[3] ~ std_normal();
z_3[4] ~ std_normal();
z_3[5] ~ std_normal();
z_3[6] ~ std_normal();
z_3[7] ~ std_normal();
z_3[8] ~ std_normal();
z_3[9] ~ std_normal();
L_3 ~ lkj_corr_cholesky(1);
sd_4[1] ~ normal(0,0.5);
z_4[1] ~ std_normal();
sd_5[1] ~ normal(0,0.1);
z_5[1] ~ std_normal();
sd_5[1] ~ normal(0,1.5);
z_6[1] ~ std_normal();
}
generated quantities {
// compute group-level correlations
corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
vector<lower=-1,upper=1>[NC_1] cor_1;
// compute group-level correlations
corr_matrix[M_2] Cor_2 = multiply_lower_tri_self_transpose(L_2);
vector<lower=-1,upper=1>[NC_2] cor_2;
// compute group-level correlations
corr_matrix[M_3] Cor_3 = multiply_lower_tri_self_transpose(L_3);
vector<lower=-1,upper=1>[NC_3] cor_3;
// extract upper diagonal of correlation matrix
for (k in 1:M_1) {
for (j in 1:(k - 1)) {
cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
}
}
// extract upper diagonal of correlation matrix
for (k in 1:M_2) {
for (j in 1:(k - 1)) {
cor_2[choose(k - 1, 2) + j] = Cor_2[j, k];
}
}
// extract upper diagonal of correlation matrix
for (k in 1:M_3) {
for (j in 1:(k - 1)) {
cor_3[choose(k - 1, 2) + j] = Cor_3[j, k];
}
}
}
``````

Is it possible to combine the declaration and accumulation of the linear predictors when using partial sums (vectorize mu and/or the other distribution parameters)?

Looking at the model, I think you’ll get a better speed-up by first optimising the original model, and then introducing threading. I’ll break my recommendations into a couple sections.

Group-level predictors

Your biggest computational cost at the moment is actually caused by your vectorisation of `mu`:

``````      mu += r_1_1[J_1] .* Z_1_1 + ... + r_4_1[J_4] .* Z_4_1;
``````

Because each set of elementwise multiplication is conducted sequentially. That is, ` r_1_1[J_1] .* Z_1_1` is calculated, then ` r_1_2[J_1] .* Z_1_2`, and so forth. This means that you’re traversing vectors of size `N` multiple times. While there is increased efficiency in the vectorised operation, you’re introducing additional overhead. You can drastically cut down on this by using matrices, rather than vectors.

So at the moment, the code is taking a matrix of effects (`r_1`) and copying its contents to 9 vectors (`r_1_1`-`r_1_9`) to be individually multiplied with the covariates (`Z_1_1`-`Z_1_9`). You can instead pack the covariates into a single matrix (i.e., `Z_1`), and use `rows_dot_product` to perform the elementwise multiplication and addition.

So you’d pack them in the `transformed data` block:

``````transformed data {
matrix[N,9] Z_1 = [Z_1_1', Z_1_2', Z_1_3', Z_1_4', Z_1_5',
Z_1_6', Z_1_7', Z_1_8', Z_1_9'];
matrix[N,9] Z_2 = [Z_2_1', Z_2_2', Z_2_3', Z_2_4', Z_2_5',
Z_2_6', Z_2_7', Z_2_8', Z_2_9'];
matrix[N,9] Z_3 = [Z_3_1', Z_3_2', Z_3_3', Z_3_4', Z_3_5',
Z_3_6', Z_3_7', Z_3_8', Z_3_9'];
}
``````

And then use `rows_dot_product` in the `model` block:

``````    vector[N] mu = X * b + rows_dot_product(r_1[J_1], Z_1)
+ rows_dot_product(r_2[J_2], Z_2)
+ rows_dot_product(r_3[J_3], Z_3)
+ r_4_1[J_4] .* Z_4_1;
``````

nu and sigma

You can also simplify the construction of `nu` and `sigma` to one-liners:

``````    // initialize linear predictor term
vector[N] sigma = exp(r_5_sigma_1[J_5] .* Z_5_sigma_1);
// initialize linear predictor term
vector[N] nu = exp(r_6_nu_1[J_6] .* Z_6_nu_1) + 1;
``````

Normalising constants

Also, given that you’re dropping normalising constants, you don’t need to add the `norm18`-`norm1_5` quantities (as those are constants).

Vectorised priors

You can vectorise your priors by using `to_vector` with the `z_1`-`z_3` matrices:

``````  to_vector(z_1) ~ std_normal();
to_vector(z_2) ~ std_normal();
to_vector(z_3) ~ std_normal();
``````

You can also vectorise the priors for `b` by first constructing an integer sequence from 2 - 38 (i.e., the same as calling `2:38` in R) in the `transformed data` block:

``````  int b_index[37] = linspaced_int_array(37, 2, 38);
``````

And using that to apply a vectorised prior to `b`:

``````b[b_index] ~ normal(0,3);
``````

reduce_sum

Firstly, you’ll need to change the data type of `y` from `vector[N]` to `real y[N]`, as `reduce_sum` can only slice over arrays:

``````  real Y[N];  // response variable
``````

Next, you’ll need a `partial_sum` function that will calculate the likelihood:

``````  real partial_log_lik_lpdf(data real[] y_slice, int start, int end, vector nu,
vector mu, vector sigma) {
return student_t_lupdf(to_vector(y_slice) | nu[start:end], mu[start:end], sigma[start:end]);
}
``````

And then call this in the `model` block:

``````    target += reduce_sum(partial_log_lik_lpdf, Y, grainsize, nu, mu, sigma);
``````
1 Like

Putting all those pieces together, your model would then look like (completed untested):

``````functions {
/* compute correlated group-level effects
* Args:
*   z: matrix of unscaled group-level effects
*   SD: vector of standard deviation parameters
*   L: cholesky factor correlation matrix
* Returns:
*   matrix of scaled group-level effects
*/
matrix scale_r_cor(matrix z, vector SD, matrix L) {
// r is stored in another dimension order than z
return transpose(diag_pre_multiply(SD, L) * z);
}

real partial_log_lik_lpdf(data real[] y_slice, int start, int end, vector nu,
vector mu, vector sigma) {
return student_t_lupdf(to_vector(y_slice) | nu[start:end], mu[start:end],
sigma[start:end]);
}
}
data {
int<lower=1> N;  // total number of observations
real Y[N];  // response variable
int<lower=1> K;  // number of population-level effects
matrix[N, K] X;  // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> N_1;  // number of grouping levels
int<lower=1> M_1;  // number of coefficients per level
int<lower=1> J_1[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
vector[N] Z_1_2;
vector[N] Z_1_3;
vector[N] Z_1_4;
vector[N] Z_1_5;
vector[N] Z_1_6;
vector[N] Z_1_7;
vector[N] Z_1_8;
vector[N] Z_1_9;
int<lower=1> NC_1;  // number of group-level correlations
// data for group-level effects of ID 2
int<lower=1> N_2;  // number of grouping levels
int<lower=1> M_2;  // number of coefficients per level
int<lower=1> J_2[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_2_1;
vector[N] Z_2_2;
vector[N] Z_2_3;
vector[N] Z_2_4;
vector[N] Z_2_5;
vector[N] Z_2_6;
vector[N] Z_2_7;
vector[N] Z_2_8;
vector[N] Z_2_9;
int<lower=1> NC_2;  // number of group-level correlations
// data for group-level effects of ID 3
int<lower=1> N_3;  // number of grouping levels
int<lower=1> M_3;  // number of coefficients per level
int<lower=1> J_3[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_3_1;
vector[N] Z_3_2;
vector[N] Z_3_3;
vector[N] Z_3_4;
vector[N] Z_3_5;
vector[N] Z_3_6;
vector[N] Z_3_7;
vector[N] Z_3_8;
vector[N] Z_3_9;
int<lower=1> NC_3;  // number of group-level correlations
// data for group-level effects of ID 4
int<lower=1> N_4;  // number of grouping levels
int<lower=1> M_4;  // number of coefficients per level
int<lower=1> J_4[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_4_1;
// data for group-level effects of ID 5
int<lower=1> N_5;  // number of grouping levels
int<lower=1> M_5;  // number of coefficients per level
int<lower=1> J_5[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_5_sigma_1;
// data for group-level effects of ID 6
int<lower=1> N_6;  // number of grouping levels
int<lower=1> M_6;  // number of coefficients per level
int<lower=1> J_6[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_6_nu_1;
int prior_only;  // should the likelihood be ignored?
}
transformed data {
matrix[N,9] Z_1 = [Z_1_1', Z_1_2', Z_1_3', Z_1_4', Z_1_5',
Z_1_6', Z_1_7', Z_1_8', Z_1_9'];
matrix[N,9] Z_2 = [Z_2_1', Z_2_2', Z_2_3', Z_2_4', Z_2_5',
Z_2_6', Z_2_7', Z_2_8', Z_2_9'];
matrix[N,9] Z_3 = [Z_3_1', Z_3_2', Z_3_3', Z_3_4', Z_3_5',
Z_3_6', Z_3_7', Z_3_8', Z_3_9'];
int b_index[37] = linspaced_int_array(37, 2, 38);
}
parameters {
vector[K] b;  // population-level effects
vector<lower=0>[M_1] sd_1;  // group-level standard deviations
matrix[M_1, N_1] z_1;  // standardized group-level effects
cholesky_factor_corr[M_1] L_1;  // cholesky factor of correlation matrix
vector<lower=0>[M_2] sd_2;  // group-level standard deviations
matrix[M_2, N_2] z_2;  // standardized group-level effects
cholesky_factor_corr[M_2] L_2;  // cholesky factor of correlation matrix
vector<lower=0>[M_3] sd_3;  // group-level standard deviations
matrix[M_3, N_3] z_3;  // standardized group-level effects
cholesky_factor_corr[M_3] L_3;  // cholesky factor of correlation matrix
vector<lower=0>[M_4] sd_4;  // group-level standard deviations
vector[N_4] z_4[M_4];  // standardized group-level effects
vector<lower=0>[M_5] sd_5;  // group-level standard deviations
vector[N_5] z_5[M_5];  // standardized group-level effects
vector<lower=0>[M_6] sd_6;  // group-level standard deviations
vector[N_6] z_6[M_6];  // standardized group-level effects
}
transformed parameters {
// actual group-level effects
matrix[N_1, M_1] r_1 = scale_r_cor(z_1, sd_1, L_1);
matrix[N_2, M_2] r_2 = scale_r_cor(z_2, sd_2, L_2);
matrix[N_3, M_3] r_3 = scale_r_cor(z_3, sd_3, L_3);

vector[N_4] r_4_1 = (sd_4[1] * (z_4[1]));
vector[N_5] r_5_sigma_1 = (sd_5[1] * (z_5[1]));
vector[N_6] r_6_nu_1 = (sd_6[1] * (z_6[1]));
}
model {
int grainsize = 1;
// likelihood not including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = X * b + rows_dot_product(r_1[J_1], Z_1)
+ rows_dot_product(r_2[J_2], Z_2)
+ rows_dot_product(r_3[J_3], Z_3)
+ r_4_1[J_4] .* Z_4_1;
// initialize linear predictor term
vector[N] sigma = exp(r_5_sigma_1[J_5] .* Z_5_sigma_1);
// initialize linear predictor term
vector[N] nu = exp(r_6_nu_1[J_6] .* Z_6_nu_1) + 1;

target += reduce_sum(partial_log_lik_lpdf, Y, grainsize, nu, mu, sigma);
}

// priors not including constants
b[1] ~ normal(5,5);
b[b_index] ~ normal(0,3);

sd_1 ~ normal(1,1);
sd_2 ~ normal(1,1);
sd_3 ~ normal(0,0.5);
sd_4[1] ~ normal(0,0.5);
sd_5[1] ~ normal(0,0.1);
sd_6[1] ~ normal(0,1.5);

to_vector(z_1) ~ std_normal();
to_vector(z_2) ~ std_normal();
to_vector(z_3) ~ std_normal();
z_4[1] ~ std_normal();
z_5[1] ~ std_normal();
z_6[1] ~ std_normal();

L_1 ~ lkj_corr_cholesky(1);
L_2 ~ lkj_corr_cholesky(1);
L_3 ~ lkj_corr_cholesky(1);
}
generated quantities {
// compute group-level correlations
corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
vector<lower=-1,upper=1>[NC_1] cor_1;
// compute group-level correlations
corr_matrix[M_2] Cor_2 = multiply_lower_tri_self_transpose(L_2);
vector<lower=-1,upper=1>[NC_2] cor_2;
// compute group-level correlations
corr_matrix[M_3] Cor_3 = multiply_lower_tri_self_transpose(L_3);
vector<lower=-1,upper=1>[NC_3] cor_3;
// extract upper diagonal of correlation matrix
for (k in 1:M_1) {
for (j in 1:(k - 1)) {
cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
}
}
// extract upper diagonal of correlation matrix
for (k in 1:M_2) {
for (j in 1:(k - 1)) {
cor_2[choose(k - 1, 2) + j] = Cor_2[j, k];
}
}
// extract upper diagonal of correlation matrix
for (k in 1:M_3) {
for (j in 1:(k - 1)) {
cor_3[choose(k - 1, 2) + j] = Cor_3[j, k];
}
}
}
``````

Also, if you have a discrete GPU in your system I’d highly recommend setting up OpenCL in `cmdstanr`, as this will accelerate the `rows_dot_product` calls, among others.

The cmdstan documentation here covers how to install the appropriate drivers and identify your relevant platform and device ID.

You would then use these when compiling your model in `cmdstanr`:

``````acc_options = list(
stan_opencl = TRUE,
opencl_platform_id = your_platform_id,
opencl_device_id = your_device_id
)

mod_cl <- cmdstan_model("your_model.stan", cpp_options = acc_options)
``````

Amazing! Thanks @andrjohns !

Everything has compiled fine with OpenCL (for a GeForce RTX 2060). When running on a GPU should the number of threads per chain be based on the number of available GPU cores (1920 CUDA cores in my case)?

No, `threads_per_chain` refers to CPU threads utilised by `reduce_sum`

Thanks again @andrjohns

The model compiles without error, but sampling does not begin with the following errors for each chain:

``````Chain 1 file3f9673221bc6a_threads:
stan/lib/stan_math/lib/eigen_3.3.9/Eigen/src/Core/CwiseBinaryOp.h:110:
Eigen::CwiseBinaryOp<BinaryOp, Lhs, Rhs>::CwiseBinaryOp(const Lhs&, const Rhs&, const BinaryOp&)
[with BinaryOp = Eigen::internal::scalar_product_op<double, double>; LhsType = const
Eigen::Matrix<double, -1, -1>; RhsType = const Eigen::Matrix<double, -1, -1>;
Eigen::CwiseBinaryOp<BinaryOp, Lhs, Rhs>::Lhs = Eigen::Matrix<double, -1, -1>;
Eigen::CwiseBinaryOp<BinaryOp, Lhs, Rhs>::Rhs = Eigen::Matrix<double, -1, -1>]:
Assertion `aLhs.rows() == aRhs.rows() && aLhs.cols() == aRhs.cols()' failed.

``````

Here is a subset of the data for the model:
Data_dput.txt (2.2 MB)

Reprex:

``````# Load data
CmdStan_Data <- dget(file = "Data_dput.txt")
``````
``````# Define model
CmdStan_Model <-
"functions {
/* compute correlated group-level effects
* Args:
*   z: matrix of unscaled group-level effects
*   SD: vector of standard deviation parameters
*   L: cholesky factor correlation matrix
* Returns:
*   matrix of scaled group-level effects
*/
matrix scale_r_cor(matrix z, vector SD, matrix L) {
// r is stored in another dimension order than z
return transpose(diag_pre_multiply(SD, L) * z);
}

real partial_log_lik_lpdf(data real[] y_slice, int start, int end, vector nu,
vector mu, vector sigma) {
return student_t_lupdf(to_vector(y_slice) | nu[start:end], mu[start:end],
sigma[start:end]);
}
}
data {
int<lower=1> N;  // total number of observations
real Y[N];  // response variable
int<lower=1> K;  // number of population-level effects
matrix[N, K] X;  // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> N_1;  // number of grouping levels
int<lower=1> M_1;  // number of coefficients per level
int<lower=1> J_1[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
vector[N] Z_1_2;
vector[N] Z_1_3;
vector[N] Z_1_4;
vector[N] Z_1_5;
vector[N] Z_1_6;
vector[N] Z_1_7;
vector[N] Z_1_8;
vector[N] Z_1_9;
int<lower=1> NC_1;  // number of group-level correlations
// data for group-level effects of ID 2
int<lower=1> N_2;  // number of grouping levels
int<lower=1> M_2;  // number of coefficients per level
int<lower=1> J_2[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_2_1;
vector[N] Z_2_2;
vector[N] Z_2_3;
vector[N] Z_2_4;
vector[N] Z_2_5;
vector[N] Z_2_6;
vector[N] Z_2_7;
vector[N] Z_2_8;
vector[N] Z_2_9;
int<lower=1> NC_2;  // number of group-level correlations
// data for group-level effects of ID 3
int<lower=1> N_3;  // number of grouping levels
int<lower=1> M_3;  // number of coefficients per level
int<lower=1> J_3[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_3_1;
vector[N] Z_3_2;
vector[N] Z_3_3;
vector[N] Z_3_4;
vector[N] Z_3_5;
vector[N] Z_3_6;
vector[N] Z_3_7;
vector[N] Z_3_8;
vector[N] Z_3_9;
int<lower=1> NC_3;  // number of group-level correlations
// data for group-level effects of ID 4
int<lower=1> N_4;  // number of grouping levels
int<lower=1> M_4;  // number of coefficients per level
int<lower=1> J_4[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_4_1;
// data for group-level effects of ID 5
int<lower=1> N_5;  // number of grouping levels
int<lower=1> M_5;  // number of coefficients per level
int<lower=1> J_5[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_5_sigma_1;
// data for group-level effects of ID 6
int<lower=1> N_6;  // number of grouping levels
int<lower=1> M_6;  // number of coefficients per level
int<lower=1> J_6[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_6_nu_1;
int prior_only;  // should the likelihood be ignored?
}
transformed data {
matrix[N,9] Z_1 = [Z_1_1', Z_1_2', Z_1_3', Z_1_4', Z_1_5',
Z_1_6', Z_1_7', Z_1_8', Z_1_9'];
matrix[N,9] Z_2 = [Z_2_1', Z_2_2', Z_2_3', Z_2_4', Z_2_5',
Z_2_6', Z_2_7', Z_2_8', Z_2_9'];
matrix[N,9] Z_3 = [Z_3_1', Z_3_2', Z_3_3', Z_3_4', Z_3_5',
Z_3_6', Z_3_7', Z_3_8', Z_3_9'];
int b_index[37] = linspaced_int_array(37, 2, 38);
}
parameters {
vector[K] b;  // population-level effects
vector<lower=0>[M_1] sd_1;  // group-level standard deviations
matrix[M_1, N_1] z_1;  // standardized group-level effects
cholesky_factor_corr[M_1] L_1;  // cholesky factor of correlation matrix
vector<lower=0>[M_2] sd_2;  // group-level standard deviations
matrix[M_2, N_2] z_2;  // standardized group-level effects
cholesky_factor_corr[M_2] L_2;  // cholesky factor of correlation matrix
vector<lower=0>[M_3] sd_3;  // group-level standard deviations
matrix[M_3, N_3] z_3;  // standardized group-level effects
cholesky_factor_corr[M_3] L_3;  // cholesky factor of correlation matrix
vector<lower=0>[M_4] sd_4;  // group-level standard deviations
vector[N_4] z_4[M_4];  // standardized group-level effects
vector<lower=0>[M_5] sd_5;  // group-level standard deviations
vector[N_5] z_5[M_5];  // standardized group-level effects
vector<lower=0>[M_6] sd_6;  // group-level standard deviations
vector[N_6] z_6[M_6];  // standardized group-level effects
}
transformed parameters {
// actual group-level effects
matrix[N_1, M_1] r_1 = scale_r_cor(z_1, sd_1, L_1);
matrix[N_2, M_2] r_2 = scale_r_cor(z_2, sd_2, L_2);
matrix[N_3, M_3] r_3 = scale_r_cor(z_3, sd_3, L_3);

vector[N_4] r_4_1 = (sd_4[1] * (z_4[1]));
vector[N_5] r_5_sigma_1 = (sd_5[1] * (z_5[1]));
vector[N_6] r_6_nu_1 = (sd_6[1] * (z_6[1]));
}
model {
int grainsize = 1;
// likelihood not including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = X * b + rows_dot_product(r_1[J_1], Z_1)
+ rows_dot_product(r_2[J_2], Z_2)
+ rows_dot_product(r_3[J_3], Z_3)
+ r_4_1[J_4] .* Z_4_1;
// initialize linear predictor term
vector[N] sigma = exp(r_5_sigma_1[J_5] .* Z_5_sigma_1);
// initialize linear predictor term
vector[N] nu = exp(r_6_nu_1[J_6] .* Z_6_nu_1) + 1;

target += reduce_sum(partial_log_lik_lpdf, Y, grainsize, nu, mu, sigma);
}
// priors not including constants
b[1] ~ normal(5,5);
b[b_index] ~ normal(0,3);
sd_1 ~ normal(1,1);
sd_2 ~ normal(1,1);
sd_3 ~ normal(0,0.5);
sd_4[1] ~ normal(0,0.5);
sd_5[1] ~ normal(0,0.1);
sd_6[1] ~ normal(0,1.5);
to_vector(z_1) ~ std_normal();
to_vector(z_2) ~ std_normal();
to_vector(z_3) ~ std_normal();
z_4[1] ~ std_normal();
z_5[1] ~ std_normal();
z_6[1] ~ std_normal();
L_1 ~ lkj_corr_cholesky(1);
L_2 ~ lkj_corr_cholesky(1);
L_3 ~ lkj_corr_cholesky(1);
}
generated quantities {
// compute group-level correlations
corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
vector<lower=-1,upper=1>[NC_1] cor_1;
// compute group-level correlations
corr_matrix[M_2] Cor_2 = multiply_lower_tri_self_transpose(L_2);
vector<lower=-1,upper=1>[NC_2] cor_2;
// compute group-level correlations
corr_matrix[M_3] Cor_3 = multiply_lower_tri_self_transpose(L_3);
vector<lower=-1,upper=1>[NC_3] cor_3;
// extract upper diagonal of correlation matrix
for (k in 1:M_1) {
for (j in 1:(k - 1)) {
cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
}
}
// extract upper diagonal of correlation matrix
for (k in 1:M_2) {
for (j in 1:(k - 1)) {
cor_2[choose(k - 1, 2) + j] = Cor_2[j, k];
}
}
// extract upper diagonal of correlation matrix
for (k in 1:M_3) {
for (j in 1:(k - 1)) {
cor_3[choose(k - 1, 2) + j] = Cor_3[j, k];
}
}
}
"
``````

Compile model

`````` # Set compile options
acc_options = list(
#stan_opencl = TRUE,
#opencl_platform_id = 0,
#opencl_device_id = 0
)

# Compile model
CmdStan_Mod <- cmdstan_model(write_stan_file(CmdStan_Model),
compile = TRUE,
cpp_options = acc_options)
``````

Sample

``````CmdStan_Fit <- CmdStan_Mod\$sample(
data = CmdStan_Data,
iter_warmup = 500,
iter_sampling = 1000,
#init = initf,
chains = 2,
parallel_chains = 2,
max_treedepth=12,
``````

Ah there was a typo in the `transformed data` block that I gave you, there should a transposition at the end of the `Z_` matrix declarations:

``````  matrix[N,9] Z_1 = [Z_1_1', Z_1_2', Z_1_3', Z_1_4', Z_1_5',
Z_1_6', Z_1_7', Z_1_8', Z_1_9']';
matrix[N,9] Z_2 = [Z_2_1', Z_2_2', Z_2_3', Z_2_4', Z_2_5',
Z_2_6', Z_2_7', Z_2_8', Z_2_9']';
matrix[N,9] Z_3 = [Z_3_1', Z_3_2', Z_3_3', Z_3_4', Z_3_5',
Z_3_6', Z_3_7', Z_3_8', Z_3_9']';
``````

But this should have thrown a different earlier, so it looks like a bug here. Thanks for finding this!

I’ve also just realised that the `student_t` distribution is OpenCL-accelerated as of 2.26, so it could be a good idea to try that directly without using `reduce_sum`

Thanks, @andrjohns

Unfortunately, without `reduce_sum` running the model crashes the whole system.