Hierarchical Multinomial Model Optimization/Specification

I am currently trying to analyze some multinomial data (8 possible categorical responses in each row) with a hierarchical model.

The data are generated from 16 groups of people (8 groups in “control”, 8 groups in "intervention) across 5 sessions, led by 2 different facilitators with just under 4000 observations.

The model takes an extremely long time to run. It’s currently been going for 9 days and has finally reached sampling (6000 iterations with 4500 warm up).

I used BRMS to generate the Stan code and am wondering if there are any potential optimizations that mike help speed things up.

The Stan code is:

`````` /* multinomial-logit log-PMF
* Args:
*   y: array of integer response values
*   mu: vector of category logit probabilities
* Returns:
*   a scalar to be added to the log posterior
*/
real multinomial_logit_lpmf(int[] y, vector mu) {
return multinomial_lpmf(y | softmax(mu));
}
}
data {
int<lower=1> N;  // number of observations
int<lower=2> ncat;  // number of categories
int Y[N, ncat];  // response array
int trials[N];  // number of trials
int<lower=1> K_muBuilding;  // number of population-level effects
matrix[N, K_muBuilding] X_muBuilding;  // population-level design matrix
int<lower=1> K_muCaregiver;  // number of population-level effects
matrix[N, K_muCaregiver] X_muCaregiver;  // population-level design matrix
int<lower=1> K_muPatient;  // number of population-level effects
matrix[N, K_muPatient] X_muPatient;  // population-level design matrix
int<lower=1> K_muPatientpsych;  // number of population-level effects
matrix[N, K_muPatientpsych] X_muPatientpsych;  // population-level design matrix
int<lower=1> K_muSDOH;  // number of population-level effects
matrix[N, K_muSDOH] X_muSDOH;  // population-level design matrix
int<lower=1> K_muCanmeds;  // number of population-level effects
matrix[N, K_muCanmeds] X_muCanmeds;  // population-level design matrix
int<lower=1> K_muExpertise;  // number of population-level effects
matrix[N, K_muExpertise] X_muExpertise;  // population-level design matrix
int<lower=1> K_muRecommendations;  // number of population-level effects
matrix[N, K_muRecommendations] X_muRecommendations;  // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> N_1;  // number of grouping levels
int<lower=1> M_1;  // number of coefficients per level
int<lower=1> J_1[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_muBuilding_1;
// data for group-level effects of ID 2
int<lower=1> N_2;  // number of grouping levels
int<lower=1> M_2;  // number of coefficients per level
int<lower=1> J_2[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_2_muBuilding_1;
// data for group-level effects of ID 3
int<lower=1> N_3;  // number of grouping levels
int<lower=1> M_3;  // number of coefficients per level
int<lower=1> J_3[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_3_muCaregiver_1;
// data for group-level effects of ID 4
int<lower=1> N_4;  // number of grouping levels
int<lower=1> M_4;  // number of coefficients per level
int<lower=1> J_4[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_4_muCaregiver_1;
// data for group-level effects of ID 5
int<lower=1> N_5;  // number of grouping levels
int<lower=1> M_5;  // number of coefficients per level
int<lower=1> J_5[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_5_muPatient_1;
// data for group-level effects of ID 6
int<lower=1> N_6;  // number of grouping levels
int<lower=1> M_6;  // number of coefficients per level
int<lower=1> J_6[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_6_muPatient_1;
// data for group-level effects of ID 7
int<lower=1> N_7;  // number of grouping levels
int<lower=1> M_7;  // number of coefficients per level
int<lower=1> J_7[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_7_muPatientpsych_1;
// data for group-level effects of ID 8
int<lower=1> N_8;  // number of grouping levels
int<lower=1> M_8;  // number of coefficients per level
int<lower=1> J_8[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_8_muPatientpsych_1;
// data for group-level effects of ID 9
int<lower=1> N_9;  // number of grouping levels
int<lower=1> M_9;  // number of coefficients per level
int<lower=1> J_9[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_9_muSDOH_1;
// data for group-level effects of ID 10
int<lower=1> N_10;  // number of grouping levels
int<lower=1> M_10;  // number of coefficients per level
int<lower=1> J_10[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_10_muSDOH_1;
// data for group-level effects of ID 11
int<lower=1> N_11;  // number of grouping levels
int<lower=1> M_11;  // number of coefficients per level
int<lower=1> J_11[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_11_muCanmeds_1;
// data for group-level effects of ID 12
int<lower=1> N_12;  // number of grouping levels
int<lower=1> M_12;  // number of coefficients per level
int<lower=1> J_12[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_12_muCanmeds_1;
// data for group-level effects of ID 13
int<lower=1> N_13;  // number of grouping levels
int<lower=1> M_13;  // number of coefficients per level
int<lower=1> J_13[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_13_muExpertise_1;
// data for group-level effects of ID 14
int<lower=1> N_14;  // number of grouping levels
int<lower=1> M_14;  // number of coefficients per level
int<lower=1> J_14[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_14_muExpertise_1;
// data for group-level effects of ID 15
int<lower=1> N_15;  // number of grouping levels
int<lower=1> M_15;  // number of coefficients per level
int<lower=1> J_15[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_15_muRecommendations_1;
// data for group-level effects of ID 16
int<lower=1> N_16;  // number of grouping levels
int<lower=1> M_16;  // number of coefficients per level
int<lower=1> J_16[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_16_muRecommendations_1;
int prior_only;  // should the likelihood be ignored?
}
transformed data {
int Kc_muBuilding = K_muBuilding - 1;
matrix[N, Kc_muBuilding] Xc_muBuilding;  // centered version of X_muBuilding without an intercept
vector[Kc_muBuilding] means_X_muBuilding;  // column means of X_muBuilding before centering
int Kc_muCaregiver = K_muCaregiver - 1;
matrix[N, Kc_muCaregiver] Xc_muCaregiver;  // centered version of X_muCaregiver without an intercept
vector[Kc_muCaregiver] means_X_muCaregiver;  // column means of X_muCaregiver before centering
int Kc_muPatient = K_muPatient - 1;
matrix[N, Kc_muPatient] Xc_muPatient;  // centered version of X_muPatient without an intercept
vector[Kc_muPatient] means_X_muPatient;  // column means of X_muPatient before centering
int Kc_muPatientpsych = K_muPatientpsych - 1;
matrix[N, Kc_muPatientpsych] Xc_muPatientpsych;  // centered version of X_muPatientpsych without an intercept
vector[Kc_muPatientpsych] means_X_muPatientpsych;  // column means of X_muPatientpsych before centering
int Kc_muSDOH = K_muSDOH - 1;
matrix[N, Kc_muSDOH] Xc_muSDOH;  // centered version of X_muSDOH without an intercept
vector[Kc_muSDOH] means_X_muSDOH;  // column means of X_muSDOH before centering
int Kc_muCanmeds = K_muCanmeds - 1;
matrix[N, Kc_muCanmeds] Xc_muCanmeds;  // centered version of X_muCanmeds without an intercept
vector[Kc_muCanmeds] means_X_muCanmeds;  // column means of X_muCanmeds before centering
int Kc_muExpertise = K_muExpertise - 1;
matrix[N, Kc_muExpertise] Xc_muExpertise;  // centered version of X_muExpertise without an intercept
vector[Kc_muExpertise] means_X_muExpertise;  // column means of X_muExpertise before centering
int Kc_muRecommendations = K_muRecommendations - 1;
matrix[N, Kc_muRecommendations] Xc_muRecommendations;  // centered version of X_muRecommendations without an intercept
vector[Kc_muRecommendations] means_X_muRecommendations;  // column means of X_muRecommendations before centering
for (i in 2:K_muBuilding) {
means_X_muBuilding[i - 1] = mean(X_muBuilding[, i]);
Xc_muBuilding[, i - 1] = X_muBuilding[, i] - means_X_muBuilding[i - 1];
}
for (i in 2:K_muCaregiver) {
means_X_muCaregiver[i - 1] = mean(X_muCaregiver[, i]);
Xc_muCaregiver[, i - 1] = X_muCaregiver[, i] - means_X_muCaregiver[i - 1];
}
for (i in 2:K_muPatient) {
means_X_muPatient[i - 1] = mean(X_muPatient[, i]);
Xc_muPatient[, i - 1] = X_muPatient[, i] - means_X_muPatient[i - 1];
}
for (i in 2:K_muPatientpsych) {
means_X_muPatientpsych[i - 1] = mean(X_muPatientpsych[, i]);
Xc_muPatientpsych[, i - 1] = X_muPatientpsych[, i] - means_X_muPatientpsych[i - 1];
}
for (i in 2:K_muSDOH) {
means_X_muSDOH[i - 1] = mean(X_muSDOH[, i]);
Xc_muSDOH[, i - 1] = X_muSDOH[, i] - means_X_muSDOH[i - 1];
}
for (i in 2:K_muCanmeds) {
means_X_muCanmeds[i - 1] = mean(X_muCanmeds[, i]);
Xc_muCanmeds[, i - 1] = X_muCanmeds[, i] - means_X_muCanmeds[i - 1];
}
for (i in 2:K_muExpertise) {
means_X_muExpertise[i - 1] = mean(X_muExpertise[, i]);
Xc_muExpertise[, i - 1] = X_muExpertise[, i] - means_X_muExpertise[i - 1];
}
for (i in 2:K_muRecommendations) {
means_X_muRecommendations[i - 1] = mean(X_muRecommendations[, i]);
Xc_muRecommendations[, i - 1] = X_muRecommendations[, i] - means_X_muRecommendations[i - 1];
}
}
parameters {
vector[Kc_muBuilding] b_muBuilding;  // population-level effects
real Intercept_muBuilding;  // temporary intercept for centered predictors
vector[Kc_muCaregiver] b_muCaregiver;  // population-level effects
real Intercept_muCaregiver;  // temporary intercept for centered predictors
vector[Kc_muPatient] b_muPatient;  // population-level effects
real Intercept_muPatient;  // temporary intercept for centered predictors
vector[Kc_muPatientpsych] b_muPatientpsych;  // population-level effects
real Intercept_muPatientpsych;  // temporary intercept for centered predictors
vector[Kc_muSDOH] b_muSDOH;  // population-level effects
real Intercept_muSDOH;  // temporary intercept for centered predictors
vector[Kc_muCanmeds] b_muCanmeds;  // population-level effects
real Intercept_muCanmeds;  // temporary intercept for centered predictors
vector[Kc_muExpertise] b_muExpertise;  // population-level effects
real Intercept_muExpertise;  // temporary intercept for centered predictors
vector[Kc_muRecommendations] b_muRecommendations;  // population-level effects
real Intercept_muRecommendations;  // temporary intercept for centered predictors
vector<lower=0>[M_1] sd_1;  // group-level standard deviations
vector[N_1] z_1[M_1];  // standardized group-level effects
vector<lower=0>[M_2] sd_2;  // group-level standard deviations
vector[N_2] z_2[M_2];  // standardized group-level effects
vector<lower=0>[M_3] sd_3;  // group-level standard deviations
vector[N_3] z_3[M_3];  // standardized group-level effects
vector<lower=0>[M_4] sd_4;  // group-level standard deviations
vector[N_4] z_4[M_4];  // standardized group-level effects
vector<lower=0>[M_5] sd_5;  // group-level standard deviations
vector[N_5] z_5[M_5];  // standardized group-level effects
vector<lower=0>[M_6] sd_6;  // group-level standard deviations
vector[N_6] z_6[M_6];  // standardized group-level effects
vector<lower=0>[M_7] sd_7;  // group-level standard deviations
vector[N_7] z_7[M_7];  // standardized group-level effects
vector<lower=0>[M_8] sd_8;  // group-level standard deviations
vector[N_8] z_8[M_8];  // standardized group-level effects
vector<lower=0>[M_9] sd_9;  // group-level standard deviations
vector[N_9] z_9[M_9];  // standardized group-level effects
vector<lower=0>[M_10] sd_10;  // group-level standard deviations
vector[N_10] z_10[M_10];  // standardized group-level effects
vector<lower=0>[M_11] sd_11;  // group-level standard deviations
vector[N_11] z_11[M_11];  // standardized group-level effects
vector<lower=0>[M_12] sd_12;  // group-level standard deviations
vector[N_12] z_12[M_12];  // standardized group-level effects
vector<lower=0>[M_13] sd_13;  // group-level standard deviations
vector[N_13] z_13[M_13];  // standardized group-level effects
vector<lower=0>[M_14] sd_14;  // group-level standard deviations
vector[N_14] z_14[M_14];  // standardized group-level effects
vector<lower=0>[M_15] sd_15;  // group-level standard deviations
vector[N_15] z_15[M_15];  // standardized group-level effects
vector<lower=0>[M_16] sd_16;  // group-level standard deviations
vector[N_16] z_16[M_16];  // standardized group-level effects
}
transformed parameters {
vector[N_1] r_1_muBuilding_1;  // actual group-level effects
vector[N_2] r_2_muBuilding_1;  // actual group-level effects
vector[N_3] r_3_muCaregiver_1;  // actual group-level effects
vector[N_4] r_4_muCaregiver_1;  // actual group-level effects
vector[N_5] r_5_muPatient_1;  // actual group-level effects
vector[N_6] r_6_muPatient_1;  // actual group-level effects
vector[N_7] r_7_muPatientpsych_1;  // actual group-level effects
vector[N_8] r_8_muPatientpsych_1;  // actual group-level effects
vector[N_9] r_9_muSDOH_1;  // actual group-level effects
vector[N_10] r_10_muSDOH_1;  // actual group-level effects
vector[N_11] r_11_muCanmeds_1;  // actual group-level effects
vector[N_12] r_12_muCanmeds_1;  // actual group-level effects
vector[N_13] r_13_muExpertise_1;  // actual group-level effects
vector[N_14] r_14_muExpertise_1;  // actual group-level effects
vector[N_15] r_15_muRecommendations_1;  // actual group-level effects
vector[N_16] r_16_muRecommendations_1;  // actual group-level effects
r_1_muBuilding_1 = (sd_1[1] * (z_1[1]));
r_2_muBuilding_1 = (sd_2[1] * (z_2[1]));
r_3_muCaregiver_1 = (sd_3[1] * (z_3[1]));
r_4_muCaregiver_1 = (sd_4[1] * (z_4[1]));
r_5_muPatient_1 = (sd_5[1] * (z_5[1]));
r_6_muPatient_1 = (sd_6[1] * (z_6[1]));
r_7_muPatientpsych_1 = (sd_7[1] * (z_7[1]));
r_8_muPatientpsych_1 = (sd_8[1] * (z_8[1]));
r_9_muSDOH_1 = (sd_9[1] * (z_9[1]));
r_10_muSDOH_1 = (sd_10[1] * (z_10[1]));
r_11_muCanmeds_1 = (sd_11[1] * (z_11[1]));
r_12_muCanmeds_1 = (sd_12[1] * (z_12[1]));
r_13_muExpertise_1 = (sd_13[1] * (z_13[1]));
r_14_muExpertise_1 = (sd_14[1] * (z_14[1]));
r_15_muRecommendations_1 = (sd_15[1] * (z_15[1]));
r_16_muRecommendations_1 = (sd_16[1] * (z_16[1]));
}
model {
// initialize linear predictor term
vector[N] muBuilding = Intercept_muBuilding + Xc_muBuilding * b_muBuilding;
// initialize linear predictor term
vector[N] muCaregiver = Intercept_muCaregiver + Xc_muCaregiver * b_muCaregiver;
// initialize linear predictor term
vector[N] muPatient = Intercept_muPatient + Xc_muPatient * b_muPatient;
// initialize linear predictor term
vector[N] muPatientpsych = Intercept_muPatientpsych + Xc_muPatientpsych * b_muPatientpsych;
// initialize linear predictor term
vector[N] muSDOH = Intercept_muSDOH + Xc_muSDOH * b_muSDOH;
// initialize linear predictor term
vector[N] muCanmeds = Intercept_muCanmeds + Xc_muCanmeds * b_muCanmeds;
// initialize linear predictor term
vector[N] muExpertise = Intercept_muExpertise + Xc_muExpertise * b_muExpertise;
// initialize linear predictor term
vector[N] muRecommendations = Intercept_muRecommendations + Xc_muRecommendations * b_muRecommendations;
// linear predictor matrix
vector[ncat] mu[N];
for (n in 1:N) {
// add more terms to the linear predictor
muBuilding[n] += r_1_muBuilding_1[J_1[n]] * Z_1_muBuilding_1[n] + r_2_muBuilding_1[J_2[n]] * Z_2_muBuilding_1[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
muCaregiver[n] += r_3_muCaregiver_1[J_3[n]] * Z_3_muCaregiver_1[n] + r_4_muCaregiver_1[J_4[n]] * Z_4_muCaregiver_1[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
muPatient[n] += r_5_muPatient_1[J_5[n]] * Z_5_muPatient_1[n] + r_6_muPatient_1[J_6[n]] * Z_6_muPatient_1[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
muPatientpsych[n] += r_7_muPatientpsych_1[J_7[n]] * Z_7_muPatientpsych_1[n] + r_8_muPatientpsych_1[J_8[n]] * Z_8_muPatientpsych_1[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
muSDOH[n] += r_9_muSDOH_1[J_9[n]] * Z_9_muSDOH_1[n] + r_10_muSDOH_1[J_10[n]] * Z_10_muSDOH_1[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
muCanmeds[n] += r_11_muCanmeds_1[J_11[n]] * Z_11_muCanmeds_1[n] + r_12_muCanmeds_1[J_12[n]] * Z_12_muCanmeds_1[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
muExpertise[n] += r_13_muExpertise_1[J_13[n]] * Z_13_muExpertise_1[n] + r_14_muExpertise_1[J_14[n]] * Z_14_muExpertise_1[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
muRecommendations[n] += r_15_muRecommendations_1[J_15[n]] * Z_15_muRecommendations_1[n] + r_16_muRecommendations_1[J_16[n]] * Z_16_muRecommendations_1[n];
}
for (n in 1:N) {
mu[n] = [0, muBuilding[n], muCaregiver[n], muPatient[n], muPatientpsych[n], muSDOH[n], muCanmeds[n], muExpertise[n], muRecommendations[n]]';
}
// priors including all constants
target += normal_lpdf(b_muBuilding[1] | 0,5);
target += normal_lpdf(b_muBuilding[2] | 0,5);
target += normal_lpdf(b_muBuilding[3] | 0,5);
target += normal_lpdf(b_muBuilding[4] | 0,5);
target += normal_lpdf(b_muBuilding[5] | 0,5);
target += normal_lpdf(b_muBuilding[6] | 0,5);
target += normal_lpdf(b_muBuilding[7] | 0,5);
target += normal_lpdf(b_muBuilding[8] | 0,5);
target += normal_lpdf(b_muBuilding[9] | 0,5);
target += normal_lpdf(b_muBuilding[10] | 0,5);
target += normal_lpdf(b_muBuilding[11] | 0,5);
target += normal_lpdf(b_muBuilding[12] | 0,5);
target += normal_lpdf(b_muBuilding[13] | 0,5);
target += normal_lpdf(b_muBuilding[14] | 0,5);
target += normal_lpdf(b_muCaregiver[1] | 0,5);
target += normal_lpdf(b_muCaregiver[2] | 0,5);
target += normal_lpdf(b_muCaregiver[3] | 0,5);
target += normal_lpdf(b_muCaregiver[4] | 0,5);
target += normal_lpdf(b_muCaregiver[5] | 0,5);
target += normal_lpdf(b_muCaregiver[6] | 0,5);
target += normal_lpdf(b_muCaregiver[7] | 0,5);
target += normal_lpdf(b_muCaregiver[8] | 0,5);
target += normal_lpdf(b_muCaregiver[9] | 0,5);
target += normal_lpdf(b_muCaregiver[10] | 0,5);
target += normal_lpdf(b_muCaregiver[11] | 0,5);
target += normal_lpdf(b_muCaregiver[12] | 0,5);
target += normal_lpdf(b_muCaregiver[13] | 0,5);
target += normal_lpdf(b_muCaregiver[14] | 0,5);
target += normal_lpdf(b_muPatient[1] | 0,5);
target += normal_lpdf(b_muPatient[2] | 0,5);
target += normal_lpdf(b_muPatient[3] | 0,5);
target += normal_lpdf(b_muPatient[4] | 0,5);
target += normal_lpdf(b_muPatient[5] | 0,5);
target += normal_lpdf(b_muPatient[6] | 0,5);
target += normal_lpdf(b_muPatient[7] | 0,5);
target += normal_lpdf(b_muPatient[8] | 0,5);
target += normal_lpdf(b_muPatient[9] | 0,5);
target += normal_lpdf(b_muPatient[10] | 0,5);
target += normal_lpdf(b_muPatient[11] | 0,5);
target += normal_lpdf(b_muPatient[12] | 0,5);
target += normal_lpdf(b_muPatient[13] | 0,5);
target += normal_lpdf(b_muPatient[14] | 0,5);
target += normal_lpdf(b_muPatientpsych[1] | 0,5);
target += normal_lpdf(b_muPatientpsych[2] | 0,5);
target += normal_lpdf(b_muPatientpsych[3] | 0,5);
target += normal_lpdf(b_muPatientpsych[4] | 0,5);
target += normal_lpdf(b_muPatientpsych[5] | 0,5);
target += normal_lpdf(b_muPatientpsych[6] | 0,5);
target += normal_lpdf(b_muPatientpsych[7] | 0,5);
target += normal_lpdf(b_muPatientpsych[8] | 0,5);
target += normal_lpdf(b_muPatientpsych[9] | 0,5);
target += normal_lpdf(b_muPatientpsych[10] | 0,5);
target += normal_lpdf(b_muPatientpsych[11] | 0,5);
target += normal_lpdf(b_muPatientpsych[12] | 0,5);
target += normal_lpdf(b_muPatientpsych[13] | 0,5);
target += normal_lpdf(b_muPatientpsych[14] | 0,5);
target += normal_lpdf(b_muSDOH[1] | 0,5);
target += normal_lpdf(b_muSDOH[2] | 0,5);
target += normal_lpdf(b_muSDOH[3] | 0,5);
target += normal_lpdf(b_muSDOH[4] | 0,5);
target += normal_lpdf(b_muSDOH[5] | 0,5);
target += normal_lpdf(b_muSDOH[6] | 0,5);
target += normal_lpdf(b_muSDOH[7] | 0,5);
target += normal_lpdf(b_muSDOH[8] | 0,5);
target += normal_lpdf(b_muSDOH[9] | 0,5);
target += normal_lpdf(b_muSDOH[10] | 0,5);
target += normal_lpdf(b_muSDOH[11] | 0,5);
target += normal_lpdf(b_muSDOH[12] | 0,5);
target += normal_lpdf(b_muSDOH[13] | 0,5);
target += normal_lpdf(b_muSDOH[14] | 0,5);
target += normal_lpdf(b_muCanmeds[1] | 0,5);
target += normal_lpdf(b_muCanmeds[2] | 0,5);
target += normal_lpdf(b_muCanmeds[3] | 0,5);
target += normal_lpdf(b_muCanmeds[4] | 0,5);
target += normal_lpdf(b_muCanmeds[5] | 0,5);
target += normal_lpdf(b_muCanmeds[6] | 0,5);
target += normal_lpdf(b_muCanmeds[7] | 0,5);
target += normal_lpdf(b_muCanmeds[8] | 0,5);
target += normal_lpdf(b_muCanmeds[9] | 0,5);
target += normal_lpdf(b_muCanmeds[10] | 0,5);
target += normal_lpdf(b_muCanmeds[11] | 0,5);
target += normal_lpdf(b_muCanmeds[12] | 0,5);
target += normal_lpdf(b_muCanmeds[13] | 0,5);
target += normal_lpdf(b_muCanmeds[14] | 0,5);
target += normal_lpdf(b_muExpertise[1] | 0,5);
target += normal_lpdf(b_muExpertise[2] | 0,5);
target += normal_lpdf(b_muExpertise[3] | 0,5);
target += normal_lpdf(b_muExpertise[4] | 0,5);
target += normal_lpdf(b_muExpertise[5] | 0,5);
target += normal_lpdf(b_muExpertise[6] | 0,5);
target += normal_lpdf(b_muExpertise[7] | 0,5);
target += normal_lpdf(b_muExpertise[8] | 0,5);
target += normal_lpdf(b_muExpertise[9] | 0,5);
target += normal_lpdf(b_muExpertise[10] | 0,5);
target += normal_lpdf(b_muExpertise[11] | 0,5);
target += normal_lpdf(b_muExpertise[12] | 0,5);
target += normal_lpdf(b_muExpertise[13] | 0,5);
target += normal_lpdf(b_muExpertise[14] | 0,5);
target += normal_lpdf(b_muRecommendations[1] | 0,5);
target += normal_lpdf(b_muRecommendations[2] | 0,5);
target += normal_lpdf(b_muRecommendations[3] | 0,5);
target += normal_lpdf(b_muRecommendations[4] | 0,5);
target += normal_lpdf(b_muRecommendations[5] | 0,5);
target += normal_lpdf(b_muRecommendations[6] | 0,5);
target += normal_lpdf(b_muRecommendations[7] | 0,5);
target += normal_lpdf(b_muRecommendations[8] | 0,5);
target += normal_lpdf(b_muRecommendations[9] | 0,5);
target += normal_lpdf(b_muRecommendations[10] | 0,5);
target += normal_lpdf(b_muRecommendations[11] | 0,5);
target += normal_lpdf(b_muRecommendations[12] | 0,5);
target += normal_lpdf(b_muRecommendations[13] | 0,5);
target += normal_lpdf(b_muRecommendations[14] | 0,5);
target += cauchy_lpdf(sd_1[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
target += normal_lpdf(z_1[1] | 0, 1);
target += cauchy_lpdf(sd_2[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
target += normal_lpdf(z_2[1] | 0, 1);
target += cauchy_lpdf(sd_3[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
target += normal_lpdf(z_3[1] | 0, 1);
target += cauchy_lpdf(sd_4[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
target += normal_lpdf(z_4[1] | 0, 1);
target += cauchy_lpdf(sd_5[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
target += normal_lpdf(z_5[1] | 0, 1);
target += cauchy_lpdf(sd_6[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
target += normal_lpdf(z_6[1] | 0, 1);
target += cauchy_lpdf(sd_7[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
target += normal_lpdf(z_7[1] | 0, 1);
target += cauchy_lpdf(sd_8[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
target += normal_lpdf(z_8[1] | 0, 1);
target += cauchy_lpdf(sd_9[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
target += normal_lpdf(z_9[1] | 0, 1);
target += cauchy_lpdf(sd_10[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
target += normal_lpdf(z_10[1] | 0, 1);
target += cauchy_lpdf(sd_11[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
target += normal_lpdf(z_11[1] | 0, 1);
target += cauchy_lpdf(sd_12[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
target += normal_lpdf(z_12[1] | 0, 1);
target += cauchy_lpdf(sd_13[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
target += normal_lpdf(z_13[1] | 0, 1);
target += cauchy_lpdf(sd_14[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
target += normal_lpdf(z_14[1] | 0, 1);
target += cauchy_lpdf(sd_15[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
target += normal_lpdf(z_15[1] | 0, 1);
target += cauchy_lpdf(sd_16[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
target += normal_lpdf(z_16[1] | 0, 1);
// likelihood including all constants
if (!prior_only) {
for (n in 1:N) {
target += multinomial_logit_lpmf(Y[n] | mu[n]);
}
}
}
generated quantities {
// actual population-level intercept
real b_muBuilding_Intercept = Intercept_muBuilding - dot_product(means_X_muBuilding, b_muBuilding);
// actual population-level intercept
real b_muCaregiver_Intercept = Intercept_muCaregiver - dot_product(means_X_muCaregiver, b_muCaregiver);
// actual population-level intercept
real b_muPatient_Intercept = Intercept_muPatient - dot_product(means_X_muPatient, b_muPatient);
// actual population-level intercept
real b_muPatientpsych_Intercept = Intercept_muPatientpsych - dot_product(means_X_muPatientpsych, b_muPatientpsych);
// actual population-level intercept
real b_muSDOH_Intercept = Intercept_muSDOH - dot_product(means_X_muSDOH, b_muSDOH);
// actual population-level intercept
real b_muCanmeds_Intercept = Intercept_muCanmeds - dot_product(means_X_muCanmeds, b_muCanmeds);
// actual population-level intercept
real b_muExpertise_Intercept = Intercept_muExpertise - dot_product(means_X_muExpertise, b_muExpertise);
// actual population-level intercept
real b_muRecommendations_Intercept = Intercept_muRecommendations - dot_product(means_X_muRecommendations, b_muRecommendations);
}

``````

I have uploaded a .csv of the “large list” of the data.

Anything that might help speed this up would be great. Waiting a week to then find divergent transitions or inadequate tree depth is a source of frustration.

1 Like

Paul Bürkner has built a lot of standard optimisations in brms, so I am not even going to presume to be able to do better than his implementation.

I think from just scanning the priors, the model has a lot of very broad priors which gives counter-intuitive prior distributions on the logit scale. Basically normal(0,5), if the predictors are on the unit scale, and folded cauchy priors are indicating that you expect the model to be relatively sure that a category will be chosen or not chosen. My advice would be to start with normal(0, 1) for the coefficients and exp(1) for hierarchical scales and do a prior predictive check with brms. I think you can do `prior_only = TRUE` in brms. You probably don’t need 4500 warmup if the model is appropriate for your data. Once you have the prior right, you can maybe start with warmup = 1000 and iter = 2000 to see whether you have any speed-ups.

The way I think about it is that if you can improve the prior with 1 day of work and it shaves off 2 days of computations that’s a win.

EDIT: Also have a look at this post that focuses on avoiding redundant computations. This might help as well.

2 Likes

Thank you very much!

Do you think the sd priors should still be specified as half-cauchy (0, exp(1)), or would half-normal likely be better?

Hi JLC,

There are three things that I’d try here. The first is replacing the loops with vectorised calculations. So rather than:

``````  for (n in 1:N) {
muBuilding[n] += r_1_muBuilding_1[J_1[n]] * Z_1_muBuilding_1[n] + r_2_muBuilding_1[J_2[n]] * Z_2_muBuilding_1[n];
}
``````

You could have:

``````muBuilding += r_1_muBuilding_1[J_1] .* Z_1_muBuilding_1 + r_2_muBuilding_1[J_2] .* Z_2_muBuilding_1;
``````

Secondly, unless you need the distributions with normalising constants (e.g., if you’re going to be calculating marginal likelihoods or bayes factors), you should replace the `target +=` syntax with the `~` syntax. This can cut out some computations on the back-end that you don’t need. For example, rather than:

``````target += normal_lpdf(b_muBuilding[1] | 0,5);
``````

You would say:

``````b_muBuilding[1] ~ normal(0, 5);
``````

Finally, you could also take advantage of the vectorised prior distributions. So rather than:

``````b_muBuilding[1] ~ normal(0, 5);
...
b_muBuilding[14] ~ normal(0, 5);
``````

You can just specify:

``````b_muBuilding ~ normal(0, 5);
``````

I don’t think you can use the vectorisation or `~` syntax for the Cauchy priors though.

Hope some of that helps!
Andrew

1 Like

Thank you @stijn !

In running running the model on only priors, I found I hadn’t put priors on category intercepts. Adding those and using normal(0,1) on betas and cauchy(0, exp(1)) on sd’s in brms sped things up a lot! It finished in a day with no divergent transitions 3000 iterations (1500 warmup).

Unfortunately, brms doesn’t have a built-in pp_check for multinomial models.

@andrjohns

Thanks for the recommendations!

I found what I think we’re a number of redundant estimates and added in your suggestions:

``````functions {

/* multinomial-logit log-PMF
* Args:
*   y: array of integer response values
*   mu: vector of category logit probabilities
* Returns:
*   a scalar to be added to the log posterior
*/
real multinomial_logit_lpmf(int[] y, vector mu) {
return multinomial_lpmf(y | softmax(mu));
}
}
data {
int<lower=1> N;  // number of observations
int<lower=2> ncat;  // number of categories
int Coded[N, ncat];  // response array
int trials[N];  // number of trials
int<lower=1> K;  // number of population-level effects
matrix[N, K] X;  // population-level design matrix
// data for group-level effects of ID 1 (Condition)
int<lower=1> n_Condition;  // number of grouping levels for Condition
int<lower=1> M_Condition;  // number of coefficients per level
int<lower=1> J_1[N];  // grouping indicator per observation
// group-level predictor values for Condition
vector[N] Z_Condition;
// data for group-level effects of ID 2 (Group)
int<lower=1> n_Group;  // number of grouping levels for Group
int<lower=1> M_Group;  // number of coefficients per level
int<lower=1> J_2[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_Group;
int prior_only;  // should the likelihood be ignored?
}
transformed data {
int Kc = K - 1;
matrix[N, Kc] Xc;  // centered version of X_muBuilding without an intercept
vector[Kc] means_X;  // column means of X_muBuilding before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b_muBuilding;  // population-level effects
real Intercept_muBuilding;  // temporary intercept for centered predictors
vector[Kc] b_muCaregiver;  // population-level effects
real Intercept_muCaregiver;  // temporary intercept for centered predictors
vector[Kc] b_muPatient;  // population-level effects
real Intercept_muPatient;  // temporary intercept for centered predictors
vector[Kc] b_muPatientpsych;  // population-level effects
real Intercept_muPatientpsych;  // temporary intercept for centered predictors
vector[Kc] b_muSDOH;  // population-level effects
real Intercept_muSDOH;  // temporary intercept for centered predictors
vector[Kc] b_muCanmeds;  // population-level effects
real Intercept_muCanmeds;  // temporary intercept for centered predictors
vector[Kc] b_muExpertise;  // population-level effects
real Intercept_muExpertise;  // temporary intercept for centered predictors
vector[Kc] b_muRecommendations;  // population-level effects
real Intercept_muRecommendations;  // temporary intercept for centered predictors
vector<lower=0>[M_Condition] sd_Condition_muBuilding;  // group-level standard deviations
vector[n_Condition] z_Condition_muBuilding[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muBuilding;  // group-level standard deviations
vector[n_Group] z_Group_muBuilding[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muCaregiver;  // group-level standard deviations
vector[n_Condition] z_Condition_muCaregiver[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muCaregiver;  // group-level standard deviations
vector[n_Group] z_Group_muCaregiver[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muPatient;  // group-level standard deviations
vector[n_Condition] z_Condition_muPatient[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muPatient;  // group-level standard deviations
vector[n_Group] z_Group_muPatient[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muPatientpsych;  // group-level standard deviations
vector[n_Condition] z_Condition_muPatientpsych[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muPatientpsych;  // group-level standard deviations
vector[n_Group] z_Group_muPatientpsych[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muSDOH;  // group-level standard deviations
vector[n_Condition] z_Condition_muSDOH[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muSDOH;  // group-level standard deviations
vector[n_Group] z_Group_muSDOH[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muCanmeds;  // group-level standard deviations
vector[n_Condition] z_Condition_muCanmeds[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muCanmeds;  // group-level standard deviations
vector[n_Group] z_Group_muCanmeds[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muExpertise;  // group-level standard deviations
vector[n_Condition] z_Condition_muExpertise[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muExpertise;  // group-level standard deviations
vector[n_Group] z_Group_muExpertise[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muRecommendations;  // group-level standard deviations
vector[n_Condition] z_Condition_muRecommendations[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muRecommendations;  // group-level standard deviations
vector[n_Group] z_Group_muRecommendations[M_Group];  // standardized group-level effects
}
transformed parameters {
vector[n_Condition] r_Condition_muBuilding;  // actual group-level effects
vector[n_Group] r_Group_muBuilding;  // actual group-level effects
vector[n_Condition] r_Condition_muCaregiver;  // actual group-level effects
vector[n_Group] r_Group_muCaregiver;  // actual group-level effects
vector[n_Condition] r_Condition_muPatient;  // actual group-level effects
vector[n_Group] r_Group_muPatient;  // actual group-level effects
vector[n_Condition] r_Condition_muPatientpsych;  // actual group-level effects
vector[n_Group] r_Group_muPatientpsych;  // actual group-level effects
vector[n_Condition] r_Condition_muSDOH;  // actual group-level effects
vector[n_Group] r_Group_muSDOH;  // actual group-level effects
vector[n_Condition] r_Condition_muCanmeds;  // actual group-level effects
vector[n_Group] r_Group_muCanmeds;  // actual group-level effects
vector[n_Condition] r_Condition_muExpertise;  // actual group-level effects
vector[n_Group] r_Group_muExpertise;  // actual group-level effects
vector[n_Condition] r_Condition_muRecommendations;  // actual group-level effects
vector[n_Group] r_Group_muRecommendations;  // actual group-level effects
r_Condition_muBuilding = (sd_Condition_muBuilding[1] * (z_Condition_muBuilding[1]));
r_Group_muBuilding = (sd_Group_muBuilding[1] * (z_Group_muBuilding[1]));
r_Condition_muCaregiver = (sd_Condition_muCaregiver[1] * (z_Condition_muCaregiver[1]));
r_Group_muCaregiver = (sd_Group_muCaregiver[1] * (z_Group_muCaregiver[1]));
r_Condition_muPatient = (sd_Condition_muPatient[1] * (z_Condition_muPatient[1]));
r_Group_muPatient = (sd_Group_muPatient[1] * (z_Group_muPatient[1]));
r_Condition_muPatientpsych = (sd_Condition_muPatientpsych[1] * (z_Condition_muPatientpsych[1]));
r_Group_muPatientpsych = (sd_Group_muPatientpsych[1] * (z_Group_muPatientpsych[1]));
r_Condition_muSDOH = (sd_Condition_muSDOH[1] * (z_Condition_muSDOH[1]));
r_Group_muSDOH = (sd_Group_muSDOH[1] * (z_Group_muSDOH[1]));
r_Condition_muCanmeds = (sd_Condition_muCanmeds[1] * (z_Condition_muCanmeds[1]));
r_Group_muCanmeds = (sd_Group_muCanmeds[1] * (z_Group_muCanmeds[1]));
r_Condition_muExpertise = (sd_Condition_muExpertise[1] * (z_Condition_muExpertise[1]));
r_Group_muExpertise = (sd_Group_muExpertise[1] * (z_Group_muExpertise[1]));
r_Condition_muRecommendations = (sd_Condition_muRecommendations[1] * (z_Condition_muRecommendations[1]));
r_Group_muRecommendations = (sd_Group_muRecommendations[1] * (z_Group_muRecommendations[1]));
}
model {
// initialize linear predictor term
vector[N] muBuilding = Intercept_muBuilding + Xc * b_muBuilding;
// initialize linear predictor term
vector[N] muCaregiver = Intercept_muCaregiver + Xc * b_muCaregiver;
// initialize linear predictor term
vector[N] muPatient = Intercept_muPatient + Xc * b_muPatient;
// initialize linear predictor term
vector[N] muPatientpsych = Intercept_muPatientpsych + Xc * b_muPatientpsych;
// initialize linear predictor term
vector[N] muSDOH = Intercept_muSDOH + Xc * b_muSDOH;
// initialize linear predictor term
vector[N] muCanmeds = Intercept_muCanmeds + Xc * b_muCanmeds;
// initialize linear predictor term
vector[N] muExpertise = Intercept_muExpertise + Xc * b_muExpertise;
// initialize linear predictor term
vector[N] muRecommendations = Intercept_muRecommendations + Xc * b_muRecommendations;
// linear predictor matrix
vector[ncat] mu[N];
for (n in 1:N) {
// add more terms to the linear predictor
muBuilding[n] += r_Condition_muBuilding[J_1[n]] * Z_Condition[n] + r_Group_muBuilding[J_2[n]] * Z_Group[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
muCaregiver[n] += r_Condition_muCaregiver[J_1[n]] * Z_Condition[n] + r_Group_muCaregiver[J_2[n]] * Z_Group[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
muPatient[n] += r_Condition_muPatient[J_1[n]] * Z_Condition[n] + r_Group_muPatient[J_2[n]] * Z_Group[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
muPatientpsych[n] += r_Condition_muPatientpsych[J_1[n]] * Z_Condition[n] + r_Group_muPatientpsych[J_2[n]] * Z_Group[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
muSDOH[n] += r_Condition_muSDOH[J_1[n]] * Z_Condition[n] + r_Group_muSDOH[J_2[n]] * Z_Group[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
muCanmeds[n] += r_Condition_muCanmeds[J_1[n]] * Z_Condition[n] + r_Group_muCanmeds[J_2[n]] * Z_Group[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
muExpertise[n] += r_Condition_muExpertise[J_1[n]] * Z_Condition[n] + r_Group_muExpertise[J_2[n]] * Z_Group[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
muRecommendations[n] += r_Condition_muRecommendations[J_1[n]] * Z_Condition[n] + r_Group_muRecommendations[J_2[n]] * Z_Group[n];
}
for (n in 1:N) {
mu[n] = [0, muBuilding[n], muCaregiver[n], muPatient[n], muPatientpsych[n], muSDOH[n], muCanmeds[n], muExpertise[n], muRecommendations[n]]';
}
// priors including all constants
b_muBuilding ~ normal(0, 1);
Intercept_muBuilding ~ normal(0, 1);
b_muCaregiver ~ normal(0, 1);
Intercept_muCaregiver ~ normal(0, 1);
b_muPatient ~ normal(0, 1);
Intercept_muPatient ~ normal(0, 1);
b_muPatientpsych ~ normal(0, 1);
Intercept_muPatientpsych ~ normal(0, 1);
b_muSDOH ~ normal(0, 1);
Intercept_muSDOH ~ normal(0, 1);
b_muCanmeds ~ normal(0, 1);
Intercept_muCanmeds ~ normal(0, 1);
b_muExpertise ~ normal(0, 1);
Intercept_muExpertise ~ normal(0, 1);
b_muRecommendations ~ normal(0, 1);
Intercept_muRecommendations ~ normal(0, 1);
target += cauchy_lpdf(sd_Condition_muBuilding[1] | 0,2)
- 1 * cauchy_lccdf(0 | 0,2);
target += normal_lpdf(z_Condition_muBuilding[1] | 0, 1);
target += cauchy_lpdf(sd_Group_muBuilding[1] | 0,exp(1))
- 1 * cauchy_lccdf(0 | 0,exp(1));
target += normal_lpdf(z_Group_muBuilding[1] | 0, 1);
target += cauchy_lpdf(sd_Condition_muCaregiver[1] | 0,2)
- 1 * cauchy_lccdf(0 | 0,2);
target += normal_lpdf(z_Condition_muCaregiver[1] | 0, 1);
target += cauchy_lpdf(sd_Group_muCaregiver[1] | 0,exp(1))
- 1 * cauchy_lccdf(0 | 0,exp(1));
target += normal_lpdf(z_Group_muCaregiver[1] | 0, 1);
target += cauchy_lpdf(sd_Group_muPatient[1] | 0,2)
- 1 * cauchy_lccdf(0 | 0,2);
target += normal_lpdf(z_Group_muPatient[1] | 0, 1);
target += cauchy_lpdf(sd_Condition_muPatient[1] | 0,exp(1))
- 1 * cauchy_lccdf(0 | 0,exp(1));
target += normal_lpdf(z_Condition_muPatient[1] | 0, 1);
target += cauchy_lpdf(sd_Condition_muPatientpsych[1] | 0,2)
- 1 * cauchy_lccdf(0 | 0,2);
target += normal_lpdf(z_Condition_muPatientpsych[1] | 0, 1);
target += cauchy_lpdf(sd_Group_muPatientpsych[1] | 0,exp(1))
- 1 * cauchy_lccdf(0 | 0,exp(1));
target += normal_lpdf(z_Group_muPatientpsych[1] | 0, 1);
target += cauchy_lpdf(sd_Condition_muSDOH[1] | 0,2)
- 1 * cauchy_lccdf(0 | 0,2);
target += normal_lpdf(z_Condition_muSDOH[1] | 0, 1);
target += cauchy_lpdf(sd_Group_muSDOH[1] | 0,exp(1))
- 1 * cauchy_lccdf(0 | 0,exp(1));
target += normal_lpdf(z_Group_muSDOH[1] | 0, 1);
target += cauchy_lpdf(sd_Condition_muCanmeds[1] | 0,2)
- 1 * cauchy_lccdf(0 | 0,2);
target += normal_lpdf(z_Condition_muCanmeds[1] | 0, 1);
target += cauchy_lpdf(sd_Group_muCanmeds[1] | 0,exp(1))
- 1 * cauchy_lccdf(0 | 0,exp(1));
target += normal_lpdf(z_Group_muCanmeds[1] | 0, 1);
target += cauchy_lpdf(sd_Condition_muExpertise[1] | 0,2)
- 1 * cauchy_lccdf(0 | 0,2);
target += normal_lpdf(z_Condition_muExpertise[1] | 0, 1);
target += cauchy_lpdf(sd_Group_muExpertise[1] | 0,exp(1))
- 1 * cauchy_lccdf(0 | 0,exp(1));
target += normal_lpdf(z_Group_muExpertise[1] | 0, 1);
target += cauchy_lpdf(sd_Condition_muRecommendations[1] | 0,2)
- 1 * cauchy_lccdf(0 | 0,2);
target += normal_lpdf(z_Condition_muRecommendations[1] | 0, 1);
target += cauchy_lpdf(sd_Group_muRecommendations[1] | 0,exp(1))
- 1 * cauchy_lccdf(0 | 0,exp(1));
target += normal_lpdf(z_Group_muRecommendations[1] | 0, 1);
// likelihood including all constants
if (!prior_only) {
for (n in 1:N) {
target += multinomial_logit_lpmf(Coded[n] | mu[n]);
}
}
}
generated quantities {
// actual population-level intercept
real b_muBuilding_Intercept = Intercept_muBuilding - dot_product(means_X, b_muBuilding);
// actual population-level intercept
real b_muCaregiver_Intercept = Intercept_muCaregiver - dot_product(means_X, b_muCaregiver);
// actual population-level intercept
real b_muPatient_Intercept = Intercept_muPatient - dot_product(means_X, b_muPatient);
// actual population-level intercept
real b_muPatientpsych_Intercept = Intercept_muPatientpsych - dot_product(means_X, b_muPatientpsych);
// actual population-level intercept
real b_muSDOH_Intercept = Intercept_muSDOH - dot_product(means_X, b_muSDOH);
// actual population-level intercept
real b_muCanmeds_Intercept = Intercept_muCanmeds - dot_product(means_X, b_muCanmeds);
// actual population-level intercept
real b_muExpertise_Intercept = Intercept_muExpertise - dot_product(means_X, b_muExpertise);
// actual population-level intercept
real b_muRecommendations_Intercept = Intercept_muRecommendations - dot_product(means_X, b_muRecommendations);
}
``````

Would there a way to vectorize the for loop for mu’s?

``````  for (n in 1:N) {
mu[n] = [0, muBuilding[n], muCaregiver[n], muPatient[n], muPatientpsych[n], muSDOH[n], muCanmeds[n], muExpertise[n], muRecommendations[n]]';
}
``````
1 Like

Great to hear the runtime is coming down!

You can still use the vectorised calculations to avoid some unnecessary looping. There would also be a bit of a speed-up by combining the declaration and accumulation of the linear predictors:

``````  vector[N] muBuilding;
...
muBuilding = Intercept_muBuilding + Xc * b_muBuilding
+ r_Condition_muBuilding[J_1] .* Z_Condition
+ r_Group_muBuilding[J_2] .* Z_Group;
``````

Would there a way to vectorize the for loop for mu’s?

There’s no way to vectorise the `mu`, but since you’re looping over `N` for the `multinomial_logit` anyway you can just create the vector on-the-fly when you call the distribution:

``````      target += multinomial_logit_lpmf(Coded[n] | [0, muBuilding[n], muCaregiver[n], muPatient[n],
muPatientpsych[n], muSDOH[n], muCanmeds[n],
muExpertise[n], muRecommendations[n]]');
``````

Stan also has a built-in standard-normal distribution which is more computationally efficient than `normal(0, 1)`, so you could try that as well:

``````b_muBuilding ~ std_normal();
``````

Don’t forget to change the `normal_lpdf` calls for the `z` parameters as well.

For the cauchys you need to keep the `cauchy_lpdf` calls, but you can optimise with the `cauchy_lccdf` calls. Since there are multiple calls with the same parameters, you can just subtract the appropriate multiple of the density rather than computing the density multiple times. So the priors for the cauchys becomes:

``````  target += cauchy_lpdf(sd_Condition_muBuilding[1] | 0,2);
target += cauchy_lpdf(sd_Group_muBuilding[1] | 0,exp(1));
target += cauchy_lpdf(sd_Condition_muCaregiver[1] | 0,2);
target += cauchy_lpdf(sd_Group_muCaregiver[1] | 0,exp(1));
target += cauchy_lpdf(sd_Group_muPatient[1] | 0,2);
target += cauchy_lpdf(sd_Condition_muPatient[1] | 0,exp(1));
target += cauchy_lpdf(sd_Condition_muPatientpsych[1] | 0,2);
target += cauchy_lpdf(sd_Group_muPatientpsych[1] | 0,exp(1));
target += cauchy_lpdf(sd_Condition_muSDOH[1] | 0,2);
target += cauchy_lpdf(sd_Group_muSDOH[1] | 0,exp(1));
target += cauchy_lpdf(sd_Condition_muCanmeds[1] | 0,2);
target += cauchy_lpdf(sd_Group_muCanmeds[1] | 0,exp(1));
target += cauchy_lpdf(sd_Condition_muExpertise[1] | 0,2);
target += cauchy_lpdf(sd_Group_muExpertise[1] | 0,exp(1));
target += cauchy_lpdf(sd_Condition_muRecommendations[1] | 0,2);
target += cauchy_lpdf(sd_Group_muRecommendations[1] | 0,exp(1));

target -= 8 * cauchy_lccdf(0 | 0,2);
target -= 8 * cauchy_lccdf(0 | 0,exp(1));
``````

Additionally, there are many repeated calls to `exp(1)`, it would be more efficient to calculate it once in the `transformed data` block, and then re-use the calculated value throughout the model:

``````transformed data{
real exp1 = exp(1);
}
``````
1 Like

Also, can you double-check that `M_Condition` is 1? As you have several parameters that are length `M_Condition`, but only use or put priors on the first element. If `M_Condition` is greater than 1, then the other elements will be implicitly receiving a uniform prior, which could trip up the sampling

1 Like

Thank you so much!

Yes, M_Condition and M_Group are always one. It looks like they specify the number of estimated parameters, which is just an intercept for both.

Yes, M_Condition and M_Group are always one. It looks like they specify the number of estimated parameters, which is just an intercept for both.

Great!

Something I also realised, you can move the `cauchy_lccdf` calls to the `transformed_data` block to cut down on more computations as well:

``````transformed data {
real 8_cauchylccdf_0_2 = 8 * cauchy_lccdf(0 | 0,2);
real 8_cauchylccdf_0_exp1 = 8 * cauchy_lccdf(0 | 0,exp1);
}
...
model {
...
target -= 8_cauchylccdf_0_2;
target -= 8_cauchylccdf_0_exp1;
}``````
1 Like

Does

``````  target -= 8_cauchylccdf_0_2;
target -= 8_cauchylccdf_0_exp1;
``````

Go into the `Model` block, or into the `priors` block?

I’ve tried it both an get an error when compiling the model:
`Variable "target" does not exist.`

In the model block

``````  target += -8_cauchylccdf_0_2;
target += -8_cauchylccdf_0_exp1;
``````
1 Like

That did it!

This is so much faster! It’s gone from ~300 seconds for 1000 transitions with 10 leapfrog steps down to ~70!

1 Like

Great to hear!

If the model runs without divergences and the results are consistent, can you post up your final model code? It can be helpful for people that have similar issues in the future.

1 Like

Not quite there in terms of running without divergences, but I think I’m getting close!

Here’s is the final model code:

``````functions {

/* multinomial-logit log-PMF
* Args:
*   y: array of integer response values
*   mu: vector of category logit probabilities
* Returns:
*   a scalar to be added to the log posterior
*/
real multinomial_logit_lpmf(int[] y, vector mu) {
return multinomial_lpmf(y | softmax(mu));
}
}
data {
int<lower=1> N;  // number of observations
int<lower=2> ncat;  // number of categories
int Coded[N, ncat];  // response array
int trials[N];  // number of trials
int<lower=1> K;  // number of population-level effects
matrix[N, K] X;  // population-level design matrix
// data for group-level effects of ID 1 (Condition)
int<lower=1> n_Condition;  // number of grouping levels for Condition
int<lower=1> M_Condition;  // number of coefficients per level
int<lower=1> J_1[N];  // grouping indicator per observation
// group-level predictor values for Condition
vector[N] Z_Condition;
// data for group-level effects of ID 2 (Group)
int<lower=1> n_Group;  // number of grouping levels for Group
int<lower=1> M_Group;  // number of coefficients per level
int<lower=1> J_2[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_Group;
int prior_only;  // should the likelihood be ignored?
}
transformed data {
real exp1 = exp(1);
real Condition_cauchy_lccdf_exp1_8 = 8 * cauchy_lccdf(0 | 0,exp1);
real Group_cauchy_lccdf_2_8 = 8 * cauchy_lccdf(0 | 0,2);
int Kc = K - 1;
matrix[N, Kc] Xc;  // centered version of X_muBuilding without an intercept
vector[Kc] means_X;  // column means of X_muBuilding before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b_muBuilding;  // population-level effects
real Intercept_muBuilding;  // temporary intercept for centered predictors
vector[Kc] b_muCaregiver;  // population-level effects
real Intercept_muCaregiver;  // temporary intercept for centered predictors
vector[Kc] b_muPatient;  // population-level effects
real Intercept_muPatient;  // temporary intercept for centered predictors
vector[Kc] b_muPatientpsych;  // population-level effects
real Intercept_muPatientpsych;  // temporary intercept for centered predictors
vector[Kc] b_muSDOH;  // population-level effects
real Intercept_muSDOH;  // temporary intercept for centered predictors
vector[Kc] b_muCanmeds;  // population-level effects
real Intercept_muCanmeds;  // temporary intercept for centered predictors
vector[Kc] b_muExpertise;  // population-level effects
real Intercept_muExpertise;  // temporary intercept for centered predictors
vector[Kc] b_muRecommendations;  // population-level effects
real Intercept_muRecommendations;  // temporary intercept for centered predictors
vector<lower=0>[M_Condition] sd_Condition_muBuilding;  // group-level standard deviations
vector[n_Condition] z_Condition_muBuilding[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muBuilding;  // group-level standard deviations
vector[n_Group] z_Group_muBuilding[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muCaregiver;  // group-level standard deviations
vector[n_Condition] z_Condition_muCaregiver[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muCaregiver;  // group-level standard deviations
vector[n_Group] z_Group_muCaregiver[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muPatient;  // group-level standard deviations
vector[n_Condition] z_Condition_muPatient[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muPatient;  // group-level standard deviations
vector[n_Group] z_Group_muPatient[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muPatientpsych;  // group-level standard deviations
vector[n_Condition] z_Condition_muPatientpsych[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muPatientpsych;  // group-level standard deviations
vector[n_Group] z_Group_muPatientpsych[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muSDOH;  // group-level standard deviations
vector[n_Condition] z_Condition_muSDOH[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muSDOH;  // group-level standard deviations
vector[n_Group] z_Group_muSDOH[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muCanmeds;  // group-level standard deviations
vector[n_Condition] z_Condition_muCanmeds[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muCanmeds;  // group-level standard deviations
vector[n_Group] z_Group_muCanmeds[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muExpertise;  // group-level standard deviations
vector[n_Condition] z_Condition_muExpertise[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muExpertise;  // group-level standard deviations
vector[n_Group] z_Group_muExpertise[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muRecommendations;  // group-level standard deviations
vector[n_Condition] z_Condition_muRecommendations[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muRecommendations;  // group-level standard deviations
vector[n_Group] z_Group_muRecommendations[M_Group];  // standardized group-level effects
}
transformed parameters {
vector[n_Condition] r_Condition_muBuilding;  // actual group-level effects
vector[n_Group] r_Group_muBuilding;  // actual group-level effects
vector[n_Condition] r_Condition_muCaregiver;  // actual group-level effects
vector[n_Group] r_Group_muCaregiver;  // actual group-level effects
vector[n_Condition] r_Condition_muPatient;  // actual group-level effects
vector[n_Group] r_Group_muPatient;  // actual group-level effects
vector[n_Condition] r_Condition_muPatientpsych;  // actual group-level effects
vector[n_Group] r_Group_muPatientpsych;  // actual group-level effects
vector[n_Condition] r_Condition_muSDOH;  // actual group-level effects
vector[n_Group] r_Group_muSDOH;  // actual group-level effects
vector[n_Condition] r_Condition_muCanmeds;  // actual group-level effects
vector[n_Group] r_Group_muCanmeds;  // actual group-level effects
vector[n_Condition] r_Condition_muExpertise;  // actual group-level effects
vector[n_Group] r_Group_muExpertise;  // actual group-level effects
vector[n_Condition] r_Condition_muRecommendations;  // actual group-level effects
vector[n_Group] r_Group_muRecommendations;  // actual group-level effects
r_Condition_muBuilding = (sd_Condition_muBuilding[1] * (z_Condition_muBuilding[1]));
r_Group_muBuilding = (sd_Group_muBuilding[1] * (z_Group_muBuilding[1]));
r_Condition_muCaregiver = (sd_Condition_muCaregiver[1] * (z_Condition_muCaregiver[1]));
r_Group_muCaregiver = (sd_Group_muCaregiver[1] * (z_Group_muCaregiver[1]));
r_Condition_muPatient = (sd_Condition_muPatient[1] * (z_Condition_muPatient[1]));
r_Group_muPatient = (sd_Group_muPatient[1] * (z_Group_muPatient[1]));
r_Condition_muPatientpsych = (sd_Condition_muPatientpsych[1] * (z_Condition_muPatientpsych[1]));
r_Group_muPatientpsych = (sd_Group_muPatientpsych[1] * (z_Group_muPatientpsych[1]));
r_Condition_muSDOH = (sd_Condition_muSDOH[1] * (z_Condition_muSDOH[1]));
r_Group_muSDOH = (sd_Group_muSDOH[1] * (z_Group_muSDOH[1]));
r_Condition_muCanmeds = (sd_Condition_muCanmeds[1] * (z_Condition_muCanmeds[1]));
r_Group_muCanmeds = (sd_Group_muCanmeds[1] * (z_Group_muCanmeds[1]));
r_Condition_muExpertise = (sd_Condition_muExpertise[1] * (z_Condition_muExpertise[1]));
r_Group_muExpertise = (sd_Group_muExpertise[1] * (z_Group_muExpertise[1]));
r_Condition_muRecommendations = (sd_Condition_muRecommendations[1] * (z_Condition_muRecommendations[1]));
r_Group_muRecommendations = (sd_Group_muRecommendations[1] * (z_Group_muRecommendations[1]));
}
model {
// Declare linear predictor terms
vector[N] muBuilding;
vector[N] muCaregiver;
vector[N] muPatient;
vector[N] muPatientpsych;
vector[N] muSDOH;
vector[N] muCanmeds;
vector[N] muExpertise;
vector[N] muRecommendations;
// Accumulate linear predictor terms
muBuilding = Intercept_muBuilding + Xc * b_muBuilding
+ r_Condition_muBuilding[J_1] .* Z_Condition
+ r_Group_muBuilding[J_2] .* Z_Group;
muCaregiver = Intercept_muCaregiver + Xc * b_muCaregiver
+ r_Condition_muCaregiver[J_1] .* Z_Condition
+ r_Group_muCaregiver[J_2] .* Z_Group;
muPatient = Intercept_muPatient + Xc * b_muPatient
+ r_Condition_muPatient[J_1] .* Z_Condition
+ r_Group_muPatient[J_2] .* Z_Group;
muPatientpsych = Intercept_muPatientpsych + Xc * b_muPatientpsych
+ r_Condition_muPatientpsych[J_1] .* Z_Condition
+ r_Group_muPatientpsych[J_2] .* Z_Group;
muSDOH = Intercept_muSDOH + Xc * b_muSDOH
+ r_Condition_muBuilding[J_1] .* Z_Condition
+ r_Group_muSDOH[J_2] .* Z_Group;
muCanmeds = Intercept_muCanmeds + Xc * b_muCanmeds
+ r_Condition_muCanmeds[J_1] .* Z_Condition
+ r_Group_muCanmeds[J_2] .* Z_Group;
muExpertise = Intercept_muExpertise + Xc * b_muExpertise
+ r_Condition_muExpertise[J_1] .* Z_Condition
+ r_Group_muExpertise[J_2] .* Z_Group;
muRecommendations = Intercept_muRecommendations + Xc * b_muRecommendations
+ r_Condition_muRecommendations[J_1] .* Z_Condition
+ r_Group_muRecommendations[J_2] .* Z_Group;
target += -Condition_cauchy_lccdf_exp1_8;
target += -Group_cauchy_lccdf_2_8;
// priors including all constants
b_muBuilding ~ std_normal();
Intercept_muBuilding ~ std_normal();
b_muCaregiver ~ std_normal();
Intercept_muCaregiver ~ std_normal();
b_muPatient ~ std_normal();
Intercept_muPatient ~ std_normal();
b_muPatientpsych ~ std_normal();
Intercept_muPatientpsych ~ std_normal();
b_muSDOH ~ std_normal();
Intercept_muSDOH ~ std_normal();
b_muCanmeds ~ std_normal();
Intercept_muCanmeds ~ std_normal();
b_muExpertise ~ std_normal();
Intercept_muExpertise ~ std_normal();
b_muRecommendations ~ std_normal();
Intercept_muRecommendations ~ std_normal();
target += cauchy_lpdf(sd_Condition_muBuilding[1] | 0,exp1);
target += cauchy_lpdf(sd_Group_muBuilding[1] | 0,2);
target += cauchy_lpdf(sd_Condition_muCaregiver[1] | 0,exp1);
target += cauchy_lpdf(sd_Group_muCaregiver[1] | 0,2);
target += cauchy_lpdf(sd_Condition_muPatient[1] | 0,exp1);
target += cauchy_lpdf(sd_Group_muPatient[1] | 0,2);
target += cauchy_lpdf(sd_Condition_muPatientpsych[1] | 0,exp1);
target += cauchy_lpdf(sd_Group_muPatientpsych[1] | 0,2);
target += cauchy_lpdf(sd_Condition_muSDOH[1] | 0,exp1);
target += cauchy_lpdf(sd_Group_muSDOH[1] | 0,2);
target += cauchy_lpdf(sd_Condition_muCanmeds[1] | 0,exp1);
target += cauchy_lpdf(sd_Group_muCanmeds[1] | 0,2);
target += cauchy_lpdf(sd_Condition_muExpertise[1] | 0,exp1);
target += cauchy_lpdf(sd_Group_muExpertise[1] | 0,2);
target += cauchy_lpdf(sd_Condition_muRecommendations[1] | 0,exp1);
target += cauchy_lpdf(sd_Group_muRecommendations[1] | 0,2);
target += normal_lpdf(z_Condition_muBuilding[1] | 0, 1);
target += normal_lpdf(z_Group_muBuilding[1] | 0, 1);
target += normal_lpdf(z_Condition_muCaregiver[1] | 0, 1);
target += normal_lpdf(z_Group_muCaregiver[1] | 0, 1);
target += normal_lpdf(z_Group_muPatient[1] | 0, 1);
target += normal_lpdf(z_Condition_muPatient[1] | 0, 1);
target += normal_lpdf(z_Condition_muPatientpsych[1] | 0, 1);
target += normal_lpdf(z_Group_muPatientpsych[1] | 0, 1);
target += normal_lpdf(z_Condition_muSDOH[1] | 0, 1);
target += normal_lpdf(z_Group_muSDOH[1] | 0, 1);
target += normal_lpdf(z_Condition_muCanmeds[1] | 0, 1);
target += normal_lpdf(z_Group_muCanmeds[1] | 0, 1);
target += normal_lpdf(z_Condition_muExpertise[1] | 0, 1);
target += normal_lpdf(z_Group_muExpertise[1] | 0, 1);
target += normal_lpdf(z_Condition_muRecommendations[1] | 0, 1);
target += normal_lpdf(z_Group_muRecommendations[1] | 0, 1);
// likelihood including all constants
target += multinomial_logit_lpmf(Coded[N] | [0, muBuilding[N], muCaregiver[N], muPatient[N],
muPatientpsych[N], muSDOH[N], muCanmeds[N],
muExpertise[N], muRecommendations[N]]');
}
generated quantities {
// actual population-level intercept
real b_muBuilding_Intercept = Intercept_muBuilding - dot_product(means_X, b_muBuilding);
// actual population-level intercept
real b_muCaregiver_Intercept = Intercept_muCaregiver - dot_product(means_X, b_muCaregiver);
// actual population-level intercept
real b_muPatient_Intercept = Intercept_muPatient - dot_product(means_X, b_muPatient);
// actual population-level intercept
real b_muPatientpsych_Intercept = Intercept_muPatientpsych - dot_product(means_X, b_muPatientpsych);
// actual population-level intercept
real b_muSDOH_Intercept = Intercept_muSDOH - dot_product(means_X, b_muSDOH);
// actual population-level intercept
real b_muCanmeds_Intercept = Intercept_muCanmeds - dot_product(means_X, b_muCanmeds);
// actual population-level intercept
real b_muExpertise_Intercept = Intercept_muExpertise - dot_product(means_X, b_muExpertise);
// actual population-level intercept
real b_muRecommendations_Intercept = Intercept_muRecommendations - dot_product(means_X, b_muRecommendations);
}

``````

Here is my script for running the model on the attached data:
Data.csv (319.4 KB)

``````# List of needed packages
Pkgs <- c("tidyverse", "parallel", "rstan")

lapply(Pkgs, require, c = T)

## Set computing options
ncores = detectCores()
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

# rstan options
rstan_options("auto_write" = TRUE)

## Set up data for Stan
Stan_Data <- list(
N = nrow(Data),
Coded = with(
Data,
cbind(
NoCode,
Building,
Caregiver,
Patient,
Patientpsych,
SDOH,
Canmeds,
Expertise,
Recommendations
)
),
trials = Data\$Count,
X = data.frame(
Intercept = rep(1, nrow(Data)),
ConditionIntervention = ifelse(Data\$Condition == "Intervention", 1, 0),
Session_TypeInitial_Homecare = ifelse(Data\$Session_Type == "Initial_Homecare", 1, 0),
Session_TypeInitial_Debrief = ifelse(Data\$Session_Type == "Initial_Debrief", 1, 0),
Session_TypeFollowUp_Homecare = ifelse(Data\$Session_Type == "Follow-up_Homecare", 1, 0),
Session_TypeFollowUp_Debrief = ifelse(Data\$Session_Type == "Follow-up_Debrief", 1, 0),
FacilitatorFacilitator2 = ifelse(Data\$Facilitator == "Facilitator 2", 1, 0),
`ConditionIntervention:Session_TypeInitial_Homecare` = ifelse(
Data\$Condition == "Intervention" &
Data\$Session_Type == "Initial_Homecare",
1,
0
),
`ConditionIntervention:Session_TypeInitial_Debrief` = ifelse(
Data\$Condition == "Intervention" &
Data\$Session_Type == "Initial_Debrief",
1,
0
),
`ConditionIntervention:Session_TypeFollowUp_Homecare` = ifelse(
Data\$Condition == "Intervention" &
Data\$Session_Type == "Follow-up_Homecare",
1,
0
),
`ConditionIntervention:Session_TypeFollowUp_Debrief` = ifelse(
Data\$Condition == "Intervention" &
Data\$Session_Type == "Follow-up_Debrief",
1,
0
),
`Session_TypeInitial_Homecare:FacilitatorFacilitator2` = ifelse(
Data\$Session_Type == "Initial_Homecare" &
Data\$Facilitator == "Facilitator2",
1,
0
),
`Session_TypeInitial_Debrief:FacilitatorFacilitator2` = ifelse(
Data\$Session_Type == "Initial_Debrief" &
Data\$Facilitator == "Facilitator2",
1,
0
),
`Session_TypeInitial_Homecare:FacilitatorFacilitator2` = ifelse(
Data\$Session_Type == "Follow-up_Homecare" &
Data\$Facilitator == "Facilitator2",
1,
0
),
`Session_TypeInitial_Debrief:FacilitatorFacilitator2` = ifelse(
Data\$Session_Type == "Follow-up_Debrief" &
Data\$Facilitator == "Facilitator2",
1,
0
)
),
n_Condition = length(unique(Data\$Condition)),
n_Group = length(unique(Data\$Group)),
M_Condition = 1,
M_Group = 1,
J_1 = as.integer(Data\$Condition), # Condition in each observation
J_2 = as.integer(Data\$Group), # Group in each observation
prior_only = 0
) %>%
prepend(list(
ncat = ncol(.\$Coded),
K = ncol(.\$X),
Z_Condition = rep(1, nrow(.\$X)),
Z_Group = rep(1, nrow(.\$X))
))

# Compile Stan How model
Stan_Model <- stan_model(file = "Stan_Final_Model.stan", # Name of the existing Stan model specification
model_name = "Final Model", # Name of the resulting compiled model
auto_write = rstan_options("auto_write"))

# Sample from model
Stan_Fit <- sampling(Stan_Model,         # Model to sample from
data = Stan_Data,   # named list of data
iter = 5000,            # total number of iterations per chain
warmup = 3500,          # number of warmup iterations per chain
init = 0,               # set intial values to 0
chains = 2,             # number of Markov chains
cores = ncores,         # number of cores to use (1 per chain)
control=list(max_treedepth = 12,
)

``````

The main source of your divergences is probably this part:

``````  target += multinomial_logit_lpmf(Coded[N] | [0, muBuilding[N], muCaregiver[N], muPatient[N],
muPatientpsych[N], muSDOH[N], muCanmeds[N],
muExpertise[N], muRecommendations[N]]');
}
``````

Because `N` is just the total number of individuals, you’re only specifying the `multinomial_logit_lpmf` prior for the very last individual. You’ll need to have this as a loop (which is what I think it used to be?) and that should be a bit nicer with the divergences

1 Like

Here’s the final, final code with the For loops for mu back in:

``````																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																functions {

/* multinomial-logit log-PMF
* Args:
*   y: array of integer response values
*   mu: vector of category logit probabilities
* Returns:
*   a scalar to be added to the log posterior
*/
real multinomial_logit_lpmf(int[] y, vector mu) {
return multinomial_lpmf(y | softmax(mu));
}
}
data {
int<lower=1> N;  // number of observations
int<lower=2> ncat;  // number of categories
int Coded[N, ncat];  // response array
int trials[N];  // number of trials
int<lower=1> K;  // number of population-level effects
matrix[N, K] X;  // population-level design matrix
// data for group-level effects of ID 1 (Condition)
int<lower=1> n_Condition;  // number of grouping levels for Condition
int<lower=1> M_Condition;  // number of coefficients per level
int<lower=1> J_1[N];  // grouping indicator per observation
// group-level predictor values for Condition
vector[N] Z_Condition;
// data for group-level effects of ID 2 (Group)
int<lower=1> n_Group;  // number of grouping levels for Group
int<lower=1> M_Group;  // number of coefficients per level
int<lower=1> J_2[N];  // grouping indicator per observation
// group-level predictor values
vector[N] Z_Group;
int prior_only;  // should the likelihood be ignored?
}
transformed data {
real exp1 = exp(1);
real Condition_cauchy_lccdf_exp1_8 = 8 * cauchy_lccdf(0 | 0,exp1);
real Group_cauchy_lccdf_2_8 = 8 * cauchy_lccdf(0 | 0,2);
int Kc = K - 1;
matrix[N, Kc] Xc;  // centered version of X_muBuilding without an intercept
vector[Kc] means_X;  // column means of X_muBuilding before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b_muBuilding;  // population-level effects
real Intercept_muBuilding;  // temporary intercept for centered predictors
vector[Kc] b_muCaregiver;  // population-level effects
real Intercept_muCaregiver;  // temporary intercept for centered predictors
vector[Kc] b_muPatient;  // population-level effects
real Intercept_muPatient;  // temporary intercept for centered predictors
vector[Kc] b_muPatientpsych;  // population-level effects
real Intercept_muPatientpsych;  // temporary intercept for centered predictors
vector[Kc] b_muSDOH;  // population-level effects
real Intercept_muSDOH;  // temporary intercept for centered predictors
vector[Kc] b_muCanmeds;  // population-level effects
real Intercept_muCanmeds;  // temporary intercept for centered predictors
vector[Kc] b_muExpertise;  // population-level effects
real Intercept_muExpertise;  // temporary intercept for centered predictors
vector[Kc] b_muRecommendations;  // population-level effects
real Intercept_muRecommendations;  // temporary intercept for centered predictors
vector<lower=0>[M_Condition] sd_Condition_muBuilding;  // group-level standard deviations
vector[n_Condition] z_Condition_muBuilding[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muBuilding;  // group-level standard deviations
vector[n_Group] z_Group_muBuilding[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muCaregiver;  // group-level standard deviations
vector[n_Condition] z_Condition_muCaregiver[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muCaregiver;  // group-level standard deviations
vector[n_Group] z_Group_muCaregiver[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muPatient;  // group-level standard deviations
vector[n_Condition] z_Condition_muPatient[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muPatient;  // group-level standard deviations
vector[n_Group] z_Group_muPatient[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muPatientpsych;  // group-level standard deviations
vector[n_Condition] z_Condition_muPatientpsych[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muPatientpsych;  // group-level standard deviations
vector[n_Group] z_Group_muPatientpsych[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muSDOH;  // group-level standard deviations
vector[n_Condition] z_Condition_muSDOH[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muSDOH;  // group-level standard deviations
vector[n_Group] z_Group_muSDOH[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muCanmeds;  // group-level standard deviations
vector[n_Condition] z_Condition_muCanmeds[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muCanmeds;  // group-level standard deviations
vector[n_Group] z_Group_muCanmeds[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muExpertise;  // group-level standard deviations
vector[n_Condition] z_Condition_muExpertise[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muExpertise;  // group-level standard deviations
vector[n_Group] z_Group_muExpertise[M_Group];  // standardized group-level effects
vector<lower=0>[M_Condition] sd_Condition_muRecommendations;  // group-level standard deviations
vector[n_Condition] z_Condition_muRecommendations[M_Condition];  // standardized group-level effects
vector<lower=0>[M_Group] sd_Group_muRecommendations;  // group-level standard deviations
vector[n_Group] z_Group_muRecommendations[M_Group];  // standardized group-level effects
}
transformed parameters {
vector[n_Condition] r_Condition_muBuilding;  // actual group-level effects
vector[n_Group] r_Group_muBuilding;  // actual group-level effects
vector[n_Condition] r_Condition_muCaregiver;  // actual group-level effects
vector[n_Group] r_Group_muCaregiver;  // actual group-level effects
vector[n_Condition] r_Condition_muPatient;  // actual group-level effects
vector[n_Group] r_Group_muPatient;  // actual group-level effects
vector[n_Condition] r_Condition_muPatientpsych;  // actual group-level effects
vector[n_Group] r_Group_muPatientpsych;  // actual group-level effects
vector[n_Condition] r_Condition_muSDOH;  // actual group-level effects
vector[n_Group] r_Group_muSDOH;  // actual group-level effects
vector[n_Condition] r_Condition_muCanmeds;  // actual group-level effects
vector[n_Group] r_Group_muCanmeds;  // actual group-level effects
vector[n_Condition] r_Condition_muExpertise;  // actual group-level effects
vector[n_Group] r_Group_muExpertise;  // actual group-level effects
vector[n_Condition] r_Condition_muRecommendations;  // actual group-level effects
vector[n_Group] r_Group_muRecommendations;  // actual group-level effects
r_Condition_muBuilding = (sd_Condition_muBuilding[1] * (z_Condition_muBuilding[1]));
r_Group_muBuilding = (sd_Group_muBuilding[1] * (z_Group_muBuilding[1]));
r_Condition_muCaregiver = (sd_Condition_muCaregiver[1] * (z_Condition_muCaregiver[1]));
r_Group_muCaregiver = (sd_Group_muCaregiver[1] * (z_Group_muCaregiver[1]));
r_Condition_muPatient = (sd_Condition_muPatient[1] * (z_Condition_muPatient[1]));
r_Group_muPatient = (sd_Group_muPatient[1] * (z_Group_muPatient[1]));
r_Condition_muPatientpsych = (sd_Condition_muPatientpsych[1] * (z_Condition_muPatientpsych[1]));
r_Group_muPatientpsych = (sd_Group_muPatientpsych[1] * (z_Group_muPatientpsych[1]));
r_Condition_muSDOH = (sd_Condition_muSDOH[1] * (z_Condition_muSDOH[1]));
r_Group_muSDOH = (sd_Group_muSDOH[1] * (z_Group_muSDOH[1]));
r_Condition_muCanmeds = (sd_Condition_muCanmeds[1] * (z_Condition_muCanmeds[1]));
r_Group_muCanmeds = (sd_Group_muCanmeds[1] * (z_Group_muCanmeds[1]));
r_Condition_muExpertise = (sd_Condition_muExpertise[1] * (z_Condition_muExpertise[1]));
r_Group_muExpertise = (sd_Group_muExpertise[1] * (z_Group_muExpertise[1]));
r_Condition_muRecommendations = (sd_Condition_muRecommendations[1] * (z_Condition_muRecommendations[1]));
r_Group_muRecommendations = (sd_Group_muRecommendations[1] * (z_Group_muRecommendations[1]));
}
model {
// Declare linear predictor terms
vector[ncat] mu[N];
vector[N] muBuilding;
vector[N] muCaregiver;
vector[N] muPatient;
vector[N] muPatientpsych;
vector[N] muSDOH;
vector[N] muCanmeds;
vector[N] muExpertise;
vector[N] muRecommendations;
// Accumulate linear predictor terms
muBuilding = Intercept_muBuilding + Xc * b_muBuilding
+ r_Condition_muBuilding[J_1] .* Z_Condition
+ r_Group_muBuilding[J_2] .* Z_Group;
muCaregiver = Intercept_muCaregiver + Xc * b_muCaregiver
+ r_Condition_muCaregiver[J_1] .* Z_Condition
+ r_Group_muCaregiver[J_2] .* Z_Group;
muPatient = Intercept_muPatient + Xc * b_muPatient
+ r_Condition_muPatient[J_1] .* Z_Condition
+ r_Group_muPatient[J_2] .* Z_Group;
muPatientpsych = Intercept_muPatientpsych + Xc * b_muPatientpsych
+ r_Condition_muPatientpsych[J_1] .* Z_Condition
+ r_Group_muPatientpsych[J_2] .* Z_Group;
muSDOH = Intercept_muSDOH + Xc * b_muSDOH
+ r_Condition_muBuilding[J_1] .* Z_Condition
+ r_Group_muSDOH[J_2] .* Z_Group;
muCanmeds = Intercept_muCanmeds + Xc * b_muCanmeds
+ r_Condition_muCanmeds[J_1] .* Z_Condition
+ r_Group_muCanmeds[J_2] .* Z_Group;
muExpertise = Intercept_muExpertise + Xc * b_muExpertise
+ r_Condition_muExpertise[J_1] .* Z_Condition
+ r_Group_muExpertise[J_2] .* Z_Group;
muRecommendations = Intercept_muRecommendations + Xc * b_muRecommendations
+ r_Condition_muRecommendations[J_1] .* Z_Condition
+ r_Group_muRecommendations[J_2] .* Z_Group;
target += -Condition_cauchy_lccdf_exp1_8;
target += -Group_cauchy_lccdf_2_8;
for (n in 1:N) {
mu[n] = [0, muBuilding[n], muCaregiver[n], muPatient[n], muPatientpsych[n], muSDOH[n], muCanmeds[n], muExpertise[n], muRecommendations[n]]';
}
// priors including all constants
b_muBuilding ~ std_normal();
Intercept_muBuilding ~ std_normal();
b_muCaregiver ~ std_normal();
Intercept_muCaregiver ~ std_normal();
b_muPatient ~ std_normal();
Intercept_muPatient ~ std_normal();
b_muPatientpsych ~ std_normal();
Intercept_muPatientpsych ~ std_normal();
b_muSDOH ~ std_normal();
Intercept_muSDOH ~ std_normal();
b_muCanmeds ~ std_normal();
Intercept_muCanmeds ~ std_normal();
b_muExpertise ~ std_normal();
Intercept_muExpertise ~ std_normal();
b_muRecommendations ~ std_normal();
Intercept_muRecommendations ~ std_normal();
target += cauchy_lpdf(sd_Condition_muBuilding[1] | 0,exp1);
target += cauchy_lpdf(sd_Group_muBuilding[1] | 0,2);
target += cauchy_lpdf(sd_Condition_muCaregiver[1] | 0,exp1);
target += cauchy_lpdf(sd_Group_muCaregiver[1] | 0,2);
target += cauchy_lpdf(sd_Condition_muPatient[1] | 0,exp1);
target += cauchy_lpdf(sd_Group_muPatient[1] | 0,2);
target += cauchy_lpdf(sd_Condition_muPatientpsych[1] | 0,exp1);
target += cauchy_lpdf(sd_Group_muPatientpsych[1] | 0,2);
target += cauchy_lpdf(sd_Condition_muSDOH[1] | 0,exp1);
target += cauchy_lpdf(sd_Group_muSDOH[1] | 0,2);
target += cauchy_lpdf(sd_Condition_muCanmeds[1] | 0,exp1);
target += cauchy_lpdf(sd_Group_muCanmeds[1] | 0,2);
target += cauchy_lpdf(sd_Condition_muExpertise[1] | 0,exp1);
target += cauchy_lpdf(sd_Group_muExpertise[1] | 0,2);
target += cauchy_lpdf(sd_Condition_muRecommendations[1] | 0,exp1);
target += cauchy_lpdf(sd_Group_muRecommendations[1] | 0,2);
target += normal_lpdf(z_Condition_muBuilding[1] | 0, 1);
target += normal_lpdf(z_Group_muBuilding[1] | 0, 1);
target += normal_lpdf(z_Condition_muCaregiver[1] | 0, 1);
target += normal_lpdf(z_Group_muCaregiver[1] | 0, 1);
target += normal_lpdf(z_Group_muPatient[1] | 0, 1);
target += normal_lpdf(z_Condition_muPatient[1] | 0, 1);
target += normal_lpdf(z_Condition_muPatientpsych[1] | 0, 1);
target += normal_lpdf(z_Group_muPatientpsych[1] | 0, 1);
target += normal_lpdf(z_Condition_muSDOH[1] | 0, 1);
target += normal_lpdf(z_Group_muSDOH[1] | 0, 1);
target += normal_lpdf(z_Condition_muCanmeds[1] | 0, 1);
target += normal_lpdf(z_Group_muCanmeds[1] | 0, 1);
target += normal_lpdf(z_Condition_muExpertise[1] | 0, 1);
target += normal_lpdf(z_Group_muExpertise[1] | 0, 1);
target += normal_lpdf(z_Condition_muRecommendations[1] | 0, 1);
target += normal_lpdf(z_Group_muRecommendations[1] | 0, 1);
// likelihood including all constants
if (!prior_only) {
for (n in 1:N) {
target += multinomial_logit_lpmf(Coded[n] | mu[n]);
}
}
}
generated quantities {
// actual population-level intercept
real b_muBuilding_Intercept = Intercept_muBuilding - dot_product(means_X, b_muBuilding);
// actual population-level intercept
real b_muCaregiver_Intercept = Intercept_muCaregiver - dot_product(means_X, b_muCaregiver);
// actual population-level intercept
real b_muPatient_Intercept = Intercept_muPatient - dot_product(means_X, b_muPatient);
// actual population-level intercept
real b_muPatientpsych_Intercept = Intercept_muPatientpsych - dot_product(means_X, b_muPatientpsych);
// actual population-level intercept
real b_muSDOH_Intercept = Intercept_muSDOH - dot_product(means_X, b_muSDOH);
// actual population-level intercept
real b_muCanmeds_Intercept = Intercept_muCanmeds - dot_product(means_X, b_muCanmeds);
// actual population-level intercept
real b_muExpertise_Intercept = Intercept_muExpertise - dot_product(means_X, b_muExpertise);
// actual population-level intercept
real b_muRecommendations_Intercept = Intercept_muRecommendations - dot_product(means_X, b_muRecommendations);
}
``````

The model converges with following in ~12hrs. Much faster than the 9 days it was taking previously!

``````What_Stan_Fit <- sampling(What_Stan_Model,         # Model to sample from
data = What_Stan_Data,   # named list of data
iter = 2000,            # total number of iterations per chain
warmup = 1000,          # number of warmup iterations per chain
init = 0,               # set intial values to 0
chains = 2,             # number of Markov chains
cores = ncores,         # number of cores to use (1 per chain)
control=list(max_treedepth = 14,