Hierarchical Multinomial Model Optimization/Specification

JLC · April 27, 2020, 9:11pm

I am currently trying to analyze some multinomial data (8 possible categorical responses in each row) with a hierarchical model.

The data are generated from 16 groups of people (8 groups in “control”, 8 groups in "intervention) across 5 sessions, led by 2 different facilitators with just under 4000 observations.

The model takes an extremely long time to run. It’s currently been going for 9 days and has finally reached sampling (6000 iterations with 4500 warm up).

I used BRMS to generate the Stan code and am wondering if there are any potential optimizations that mike help speed things up.

The Stan code is:

 /* multinomial-logit log-PMF
   * Args: 
   *   y: array of integer response values
   *   mu: vector of category logit probabilities
   * Returns:  
   *   a scalar to be added to the log posterior 
   */ 
   real multinomial_logit_lpmf(int[] y, vector mu) {
 return multinomial_lpmf(y | softmax(mu));
   }
}
data {
  int<lower=1> N;  // number of observations
  int<lower=2> ncat;  // number of categories
  int Y[N, ncat];  // response array
  int trials[N];  // number of trials
  int<lower=1> K_muBuilding;  // number of population-level effects
  matrix[N, K_muBuilding] X_muBuilding;  // population-level design matrix
  int<lower=1> K_muCaregiver;  // number of population-level effects
  matrix[N, K_muCaregiver] X_muCaregiver;  // population-level design matrix
  int<lower=1> K_muPatient;  // number of population-level effects
  matrix[N, K_muPatient] X_muPatient;  // population-level design matrix
  int<lower=1> K_muPatientpsych;  // number of population-level effects
  matrix[N, K_muPatientpsych] X_muPatientpsych;  // population-level design matrix
  int<lower=1> K_muSDOH;  // number of population-level effects
  matrix[N, K_muSDOH] X_muSDOH;  // population-level design matrix
  int<lower=1> K_muCanmeds;  // number of population-level effects
  matrix[N, K_muCanmeds] X_muCanmeds;  // population-level design matrix
  int<lower=1> K_muExpertise;  // number of population-level effects
  matrix[N, K_muExpertise] X_muExpertise;  // population-level design matrix
  int<lower=1> K_muRecommendations;  // number of population-level effects
  matrix[N, K_muRecommendations] X_muRecommendations;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_muBuilding_1;
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  int<lower=1> J_2[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_2_muBuilding_1;
  // data for group-level effects of ID 3
  int<lower=1> N_3;  // number of grouping levels
  int<lower=1> M_3;  // number of coefficients per level
  int<lower=1> J_3[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_3_muCaregiver_1;
  // data for group-level effects of ID 4
  int<lower=1> N_4;  // number of grouping levels
  int<lower=1> M_4;  // number of coefficients per level
  int<lower=1> J_4[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_4_muCaregiver_1;
  // data for group-level effects of ID 5
  int<lower=1> N_5;  // number of grouping levels
  int<lower=1> M_5;  // number of coefficients per level
  int<lower=1> J_5[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_5_muPatient_1;
  // data for group-level effects of ID 6
  int<lower=1> N_6;  // number of grouping levels
  int<lower=1> M_6;  // number of coefficients per level
  int<lower=1> J_6[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_6_muPatient_1;
  // data for group-level effects of ID 7
  int<lower=1> N_7;  // number of grouping levels
  int<lower=1> M_7;  // number of coefficients per level
  int<lower=1> J_7[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_7_muPatientpsych_1;
  // data for group-level effects of ID 8
  int<lower=1> N_8;  // number of grouping levels
  int<lower=1> M_8;  // number of coefficients per level
  int<lower=1> J_8[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_8_muPatientpsych_1;
  // data for group-level effects of ID 9
  int<lower=1> N_9;  // number of grouping levels
  int<lower=1> M_9;  // number of coefficients per level
  int<lower=1> J_9[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_9_muSDOH_1;
  // data for group-level effects of ID 10
  int<lower=1> N_10;  // number of grouping levels
  int<lower=1> M_10;  // number of coefficients per level
  int<lower=1> J_10[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_10_muSDOH_1;
  // data for group-level effects of ID 11
  int<lower=1> N_11;  // number of grouping levels
  int<lower=1> M_11;  // number of coefficients per level
  int<lower=1> J_11[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_11_muCanmeds_1;
  // data for group-level effects of ID 12
  int<lower=1> N_12;  // number of grouping levels
  int<lower=1> M_12;  // number of coefficients per level
  int<lower=1> J_12[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_12_muCanmeds_1;
  // data for group-level effects of ID 13
  int<lower=1> N_13;  // number of grouping levels
  int<lower=1> M_13;  // number of coefficients per level
  int<lower=1> J_13[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_13_muExpertise_1;
  // data for group-level effects of ID 14
  int<lower=1> N_14;  // number of grouping levels
  int<lower=1> M_14;  // number of coefficients per level
  int<lower=1> J_14[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_14_muExpertise_1;
  // data for group-level effects of ID 15
  int<lower=1> N_15;  // number of grouping levels
  int<lower=1> M_15;  // number of coefficients per level
  int<lower=1> J_15[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_15_muRecommendations_1;
  // data for group-level effects of ID 16
  int<lower=1> N_16;  // number of grouping levels
  int<lower=1> M_16;  // number of coefficients per level
  int<lower=1> J_16[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_16_muRecommendations_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc_muBuilding = K_muBuilding - 1;
  matrix[N, Kc_muBuilding] Xc_muBuilding;  // centered version of X_muBuilding without an intercept
  vector[Kc_muBuilding] means_X_muBuilding;  // column means of X_muBuilding before centering
  int Kc_muCaregiver = K_muCaregiver - 1;
  matrix[N, Kc_muCaregiver] Xc_muCaregiver;  // centered version of X_muCaregiver without an intercept
  vector[Kc_muCaregiver] means_X_muCaregiver;  // column means of X_muCaregiver before centering
  int Kc_muPatient = K_muPatient - 1;
  matrix[N, Kc_muPatient] Xc_muPatient;  // centered version of X_muPatient without an intercept
  vector[Kc_muPatient] means_X_muPatient;  // column means of X_muPatient before centering
  int Kc_muPatientpsych = K_muPatientpsych - 1;
  matrix[N, Kc_muPatientpsych] Xc_muPatientpsych;  // centered version of X_muPatientpsych without an intercept
  vector[Kc_muPatientpsych] means_X_muPatientpsych;  // column means of X_muPatientpsych before centering
  int Kc_muSDOH = K_muSDOH - 1;
  matrix[N, Kc_muSDOH] Xc_muSDOH;  // centered version of X_muSDOH without an intercept
  vector[Kc_muSDOH] means_X_muSDOH;  // column means of X_muSDOH before centering
  int Kc_muCanmeds = K_muCanmeds - 1;
  matrix[N, Kc_muCanmeds] Xc_muCanmeds;  // centered version of X_muCanmeds without an intercept
  vector[Kc_muCanmeds] means_X_muCanmeds;  // column means of X_muCanmeds before centering
  int Kc_muExpertise = K_muExpertise - 1;
  matrix[N, Kc_muExpertise] Xc_muExpertise;  // centered version of X_muExpertise without an intercept
  vector[Kc_muExpertise] means_X_muExpertise;  // column means of X_muExpertise before centering
  int Kc_muRecommendations = K_muRecommendations - 1;
  matrix[N, Kc_muRecommendations] Xc_muRecommendations;  // centered version of X_muRecommendations without an intercept
  vector[Kc_muRecommendations] means_X_muRecommendations;  // column means of X_muRecommendations before centering
  for (i in 2:K_muBuilding) {
means_X_muBuilding[i - 1] = mean(X_muBuilding[, i]);
Xc_muBuilding[, i - 1] = X_muBuilding[, i] - means_X_muBuilding[i - 1];
  }
  for (i in 2:K_muCaregiver) {
means_X_muCaregiver[i - 1] = mean(X_muCaregiver[, i]);
Xc_muCaregiver[, i - 1] = X_muCaregiver[, i] - means_X_muCaregiver[i - 1];
  }
  for (i in 2:K_muPatient) {
means_X_muPatient[i - 1] = mean(X_muPatient[, i]);
Xc_muPatient[, i - 1] = X_muPatient[, i] - means_X_muPatient[i - 1];
  }
  for (i in 2:K_muPatientpsych) {
means_X_muPatientpsych[i - 1] = mean(X_muPatientpsych[, i]);
Xc_muPatientpsych[, i - 1] = X_muPatientpsych[, i] - means_X_muPatientpsych[i - 1];
  }
  for (i in 2:K_muSDOH) {
means_X_muSDOH[i - 1] = mean(X_muSDOH[, i]);
Xc_muSDOH[, i - 1] = X_muSDOH[, i] - means_X_muSDOH[i - 1];
  }
  for (i in 2:K_muCanmeds) {
means_X_muCanmeds[i - 1] = mean(X_muCanmeds[, i]);
Xc_muCanmeds[, i - 1] = X_muCanmeds[, i] - means_X_muCanmeds[i - 1];
  }
  for (i in 2:K_muExpertise) {
means_X_muExpertise[i - 1] = mean(X_muExpertise[, i]);
Xc_muExpertise[, i - 1] = X_muExpertise[, i] - means_X_muExpertise[i - 1];
  }
  for (i in 2:K_muRecommendations) {
means_X_muRecommendations[i - 1] = mean(X_muRecommendations[, i]);
Xc_muRecommendations[, i - 1] = X_muRecommendations[, i] - means_X_muRecommendations[i - 1];
  }
}
parameters {
  vector[Kc_muBuilding] b_muBuilding;  // population-level effects
  real Intercept_muBuilding;  // temporary intercept for centered predictors
  vector[Kc_muCaregiver] b_muCaregiver;  // population-level effects
  real Intercept_muCaregiver;  // temporary intercept for centered predictors
  vector[Kc_muPatient] b_muPatient;  // population-level effects
  real Intercept_muPatient;  // temporary intercept for centered predictors
  vector[Kc_muPatientpsych] b_muPatientpsych;  // population-level effects
  real Intercept_muPatientpsych;  // temporary intercept for centered predictors
  vector[Kc_muSDOH] b_muSDOH;  // population-level effects
  real Intercept_muSDOH;  // temporary intercept for centered predictors
  vector[Kc_muCanmeds] b_muCanmeds;  // population-level effects
  real Intercept_muCanmeds;  // temporary intercept for centered predictors
  vector[Kc_muExpertise] b_muExpertise;  // population-level effects
  real Intercept_muExpertise;  // temporary intercept for centered predictors
  vector[Kc_muRecommendations] b_muRecommendations;  // population-level effects
  real Intercept_muRecommendations;  // temporary intercept for centered predictors
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  vector[N_1] z_1[M_1];  // standardized group-level effects
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  vector[N_2] z_2[M_2];  // standardized group-level effects
  vector<lower=0>[M_3] sd_3;  // group-level standard deviations
  vector[N_3] z_3[M_3];  // standardized group-level effects
  vector<lower=0>[M_4] sd_4;  // group-level standard deviations
  vector[N_4] z_4[M_4];  // standardized group-level effects
  vector<lower=0>[M_5] sd_5;  // group-level standard deviations
  vector[N_5] z_5[M_5];  // standardized group-level effects
  vector<lower=0>[M_6] sd_6;  // group-level standard deviations
  vector[N_6] z_6[M_6];  // standardized group-level effects
  vector<lower=0>[M_7] sd_7;  // group-level standard deviations
  vector[N_7] z_7[M_7];  // standardized group-level effects
  vector<lower=0>[M_8] sd_8;  // group-level standard deviations
  vector[N_8] z_8[M_8];  // standardized group-level effects
  vector<lower=0>[M_9] sd_9;  // group-level standard deviations
  vector[N_9] z_9[M_9];  // standardized group-level effects
  vector<lower=0>[M_10] sd_10;  // group-level standard deviations
  vector[N_10] z_10[M_10];  // standardized group-level effects
  vector<lower=0>[M_11] sd_11;  // group-level standard deviations
  vector[N_11] z_11[M_11];  // standardized group-level effects
  vector<lower=0>[M_12] sd_12;  // group-level standard deviations
  vector[N_12] z_12[M_12];  // standardized group-level effects
  vector<lower=0>[M_13] sd_13;  // group-level standard deviations
  vector[N_13] z_13[M_13];  // standardized group-level effects
  vector<lower=0>[M_14] sd_14;  // group-level standard deviations
  vector[N_14] z_14[M_14];  // standardized group-level effects
  vector<lower=0>[M_15] sd_15;  // group-level standard deviations
  vector[N_15] z_15[M_15];  // standardized group-level effects
  vector<lower=0>[M_16] sd_16;  // group-level standard deviations
  vector[N_16] z_16[M_16];  // standardized group-level effects
}
transformed parameters {
  vector[N_1] r_1_muBuilding_1;  // actual group-level effects
  vector[N_2] r_2_muBuilding_1;  // actual group-level effects
  vector[N_3] r_3_muCaregiver_1;  // actual group-level effects
  vector[N_4] r_4_muCaregiver_1;  // actual group-level effects
  vector[N_5] r_5_muPatient_1;  // actual group-level effects
  vector[N_6] r_6_muPatient_1;  // actual group-level effects
  vector[N_7] r_7_muPatientpsych_1;  // actual group-level effects
  vector[N_8] r_8_muPatientpsych_1;  // actual group-level effects
  vector[N_9] r_9_muSDOH_1;  // actual group-level effects
  vector[N_10] r_10_muSDOH_1;  // actual group-level effects
  vector[N_11] r_11_muCanmeds_1;  // actual group-level effects
  vector[N_12] r_12_muCanmeds_1;  // actual group-level effects
  vector[N_13] r_13_muExpertise_1;  // actual group-level effects
  vector[N_14] r_14_muExpertise_1;  // actual group-level effects
  vector[N_15] r_15_muRecommendations_1;  // actual group-level effects
  vector[N_16] r_16_muRecommendations_1;  // actual group-level effects
  r_1_muBuilding_1 = (sd_1[1] * (z_1[1]));
  r_2_muBuilding_1 = (sd_2[1] * (z_2[1]));
  r_3_muCaregiver_1 = (sd_3[1] * (z_3[1]));
  r_4_muCaregiver_1 = (sd_4[1] * (z_4[1]));
  r_5_muPatient_1 = (sd_5[1] * (z_5[1]));
  r_6_muPatient_1 = (sd_6[1] * (z_6[1]));
  r_7_muPatientpsych_1 = (sd_7[1] * (z_7[1]));
  r_8_muPatientpsych_1 = (sd_8[1] * (z_8[1]));
  r_9_muSDOH_1 = (sd_9[1] * (z_9[1]));
  r_10_muSDOH_1 = (sd_10[1] * (z_10[1]));
  r_11_muCanmeds_1 = (sd_11[1] * (z_11[1]));
  r_12_muCanmeds_1 = (sd_12[1] * (z_12[1]));
  r_13_muExpertise_1 = (sd_13[1] * (z_13[1]));
  r_14_muExpertise_1 = (sd_14[1] * (z_14[1]));
  r_15_muRecommendations_1 = (sd_15[1] * (z_15[1]));
  r_16_muRecommendations_1 = (sd_16[1] * (z_16[1]));
}
model {
  // initialize linear predictor term
  vector[N] muBuilding = Intercept_muBuilding + Xc_muBuilding * b_muBuilding;
  // initialize linear predictor term
  vector[N] muCaregiver = Intercept_muCaregiver + Xc_muCaregiver * b_muCaregiver;
  // initialize linear predictor term
  vector[N] muPatient = Intercept_muPatient + Xc_muPatient * b_muPatient;
  // initialize linear predictor term
  vector[N] muPatientpsych = Intercept_muPatientpsych + Xc_muPatientpsych * b_muPatientpsych;
  // initialize linear predictor term
  vector[N] muSDOH = Intercept_muSDOH + Xc_muSDOH * b_muSDOH;
  // initialize linear predictor term
  vector[N] muCanmeds = Intercept_muCanmeds + Xc_muCanmeds * b_muCanmeds;
  // initialize linear predictor term
  vector[N] muExpertise = Intercept_muExpertise + Xc_muExpertise * b_muExpertise;
  // initialize linear predictor term
  vector[N] muRecommendations = Intercept_muRecommendations + Xc_muRecommendations * b_muRecommendations;
  // linear predictor matrix
  vector[ncat] mu[N];
  for (n in 1:N) {
// add more terms to the linear predictor
muBuilding[n] += r_1_muBuilding_1[J_1[n]] * Z_1_muBuilding_1[n] + r_2_muBuilding_1[J_2[n]] * Z_2_muBuilding_1[n];
  }
  for (n in 1:N) {
// add more terms to the linear predictor
muCaregiver[n] += r_3_muCaregiver_1[J_3[n]] * Z_3_muCaregiver_1[n] + r_4_muCaregiver_1[J_4[n]] * Z_4_muCaregiver_1[n];
  }
  for (n in 1:N) {
// add more terms to the linear predictor
muPatient[n] += r_5_muPatient_1[J_5[n]] * Z_5_muPatient_1[n] + r_6_muPatient_1[J_6[n]] * Z_6_muPatient_1[n];
  }
  for (n in 1:N) {
// add more terms to the linear predictor
muPatientpsych[n] += r_7_muPatientpsych_1[J_7[n]] * Z_7_muPatientpsych_1[n] + r_8_muPatientpsych_1[J_8[n]] * Z_8_muPatientpsych_1[n];
  }
  for (n in 1:N) {
// add more terms to the linear predictor
muSDOH[n] += r_9_muSDOH_1[J_9[n]] * Z_9_muSDOH_1[n] + r_10_muSDOH_1[J_10[n]] * Z_10_muSDOH_1[n];
  }
  for (n in 1:N) {
// add more terms to the linear predictor
muCanmeds[n] += r_11_muCanmeds_1[J_11[n]] * Z_11_muCanmeds_1[n] + r_12_muCanmeds_1[J_12[n]] * Z_12_muCanmeds_1[n];
  }
  for (n in 1:N) {
// add more terms to the linear predictor
muExpertise[n] += r_13_muExpertise_1[J_13[n]] * Z_13_muExpertise_1[n] + r_14_muExpertise_1[J_14[n]] * Z_14_muExpertise_1[n];
  }
  for (n in 1:N) {
// add more terms to the linear predictor
muRecommendations[n] += r_15_muRecommendations_1[J_15[n]] * Z_15_muRecommendations_1[n] + r_16_muRecommendations_1[J_16[n]] * Z_16_muRecommendations_1[n];
  }
  for (n in 1:N) {
mu[n] = [0, muBuilding[n], muCaregiver[n], muPatient[n], muPatientpsych[n], muSDOH[n], muCanmeds[n], muExpertise[n], muRecommendations[n]]';
  }
  // priors including all constants
  target += normal_lpdf(b_muBuilding[1] | 0,5);
  target += normal_lpdf(b_muBuilding[2] | 0,5);
  target += normal_lpdf(b_muBuilding[3] | 0,5);
  target += normal_lpdf(b_muBuilding[4] | 0,5);
  target += normal_lpdf(b_muBuilding[5] | 0,5);
  target += normal_lpdf(b_muBuilding[6] | 0,5);
  target += normal_lpdf(b_muBuilding[7] | 0,5);
  target += normal_lpdf(b_muBuilding[8] | 0,5);
  target += normal_lpdf(b_muBuilding[9] | 0,5);
  target += normal_lpdf(b_muBuilding[10] | 0,5);
  target += normal_lpdf(b_muBuilding[11] | 0,5);
  target += normal_lpdf(b_muBuilding[12] | 0,5);
  target += normal_lpdf(b_muBuilding[13] | 0,5);
  target += normal_lpdf(b_muBuilding[14] | 0,5);
  target += normal_lpdf(b_muCaregiver[1] | 0,5);
  target += normal_lpdf(b_muCaregiver[2] | 0,5);
  target += normal_lpdf(b_muCaregiver[3] | 0,5);
  target += normal_lpdf(b_muCaregiver[4] | 0,5);
  target += normal_lpdf(b_muCaregiver[5] | 0,5);
  target += normal_lpdf(b_muCaregiver[6] | 0,5);
  target += normal_lpdf(b_muCaregiver[7] | 0,5);
  target += normal_lpdf(b_muCaregiver[8] | 0,5);
  target += normal_lpdf(b_muCaregiver[9] | 0,5);
  target += normal_lpdf(b_muCaregiver[10] | 0,5);
  target += normal_lpdf(b_muCaregiver[11] | 0,5);
  target += normal_lpdf(b_muCaregiver[12] | 0,5);
  target += normal_lpdf(b_muCaregiver[13] | 0,5);
  target += normal_lpdf(b_muCaregiver[14] | 0,5);
  target += normal_lpdf(b_muPatient[1] | 0,5);
  target += normal_lpdf(b_muPatient[2] | 0,5);
  target += normal_lpdf(b_muPatient[3] | 0,5);
  target += normal_lpdf(b_muPatient[4] | 0,5);
  target += normal_lpdf(b_muPatient[5] | 0,5);
  target += normal_lpdf(b_muPatient[6] | 0,5);
  target += normal_lpdf(b_muPatient[7] | 0,5);
  target += normal_lpdf(b_muPatient[8] | 0,5);
  target += normal_lpdf(b_muPatient[9] | 0,5);
  target += normal_lpdf(b_muPatient[10] | 0,5);
  target += normal_lpdf(b_muPatient[11] | 0,5);
  target += normal_lpdf(b_muPatient[12] | 0,5);
  target += normal_lpdf(b_muPatient[13] | 0,5);
  target += normal_lpdf(b_muPatient[14] | 0,5);
  target += normal_lpdf(b_muPatientpsych[1] | 0,5);
  target += normal_lpdf(b_muPatientpsych[2] | 0,5);
  target += normal_lpdf(b_muPatientpsych[3] | 0,5);
  target += normal_lpdf(b_muPatientpsych[4] | 0,5);
  target += normal_lpdf(b_muPatientpsych[5] | 0,5);
  target += normal_lpdf(b_muPatientpsych[6] | 0,5);
  target += normal_lpdf(b_muPatientpsych[7] | 0,5);
  target += normal_lpdf(b_muPatientpsych[8] | 0,5);
  target += normal_lpdf(b_muPatientpsych[9] | 0,5);
  target += normal_lpdf(b_muPatientpsych[10] | 0,5);
  target += normal_lpdf(b_muPatientpsych[11] | 0,5);
  target += normal_lpdf(b_muPatientpsych[12] | 0,5);
  target += normal_lpdf(b_muPatientpsych[13] | 0,5);
  target += normal_lpdf(b_muPatientpsych[14] | 0,5);
  target += normal_lpdf(b_muSDOH[1] | 0,5);
  target += normal_lpdf(b_muSDOH[2] | 0,5);
  target += normal_lpdf(b_muSDOH[3] | 0,5);
  target += normal_lpdf(b_muSDOH[4] | 0,5);
  target += normal_lpdf(b_muSDOH[5] | 0,5);
  target += normal_lpdf(b_muSDOH[6] | 0,5);
  target += normal_lpdf(b_muSDOH[7] | 0,5);
  target += normal_lpdf(b_muSDOH[8] | 0,5);
  target += normal_lpdf(b_muSDOH[9] | 0,5);
  target += normal_lpdf(b_muSDOH[10] | 0,5);
  target += normal_lpdf(b_muSDOH[11] | 0,5);
  target += normal_lpdf(b_muSDOH[12] | 0,5);
  target += normal_lpdf(b_muSDOH[13] | 0,5);
  target += normal_lpdf(b_muSDOH[14] | 0,5);
  target += normal_lpdf(b_muCanmeds[1] | 0,5);
  target += normal_lpdf(b_muCanmeds[2] | 0,5);
  target += normal_lpdf(b_muCanmeds[3] | 0,5);
  target += normal_lpdf(b_muCanmeds[4] | 0,5);
  target += normal_lpdf(b_muCanmeds[5] | 0,5);
  target += normal_lpdf(b_muCanmeds[6] | 0,5);
  target += normal_lpdf(b_muCanmeds[7] | 0,5);
  target += normal_lpdf(b_muCanmeds[8] | 0,5);
  target += normal_lpdf(b_muCanmeds[9] | 0,5);
  target += normal_lpdf(b_muCanmeds[10] | 0,5);
  target += normal_lpdf(b_muCanmeds[11] | 0,5);
  target += normal_lpdf(b_muCanmeds[12] | 0,5);
  target += normal_lpdf(b_muCanmeds[13] | 0,5);
  target += normal_lpdf(b_muCanmeds[14] | 0,5);
  target += normal_lpdf(b_muExpertise[1] | 0,5);
  target += normal_lpdf(b_muExpertise[2] | 0,5);
  target += normal_lpdf(b_muExpertise[3] | 0,5);
  target += normal_lpdf(b_muExpertise[4] | 0,5);
  target += normal_lpdf(b_muExpertise[5] | 0,5);
  target += normal_lpdf(b_muExpertise[6] | 0,5);
  target += normal_lpdf(b_muExpertise[7] | 0,5);
  target += normal_lpdf(b_muExpertise[8] | 0,5);
  target += normal_lpdf(b_muExpertise[9] | 0,5);
  target += normal_lpdf(b_muExpertise[10] | 0,5);
  target += normal_lpdf(b_muExpertise[11] | 0,5);
  target += normal_lpdf(b_muExpertise[12] | 0,5);
  target += normal_lpdf(b_muExpertise[13] | 0,5);
  target += normal_lpdf(b_muExpertise[14] | 0,5);
  target += normal_lpdf(b_muRecommendations[1] | 0,5);
  target += normal_lpdf(b_muRecommendations[2] | 0,5);
  target += normal_lpdf(b_muRecommendations[3] | 0,5);
  target += normal_lpdf(b_muRecommendations[4] | 0,5);
  target += normal_lpdf(b_muRecommendations[5] | 0,5);
  target += normal_lpdf(b_muRecommendations[6] | 0,5);
  target += normal_lpdf(b_muRecommendations[7] | 0,5);
  target += normal_lpdf(b_muRecommendations[8] | 0,5);
  target += normal_lpdf(b_muRecommendations[9] | 0,5);
  target += normal_lpdf(b_muRecommendations[10] | 0,5);
  target += normal_lpdf(b_muRecommendations[11] | 0,5);
  target += normal_lpdf(b_muRecommendations[12] | 0,5);
  target += normal_lpdf(b_muRecommendations[13] | 0,5);
  target += normal_lpdf(b_muRecommendations[14] | 0,5);
  target += cauchy_lpdf(sd_1[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
  target += normal_lpdf(z_1[1] | 0, 1);
  target += cauchy_lpdf(sd_2[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
  target += normal_lpdf(z_2[1] | 0, 1);
  target += cauchy_lpdf(sd_3[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
  target += normal_lpdf(z_3[1] | 0, 1);
  target += cauchy_lpdf(sd_4[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
  target += normal_lpdf(z_4[1] | 0, 1);
  target += cauchy_lpdf(sd_5[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
  target += normal_lpdf(z_5[1] | 0, 1);
  target += cauchy_lpdf(sd_6[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
  target += normal_lpdf(z_6[1] | 0, 1);
  target += cauchy_lpdf(sd_7[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
  target += normal_lpdf(z_7[1] | 0, 1);
  target += cauchy_lpdf(sd_8[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
  target += normal_lpdf(z_8[1] | 0, 1);
  target += cauchy_lpdf(sd_9[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
  target += normal_lpdf(z_9[1] | 0, 1);
  target += cauchy_lpdf(sd_10[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
  target += normal_lpdf(z_10[1] | 0, 1);
  target += cauchy_lpdf(sd_11[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
  target += normal_lpdf(z_11[1] | 0, 1);
  target += cauchy_lpdf(sd_12[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
  target += normal_lpdf(z_12[1] | 0, 1);
  target += cauchy_lpdf(sd_13[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
  target += normal_lpdf(z_13[1] | 0, 1);
  target += cauchy_lpdf(sd_14[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
  target += normal_lpdf(z_14[1] | 0, 1);
  target += cauchy_lpdf(sd_15[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
  target += normal_lpdf(z_15[1] | 0, 1);
  target += cauchy_lpdf(sd_16[1] | 0,1)
- 1 * cauchy_lccdf(0 | 0,1);
  target += normal_lpdf(z_16[1] | 0, 1);
  // likelihood including all constants
  if (!prior_only) {
for (n in 1:N) {
  target += multinomial_logit_lpmf(Y[n] | mu[n]);
}
  }
}
generated quantities {
  // actual population-level intercept
  real b_muBuilding_Intercept = Intercept_muBuilding - dot_product(means_X_muBuilding, b_muBuilding);
  // actual population-level intercept
  real b_muCaregiver_Intercept = Intercept_muCaregiver - dot_product(means_X_muCaregiver, b_muCaregiver);
  // actual population-level intercept
  real b_muPatient_Intercept = Intercept_muPatient - dot_product(means_X_muPatient, b_muPatient);
  // actual population-level intercept
  real b_muPatientpsych_Intercept = Intercept_muPatientpsych - dot_product(means_X_muPatientpsych, b_muPatientpsych);
  // actual population-level intercept
  real b_muSDOH_Intercept = Intercept_muSDOH - dot_product(means_X_muSDOH, b_muSDOH);
  // actual population-level intercept
  real b_muCanmeds_Intercept = Intercept_muCanmeds - dot_product(means_X_muCanmeds, b_muCanmeds);
  // actual population-level intercept
  real b_muExpertise_Intercept = Intercept_muExpertise - dot_product(means_X_muExpertise, b_muExpertise);
  // actual population-level intercept
  real b_muRecommendations_Intercept = Intercept_muRecommendations - dot_product(means_X_muRecommendations, b_muRecommendations);
}

I have uploaded a .csv of the “large list” of the data.

Anything that might help speed this up would be great. Waiting a week to then find divergent transitions or inadequate tree depth is a source of frustration.

stijn · April 28, 2020, 3:23am

Paul Bürkner has built a lot of standard optimisations in brms, so I am not even going to presume to be able to do better than his implementation.

I think from just scanning the priors, the model has a lot of very broad priors which gives counter-intuitive prior distributions on the logit scale. Basically normal(0,5), if the predictors are on the unit scale, and folded cauchy priors are indicating that you expect the model to be relatively sure that a category will be chosen or not chosen. My advice would be to start with normal(0, 1) for the coefficients and exp(1) for hierarchical scales and do a prior predictive check with brms. I think you can do prior_only = TRUE in brms. You probably don’t need 4500 warmup if the model is appropriate for your data. Once you have the prior right, you can maybe start with warmup = 1000 and iter = 2000 to see whether you have any speed-ups.

The way I think about it is that if you can improve the prior with 1 day of work and it shaves off 2 days of computations that’s a win.

EDIT: Also have a look at this post that focuses on avoiding redundant computations. This might help as well.

JLC · April 28, 2020, 1:31pm

Thank you very much!

Do you think the sd priors should still be specified as half-cauchy (0, exp(1)), or would half-normal likely be better?

andrjohns · April 28, 2020, 2:04pm

Hi JLC,

There are three things that I’d try here. The first is replacing the loops with vectorised calculations. So rather than:

  for (n in 1:N) {
muBuilding[n] += r_1_muBuilding_1[J_1[n]] * Z_1_muBuilding_1[n] + r_2_muBuilding_1[J_2[n]] * Z_2_muBuilding_1[n];
  }

You could have:

muBuilding += r_1_muBuilding_1[J_1] .* Z_1_muBuilding_1 + r_2_muBuilding_1[J_2] .* Z_2_muBuilding_1;

Secondly, unless you need the distributions with normalising constants (e.g., if you’re going to be calculating marginal likelihoods or bayes factors), you should replace the target += syntax with the ~ syntax. This can cut out some computations on the back-end that you don’t need. For example, rather than:

target += normal_lpdf(b_muBuilding[1] | 0,5);

You would say:

b_muBuilding[1] ~ normal(0, 5);

Finally, you could also take advantage of the vectorised prior distributions. So rather than:

b_muBuilding[1] ~ normal(0, 5);
...
b_muBuilding[14] ~ normal(0, 5);

You can just specify:

b_muBuilding ~ normal(0, 5);

I don’t think you can use the vectorisation or ~ syntax for the Cauchy priors though.

Hope some of that helps!
Andrew

JLC · April 30, 2020, 5:33pm

Thank you @stijn !

In running running the model on only priors, I found I hadn’t put priors on category intercepts. Adding those and using normal(0,1) on betas and cauchy(0, exp(1)) on sd’s in brms sped things up a lot! It finished in a day with no divergent transitions 3000 iterations (1500 warmup).

Unfortunately, brms doesn’t have a built-in pp_check for multinomial models.

@andrjohns

Thanks for the recommendations!

I found what I think we’re a number of redundant estimates and added in your suggestions:

functions {

  /* multinomial-logit log-PMF
   * Args: 
   *   y: array of integer response values
   *   mu: vector of category logit probabilities
   * Returns:  
   *   a scalar to be added to the log posterior 
   */ 
   real multinomial_logit_lpmf(int[] y, vector mu) {
     return multinomial_lpmf(y | softmax(mu));
   }
}
data {
  int<lower=1> N;  // number of observations
  int<lower=2> ncat;  // number of categories
  int Coded[N, ncat];  // response array
  int trials[N];  // number of trials
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1 (Condition)
  int<lower=1> n_Condition;  // number of grouping levels for Condition
  int<lower=1> M_Condition;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values for Condition
  vector[N] Z_Condition;
  // data for group-level effects of ID 2 (Group)
  int<lower=1> n_Group;  // number of grouping levels for Group
  int<lower=1> M_Group;  // number of coefficients per level
  int<lower=1> J_2[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_Group;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X_muBuilding without an intercept
  vector[Kc] means_X;  // column means of X_muBuilding before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b_muBuilding;  // population-level effects
  real Intercept_muBuilding;  // temporary intercept for centered predictors
  vector[Kc] b_muCaregiver;  // population-level effects
  real Intercept_muCaregiver;  // temporary intercept for centered predictors
  vector[Kc] b_muPatient;  // population-level effects
  real Intercept_muPatient;  // temporary intercept for centered predictors
  vector[Kc] b_muPatientpsych;  // population-level effects
  real Intercept_muPatientpsych;  // temporary intercept for centered predictors
  vector[Kc] b_muSDOH;  // population-level effects
  real Intercept_muSDOH;  // temporary intercept for centered predictors
  vector[Kc] b_muCanmeds;  // population-level effects
  real Intercept_muCanmeds;  // temporary intercept for centered predictors
  vector[Kc] b_muExpertise;  // population-level effects
  real Intercept_muExpertise;  // temporary intercept for centered predictors
  vector[Kc] b_muRecommendations;  // population-level effects
  real Intercept_muRecommendations;  // temporary intercept for centered predictors
   vector<lower=0>[M_Condition] sd_Condition_muBuilding;  // group-level standard deviations
  vector[n_Condition] z_Condition_muBuilding[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muBuilding;  // group-level standard deviations
  vector[n_Group] z_Group_muBuilding[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muCaregiver;  // group-level standard deviations
  vector[n_Condition] z_Condition_muCaregiver[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muCaregiver;  // group-level standard deviations
  vector[n_Group] z_Group_muCaregiver[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muPatient;  // group-level standard deviations
  vector[n_Condition] z_Condition_muPatient[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muPatient;  // group-level standard deviations
  vector[n_Group] z_Group_muPatient[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muPatientpsych;  // group-level standard deviations
  vector[n_Condition] z_Condition_muPatientpsych[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muPatientpsych;  // group-level standard deviations
  vector[n_Group] z_Group_muPatientpsych[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muSDOH;  // group-level standard deviations
  vector[n_Condition] z_Condition_muSDOH[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muSDOH;  // group-level standard deviations
  vector[n_Group] z_Group_muSDOH[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muCanmeds;  // group-level standard deviations
  vector[n_Condition] z_Condition_muCanmeds[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muCanmeds;  // group-level standard deviations
  vector[n_Group] z_Group_muCanmeds[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muExpertise;  // group-level standard deviations
  vector[n_Condition] z_Condition_muExpertise[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muExpertise;  // group-level standard deviations
  vector[n_Group] z_Group_muExpertise[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muRecommendations;  // group-level standard deviations
  vector[n_Condition] z_Condition_muRecommendations[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muRecommendations;  // group-level standard deviations
  vector[n_Group] z_Group_muRecommendations[M_Group];  // standardized group-level effects
}
transformed parameters {
  vector[n_Condition] r_Condition_muBuilding;  // actual group-level effects
  vector[n_Group] r_Group_muBuilding;  // actual group-level effects
  vector[n_Condition] r_Condition_muCaregiver;  // actual group-level effects
  vector[n_Group] r_Group_muCaregiver;  // actual group-level effects
  vector[n_Condition] r_Condition_muPatient;  // actual group-level effects
  vector[n_Group] r_Group_muPatient;  // actual group-level effects
  vector[n_Condition] r_Condition_muPatientpsych;  // actual group-level effects
  vector[n_Group] r_Group_muPatientpsych;  // actual group-level effects
  vector[n_Condition] r_Condition_muSDOH;  // actual group-level effects
  vector[n_Group] r_Group_muSDOH;  // actual group-level effects
  vector[n_Condition] r_Condition_muCanmeds;  // actual group-level effects
  vector[n_Group] r_Group_muCanmeds;  // actual group-level effects
  vector[n_Condition] r_Condition_muExpertise;  // actual group-level effects
  vector[n_Group] r_Group_muExpertise;  // actual group-level effects
  vector[n_Condition] r_Condition_muRecommendations;  // actual group-level effects
  vector[n_Group] r_Group_muRecommendations;  // actual group-level effects
  r_Condition_muBuilding = (sd_Condition_muBuilding[1] * (z_Condition_muBuilding[1]));
  r_Group_muBuilding = (sd_Group_muBuilding[1] * (z_Group_muBuilding[1]));
  r_Condition_muCaregiver = (sd_Condition_muCaregiver[1] * (z_Condition_muCaregiver[1]));
  r_Group_muCaregiver = (sd_Group_muCaregiver[1] * (z_Group_muCaregiver[1]));
  r_Condition_muPatient = (sd_Condition_muPatient[1] * (z_Condition_muPatient[1]));
  r_Group_muPatient = (sd_Group_muPatient[1] * (z_Group_muPatient[1]));
  r_Condition_muPatientpsych = (sd_Condition_muPatientpsych[1] * (z_Condition_muPatientpsych[1]));
  r_Group_muPatientpsych = (sd_Group_muPatientpsych[1] * (z_Group_muPatientpsych[1]));
  r_Condition_muSDOH = (sd_Condition_muSDOH[1] * (z_Condition_muSDOH[1]));
  r_Group_muSDOH = (sd_Group_muSDOH[1] * (z_Group_muSDOH[1]));
  r_Condition_muCanmeds = (sd_Condition_muCanmeds[1] * (z_Condition_muCanmeds[1]));
  r_Group_muCanmeds = (sd_Group_muCanmeds[1] * (z_Group_muCanmeds[1]));
  r_Condition_muExpertise = (sd_Condition_muExpertise[1] * (z_Condition_muExpertise[1]));
  r_Group_muExpertise = (sd_Group_muExpertise[1] * (z_Group_muExpertise[1]));
  r_Condition_muRecommendations = (sd_Condition_muRecommendations[1] * (z_Condition_muRecommendations[1]));
  r_Group_muRecommendations = (sd_Group_muRecommendations[1] * (z_Group_muRecommendations[1]));
}
model {
  // initialize linear predictor term
  vector[N] muBuilding = Intercept_muBuilding + Xc * b_muBuilding;
  // initialize linear predictor term
  vector[N] muCaregiver = Intercept_muCaregiver + Xc * b_muCaregiver;
  // initialize linear predictor term
  vector[N] muPatient = Intercept_muPatient + Xc * b_muPatient;
  // initialize linear predictor term
  vector[N] muPatientpsych = Intercept_muPatientpsych + Xc * b_muPatientpsych;
  // initialize linear predictor term
  vector[N] muSDOH = Intercept_muSDOH + Xc * b_muSDOH;
  // initialize linear predictor term
  vector[N] muCanmeds = Intercept_muCanmeds + Xc * b_muCanmeds;
  // initialize linear predictor term
  vector[N] muExpertise = Intercept_muExpertise + Xc * b_muExpertise;
  // initialize linear predictor term
  vector[N] muRecommendations = Intercept_muRecommendations + Xc * b_muRecommendations;
  // linear predictor matrix
  vector[ncat] mu[N];
  for (n in 1:N) {
    // add more terms to the linear predictor
    muBuilding[n] += r_Condition_muBuilding[J_1[n]] * Z_Condition[n] + r_Group_muBuilding[J_2[n]] * Z_Group[n];
  }
  for (n in 1:N) {
    // add more terms to the linear predictor
    muCaregiver[n] += r_Condition_muCaregiver[J_1[n]] * Z_Condition[n] + r_Group_muCaregiver[J_2[n]] * Z_Group[n];
  }
  for (n in 1:N) {
    // add more terms to the linear predictor
    muPatient[n] += r_Condition_muPatient[J_1[n]] * Z_Condition[n] + r_Group_muPatient[J_2[n]] * Z_Group[n];
  }
  for (n in 1:N) {
    // add more terms to the linear predictor
    muPatientpsych[n] += r_Condition_muPatientpsych[J_1[n]] * Z_Condition[n] + r_Group_muPatientpsych[J_2[n]] * Z_Group[n];
  }
  for (n in 1:N) {
    // add more terms to the linear predictor
    muSDOH[n] += r_Condition_muSDOH[J_1[n]] * Z_Condition[n] + r_Group_muSDOH[J_2[n]] * Z_Group[n];
  }
  for (n in 1:N) {
    // add more terms to the linear predictor
    muCanmeds[n] += r_Condition_muCanmeds[J_1[n]] * Z_Condition[n] + r_Group_muCanmeds[J_2[n]] * Z_Group[n];
  }
  for (n in 1:N) {
    // add more terms to the linear predictor
    muExpertise[n] += r_Condition_muExpertise[J_1[n]] * Z_Condition[n] + r_Group_muExpertise[J_2[n]] * Z_Group[n];
  }
  for (n in 1:N) {
    // add more terms to the linear predictor
    muRecommendations[n] += r_Condition_muRecommendations[J_1[n]] * Z_Condition[n] + r_Group_muRecommendations[J_2[n]] * Z_Group[n];
  }
  for (n in 1:N) {
    mu[n] = [0, muBuilding[n], muCaregiver[n], muPatient[n], muPatientpsych[n], muSDOH[n], muCanmeds[n], muExpertise[n], muRecommendations[n]]';
  }
// priors including all constants
  b_muBuilding ~ normal(0, 1);
  Intercept_muBuilding ~ normal(0, 1);
  b_muCaregiver ~ normal(0, 1);
  Intercept_muCaregiver ~ normal(0, 1);
  b_muPatient ~ normal(0, 1);
  Intercept_muPatient ~ normal(0, 1);
  b_muPatientpsych ~ normal(0, 1);
  Intercept_muPatientpsych ~ normal(0, 1);
  b_muSDOH ~ normal(0, 1);
  Intercept_muSDOH ~ normal(0, 1);
  b_muCanmeds ~ normal(0, 1);
  Intercept_muCanmeds ~ normal(0, 1);
  b_muExpertise ~ normal(0, 1);
  Intercept_muExpertise ~ normal(0, 1);
  b_muRecommendations ~ normal(0, 1);
  Intercept_muRecommendations ~ normal(0, 1);
  target += cauchy_lpdf(sd_Condition_muBuilding[1] | 0,2)
    - 1 * cauchy_lccdf(0 | 0,2);
  target += normal_lpdf(z_Condition_muBuilding[1] | 0, 1);
  target += cauchy_lpdf(sd_Group_muBuilding[1] | 0,exp(1))
    - 1 * cauchy_lccdf(0 | 0,exp(1));
  target += normal_lpdf(z_Group_muBuilding[1] | 0, 1);
  target += cauchy_lpdf(sd_Condition_muCaregiver[1] | 0,2)
    - 1 * cauchy_lccdf(0 | 0,2);
  target += normal_lpdf(z_Condition_muCaregiver[1] | 0, 1);
  target += cauchy_lpdf(sd_Group_muCaregiver[1] | 0,exp(1))
    - 1 * cauchy_lccdf(0 | 0,exp(1));
  target += normal_lpdf(z_Group_muCaregiver[1] | 0, 1);
  target += cauchy_lpdf(sd_Group_muPatient[1] | 0,2)
    - 1 * cauchy_lccdf(0 | 0,2);
  target += normal_lpdf(z_Group_muPatient[1] | 0, 1);
  target += cauchy_lpdf(sd_Condition_muPatient[1] | 0,exp(1))
    - 1 * cauchy_lccdf(0 | 0,exp(1));
  target += normal_lpdf(z_Condition_muPatient[1] | 0, 1);
  target += cauchy_lpdf(sd_Condition_muPatientpsych[1] | 0,2)
    - 1 * cauchy_lccdf(0 | 0,2);
  target += normal_lpdf(z_Condition_muPatientpsych[1] | 0, 1);
  target += cauchy_lpdf(sd_Group_muPatientpsych[1] | 0,exp(1))
    - 1 * cauchy_lccdf(0 | 0,exp(1));
  target += normal_lpdf(z_Group_muPatientpsych[1] | 0, 1);
  target += cauchy_lpdf(sd_Condition_muSDOH[1] | 0,2)
    - 1 * cauchy_lccdf(0 | 0,2);
  target += normal_lpdf(z_Condition_muSDOH[1] | 0, 1);
  target += cauchy_lpdf(sd_Group_muSDOH[1] | 0,exp(1))
    - 1 * cauchy_lccdf(0 | 0,exp(1));
  target += normal_lpdf(z_Group_muSDOH[1] | 0, 1);
  target += cauchy_lpdf(sd_Condition_muCanmeds[1] | 0,2)
    - 1 * cauchy_lccdf(0 | 0,2);
  target += normal_lpdf(z_Condition_muCanmeds[1] | 0, 1);
  target += cauchy_lpdf(sd_Group_muCanmeds[1] | 0,exp(1))
    - 1 * cauchy_lccdf(0 | 0,exp(1));
  target += normal_lpdf(z_Group_muCanmeds[1] | 0, 1);
  target += cauchy_lpdf(sd_Condition_muExpertise[1] | 0,2)
    - 1 * cauchy_lccdf(0 | 0,2);
  target += normal_lpdf(z_Condition_muExpertise[1] | 0, 1);
  target += cauchy_lpdf(sd_Group_muExpertise[1] | 0,exp(1))
    - 1 * cauchy_lccdf(0 | 0,exp(1));
  target += normal_lpdf(z_Group_muExpertise[1] | 0, 1);
  target += cauchy_lpdf(sd_Condition_muRecommendations[1] | 0,2)
    - 1 * cauchy_lccdf(0 | 0,2);
  target += normal_lpdf(z_Condition_muRecommendations[1] | 0, 1);
  target += cauchy_lpdf(sd_Group_muRecommendations[1] | 0,exp(1))
    - 1 * cauchy_lccdf(0 | 0,exp(1));
  target += normal_lpdf(z_Group_muRecommendations[1] | 0, 1);
  // likelihood including all constants
  if (!prior_only) {
    for (n in 1:N) {
      target += multinomial_logit_lpmf(Coded[n] | mu[n]);
    }
  }
}
generated quantities {
  // actual population-level intercept
  real b_muBuilding_Intercept = Intercept_muBuilding - dot_product(means_X, b_muBuilding);
  // actual population-level intercept
  real b_muCaregiver_Intercept = Intercept_muCaregiver - dot_product(means_X, b_muCaregiver);
  // actual population-level intercept
  real b_muPatient_Intercept = Intercept_muPatient - dot_product(means_X, b_muPatient);
  // actual population-level intercept
  real b_muPatientpsych_Intercept = Intercept_muPatientpsych - dot_product(means_X, b_muPatientpsych);
  // actual population-level intercept
  real b_muSDOH_Intercept = Intercept_muSDOH - dot_product(means_X, b_muSDOH);
  // actual population-level intercept
  real b_muCanmeds_Intercept = Intercept_muCanmeds - dot_product(means_X, b_muCanmeds);
  // actual population-level intercept
  real b_muExpertise_Intercept = Intercept_muExpertise - dot_product(means_X, b_muExpertise);
  // actual population-level intercept
  real b_muRecommendations_Intercept = Intercept_muRecommendations - dot_product(means_X, b_muRecommendations);
}

Would there a way to vectorize the for loop for mu’s?

  for (n in 1:N) {
    mu[n] = [0, muBuilding[n], muCaregiver[n], muPatient[n], muPatientpsych[n], muSDOH[n], muCanmeds[n], muExpertise[n], muRecommendations[n]]';
  }

andrjohns · May 1, 2020, 12:25am

Great to hear the runtime is coming down!

You can still use the vectorised calculations to avoid some unnecessary looping. There would also be a bit of a speed-up by combining the declaration and accumulation of the linear predictors:

  vector[N] muBuilding;
  ...
  muBuilding = Intercept_muBuilding + Xc * b_muBuilding
                + r_Condition_muBuilding[J_1] .* Z_Condition
                + r_Group_muBuilding[J_2] .* Z_Group;

Would there a way to vectorize the for loop for mu’s?

There’s no way to vectorise the mu, but since you’re looping over N for the multinomial_logit anyway you can just create the vector on-the-fly when you call the distribution:

      target += multinomial_logit_lpmf(Coded[n] | [0, muBuilding[n], muCaregiver[n], muPatient[n],
                                                   muPatientpsych[n], muSDOH[n], muCanmeds[n],
                                                   muExpertise[n], muRecommendations[n]]');

Stan also has a built-in standard-normal distribution which is more computationally efficient than normal(0, 1), so you could try that as well:

b_muBuilding ~ std_normal();

Don’t forget to change the normal_lpdf calls for the z parameters as well.

For the cauchys you need to keep the cauchy_lpdf calls, but you can optimise with the cauchy_lccdf calls. Since there are multiple calls with the same parameters, you can just subtract the appropriate multiple of the density rather than computing the density multiple times. So the priors for the cauchys becomes:

  target += cauchy_lpdf(sd_Condition_muBuilding[1] | 0,2);
  target += cauchy_lpdf(sd_Group_muBuilding[1] | 0,exp(1));
  target += cauchy_lpdf(sd_Condition_muCaregiver[1] | 0,2);
  target += cauchy_lpdf(sd_Group_muCaregiver[1] | 0,exp(1));
  target += cauchy_lpdf(sd_Group_muPatient[1] | 0,2);
  target += cauchy_lpdf(sd_Condition_muPatient[1] | 0,exp(1));
  target += cauchy_lpdf(sd_Condition_muPatientpsych[1] | 0,2);
  target += cauchy_lpdf(sd_Group_muPatientpsych[1] | 0,exp(1));
  target += cauchy_lpdf(sd_Condition_muSDOH[1] | 0,2);
  target += cauchy_lpdf(sd_Group_muSDOH[1] | 0,exp(1));
  target += cauchy_lpdf(sd_Condition_muCanmeds[1] | 0,2);
  target += cauchy_lpdf(sd_Group_muCanmeds[1] | 0,exp(1));
  target += cauchy_lpdf(sd_Condition_muExpertise[1] | 0,2);
  target += cauchy_lpdf(sd_Group_muExpertise[1] | 0,exp(1));
  target += cauchy_lpdf(sd_Condition_muRecommendations[1] | 0,2);
  target += cauchy_lpdf(sd_Group_muRecommendations[1] | 0,exp(1));

  target -= 8 * cauchy_lccdf(0 | 0,2);
  target -= 8 * cauchy_lccdf(0 | 0,exp(1));

Additionally, there are many repeated calls to exp(1), it would be more efficient to calculate it once in the transformed data block, and then re-use the calculated value throughout the model:

transformed data{
  real exp1 = exp(1);
}

andrjohns · May 1, 2020, 12:36am

Also, can you double-check that M_Condition is 1? As you have several parameters that are length M_Condition, but only use or put priors on the first element. If M_Condition is greater than 1, then the other elements will be implicitly receiving a uniform prior, which could trip up the sampling

JLC · May 1, 2020, 1:37am

Thank you so much!

Yes, M_Condition and M_Group are always one. It looks like they specify the number of estimated parameters, which is just an intercept for both.

andrjohns · May 1, 2020, 1:52am

Yes, M_Condition and M_Group are always one. It looks like they specify the number of estimated parameters, which is just an intercept for both.

Great!

Something I also realised, you can move the cauchy_lccdf calls to the transformed_data block to cut down on more computations as well:

transformed data {
  real 8_cauchylccdf_0_2 = 8 * cauchy_lccdf(0 | 0,2);
  real 8_cauchylccdf_0_exp1 = 8 * cauchy_lccdf(0 | 0,exp1);
}
...
model {
  ...
  target -= 8_cauchylccdf_0_2;
  target -= 8_cauchylccdf_0_exp1;
}

JLC · May 1, 2020, 2:47am

Does

  target -= 8_cauchylccdf_0_2;
  target -= 8_cauchylccdf_0_exp1;

Go into the Model block, or into the priors block?

I’ve tried it both an get an error when compiling the model:
Variable "target" does not exist.

andrjohns · May 1, 2020, 2:51am

In the model block

Try instead:

  target += -8_cauchylccdf_0_2;
  target += -8_cauchylccdf_0_exp1;

JLC · May 1, 2020, 3:00am

That did it!

This is so much faster! It’s gone from ~300 seconds for 1000 transitions with 10 leapfrog steps down to ~70!

andrjohns · May 1, 2020, 3:13am

Great to hear!

If the model runs without divergences and the results are consistent, can you post up your final model code? It can be helpful for people that have similar issues in the future.

JLC · May 4, 2020, 2:57pm

Not quite there in terms of running without divergences, but I think I’m getting close!

Here’s is the final model code:

functions {

  /* multinomial-logit log-PMF
   * Args: 
   *   y: array of integer response values
   *   mu: vector of category logit probabilities
   * Returns:  
   *   a scalar to be added to the log posterior 
   */ 
   real multinomial_logit_lpmf(int[] y, vector mu) {
     return multinomial_lpmf(y | softmax(mu));
   }
}
data {
  int<lower=1> N;  // number of observations
  int<lower=2> ncat;  // number of categories
  int Coded[N, ncat];  // response array
  int trials[N];  // number of trials
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1 (Condition)
  int<lower=1> n_Condition;  // number of grouping levels for Condition
  int<lower=1> M_Condition;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values for Condition
  vector[N] Z_Condition;
  // data for group-level effects of ID 2 (Group)
  int<lower=1> n_Group;  // number of grouping levels for Group
  int<lower=1> M_Group;  // number of coefficients per level
  int<lower=1> J_2[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_Group;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  real exp1 = exp(1);
  real Condition_cauchy_lccdf_exp1_8 = 8 * cauchy_lccdf(0 | 0,exp1);
  real Group_cauchy_lccdf_2_8 = 8 * cauchy_lccdf(0 | 0,2);
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X_muBuilding without an intercept
  vector[Kc] means_X;  // column means of X_muBuilding before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b_muBuilding;  // population-level effects
  real Intercept_muBuilding;  // temporary intercept for centered predictors
  vector[Kc] b_muCaregiver;  // population-level effects
  real Intercept_muCaregiver;  // temporary intercept for centered predictors
  vector[Kc] b_muPatient;  // population-level effects
  real Intercept_muPatient;  // temporary intercept for centered predictors
  vector[Kc] b_muPatientpsych;  // population-level effects
  real Intercept_muPatientpsych;  // temporary intercept for centered predictors
  vector[Kc] b_muSDOH;  // population-level effects
  real Intercept_muSDOH;  // temporary intercept for centered predictors
  vector[Kc] b_muCanmeds;  // population-level effects
  real Intercept_muCanmeds;  // temporary intercept for centered predictors
  vector[Kc] b_muExpertise;  // population-level effects
  real Intercept_muExpertise;  // temporary intercept for centered predictors
  vector[Kc] b_muRecommendations;  // population-level effects
  real Intercept_muRecommendations;  // temporary intercept for centered predictors
  vector<lower=0>[M_Condition] sd_Condition_muBuilding;  // group-level standard deviations
  vector[n_Condition] z_Condition_muBuilding[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muBuilding;  // group-level standard deviations
  vector[n_Group] z_Group_muBuilding[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muCaregiver;  // group-level standard deviations
  vector[n_Condition] z_Condition_muCaregiver[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muCaregiver;  // group-level standard deviations
  vector[n_Group] z_Group_muCaregiver[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muPatient;  // group-level standard deviations
  vector[n_Condition] z_Condition_muPatient[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muPatient;  // group-level standard deviations
  vector[n_Group] z_Group_muPatient[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muPatientpsych;  // group-level standard deviations
  vector[n_Condition] z_Condition_muPatientpsych[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muPatientpsych;  // group-level standard deviations
  vector[n_Group] z_Group_muPatientpsych[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muSDOH;  // group-level standard deviations
  vector[n_Condition] z_Condition_muSDOH[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muSDOH;  // group-level standard deviations
  vector[n_Group] z_Group_muSDOH[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muCanmeds;  // group-level standard deviations
  vector[n_Condition] z_Condition_muCanmeds[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muCanmeds;  // group-level standard deviations
  vector[n_Group] z_Group_muCanmeds[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muExpertise;  // group-level standard deviations
  vector[n_Condition] z_Condition_muExpertise[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muExpertise;  // group-level standard deviations
  vector[n_Group] z_Group_muExpertise[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muRecommendations;  // group-level standard deviations
  vector[n_Condition] z_Condition_muRecommendations[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muRecommendations;  // group-level standard deviations
  vector[n_Group] z_Group_muRecommendations[M_Group];  // standardized group-level effects
}
transformed parameters {
  vector[n_Condition] r_Condition_muBuilding;  // actual group-level effects
  vector[n_Group] r_Group_muBuilding;  // actual group-level effects
  vector[n_Condition] r_Condition_muCaregiver;  // actual group-level effects
  vector[n_Group] r_Group_muCaregiver;  // actual group-level effects
  vector[n_Condition] r_Condition_muPatient;  // actual group-level effects
  vector[n_Group] r_Group_muPatient;  // actual group-level effects
  vector[n_Condition] r_Condition_muPatientpsych;  // actual group-level effects
  vector[n_Group] r_Group_muPatientpsych;  // actual group-level effects
  vector[n_Condition] r_Condition_muSDOH;  // actual group-level effects
  vector[n_Group] r_Group_muSDOH;  // actual group-level effects
  vector[n_Condition] r_Condition_muCanmeds;  // actual group-level effects
  vector[n_Group] r_Group_muCanmeds;  // actual group-level effects
  vector[n_Condition] r_Condition_muExpertise;  // actual group-level effects
  vector[n_Group] r_Group_muExpertise;  // actual group-level effects
  vector[n_Condition] r_Condition_muRecommendations;  // actual group-level effects
  vector[n_Group] r_Group_muRecommendations;  // actual group-level effects
  r_Condition_muBuilding = (sd_Condition_muBuilding[1] * (z_Condition_muBuilding[1]));
  r_Group_muBuilding = (sd_Group_muBuilding[1] * (z_Group_muBuilding[1]));
  r_Condition_muCaregiver = (sd_Condition_muCaregiver[1] * (z_Condition_muCaregiver[1]));
  r_Group_muCaregiver = (sd_Group_muCaregiver[1] * (z_Group_muCaregiver[1]));
  r_Condition_muPatient = (sd_Condition_muPatient[1] * (z_Condition_muPatient[1]));
  r_Group_muPatient = (sd_Group_muPatient[1] * (z_Group_muPatient[1]));
  r_Condition_muPatientpsych = (sd_Condition_muPatientpsych[1] * (z_Condition_muPatientpsych[1]));
  r_Group_muPatientpsych = (sd_Group_muPatientpsych[1] * (z_Group_muPatientpsych[1]));
  r_Condition_muSDOH = (sd_Condition_muSDOH[1] * (z_Condition_muSDOH[1]));
  r_Group_muSDOH = (sd_Group_muSDOH[1] * (z_Group_muSDOH[1]));
  r_Condition_muCanmeds = (sd_Condition_muCanmeds[1] * (z_Condition_muCanmeds[1]));
  r_Group_muCanmeds = (sd_Group_muCanmeds[1] * (z_Group_muCanmeds[1]));
  r_Condition_muExpertise = (sd_Condition_muExpertise[1] * (z_Condition_muExpertise[1]));
  r_Group_muExpertise = (sd_Group_muExpertise[1] * (z_Group_muExpertise[1]));
  r_Condition_muRecommendations = (sd_Condition_muRecommendations[1] * (z_Condition_muRecommendations[1]));
  r_Group_muRecommendations = (sd_Group_muRecommendations[1] * (z_Group_muRecommendations[1]));
}
model {
// Declare linear predictor terms
  vector[N] muBuilding;
  vector[N] muCaregiver;
  vector[N] muPatient;
  vector[N] muPatientpsych;
  vector[N] muSDOH;
  vector[N] muCanmeds;
  vector[N] muExpertise;
  vector[N] muRecommendations;
// Accumulate linear predictor terms  
  muBuilding = Intercept_muBuilding + Xc * b_muBuilding
                + r_Condition_muBuilding[J_1] .* Z_Condition
                + r_Group_muBuilding[J_2] .* Z_Group;
  muCaregiver = Intercept_muCaregiver + Xc * b_muCaregiver
                + r_Condition_muCaregiver[J_1] .* Z_Condition
                + r_Group_muCaregiver[J_2] .* Z_Group;
  muPatient = Intercept_muPatient + Xc * b_muPatient
                + r_Condition_muPatient[J_1] .* Z_Condition
                + r_Group_muPatient[J_2] .* Z_Group;
  muPatientpsych = Intercept_muPatientpsych + Xc * b_muPatientpsych
                + r_Condition_muPatientpsych[J_1] .* Z_Condition
                + r_Group_muPatientpsych[J_2] .* Z_Group;
  muSDOH = Intercept_muSDOH + Xc * b_muSDOH
                + r_Condition_muBuilding[J_1] .* Z_Condition
                + r_Group_muSDOH[J_2] .* Z_Group;
  muCanmeds = Intercept_muCanmeds + Xc * b_muCanmeds
                + r_Condition_muCanmeds[J_1] .* Z_Condition
                + r_Group_muCanmeds[J_2] .* Z_Group;
  muExpertise = Intercept_muExpertise + Xc * b_muExpertise
                + r_Condition_muExpertise[J_1] .* Z_Condition
                + r_Group_muExpertise[J_2] .* Z_Group;
  muRecommendations = Intercept_muRecommendations + Xc * b_muRecommendations
                + r_Condition_muRecommendations[J_1] .* Z_Condition
                + r_Group_muRecommendations[J_2] .* Z_Group;
  target += -Condition_cauchy_lccdf_exp1_8;
  target += -Group_cauchy_lccdf_2_8;
// priors including all constants
  b_muBuilding ~ std_normal();
  Intercept_muBuilding ~ std_normal();
  b_muCaregiver ~ std_normal();
  Intercept_muCaregiver ~ std_normal();
  b_muPatient ~ std_normal();
  Intercept_muPatient ~ std_normal();
  b_muPatientpsych ~ std_normal();
  Intercept_muPatientpsych ~ std_normal();
  b_muSDOH ~ std_normal();
  Intercept_muSDOH ~ std_normal();
  b_muCanmeds ~ std_normal();
  Intercept_muCanmeds ~ std_normal();
  b_muExpertise ~ std_normal();
  Intercept_muExpertise ~ std_normal(); 
  b_muRecommendations ~ std_normal();
  Intercept_muRecommendations ~ std_normal();
  target += cauchy_lpdf(sd_Condition_muBuilding[1] | 0,exp1);
  target += cauchy_lpdf(sd_Group_muBuilding[1] | 0,2);
  target += cauchy_lpdf(sd_Condition_muCaregiver[1] | 0,exp1);
  target += cauchy_lpdf(sd_Group_muCaregiver[1] | 0,2);  
  target += cauchy_lpdf(sd_Condition_muPatient[1] | 0,exp1);
  target += cauchy_lpdf(sd_Group_muPatient[1] | 0,2);
  target += cauchy_lpdf(sd_Condition_muPatientpsych[1] | 0,exp1);
  target += cauchy_lpdf(sd_Group_muPatientpsych[1] | 0,2);
  target += cauchy_lpdf(sd_Condition_muSDOH[1] | 0,exp1);
  target += cauchy_lpdf(sd_Group_muSDOH[1] | 0,2);
  target += cauchy_lpdf(sd_Condition_muCanmeds[1] | 0,exp1);
  target += cauchy_lpdf(sd_Group_muCanmeds[1] | 0,2);
  target += cauchy_lpdf(sd_Condition_muExpertise[1] | 0,exp1);
  target += cauchy_lpdf(sd_Group_muExpertise[1] | 0,2);
  target += cauchy_lpdf(sd_Condition_muRecommendations[1] | 0,exp1);
  target += cauchy_lpdf(sd_Group_muRecommendations[1] | 0,2);  
  target += normal_lpdf(z_Condition_muBuilding[1] | 0, 1);
  target += normal_lpdf(z_Group_muBuilding[1] | 0, 1);
  target += normal_lpdf(z_Condition_muCaregiver[1] | 0, 1);
  target += normal_lpdf(z_Group_muCaregiver[1] | 0, 1);
  target += normal_lpdf(z_Group_muPatient[1] | 0, 1);
  target += normal_lpdf(z_Condition_muPatient[1] | 0, 1);
  target += normal_lpdf(z_Condition_muPatientpsych[1] | 0, 1);
  target += normal_lpdf(z_Group_muPatientpsych[1] | 0, 1);
  target += normal_lpdf(z_Condition_muSDOH[1] | 0, 1);
  target += normal_lpdf(z_Group_muSDOH[1] | 0, 1);
  target += normal_lpdf(z_Condition_muCanmeds[1] | 0, 1);
  target += normal_lpdf(z_Group_muCanmeds[1] | 0, 1);
  target += normal_lpdf(z_Condition_muExpertise[1] | 0, 1);
  target += normal_lpdf(z_Group_muExpertise[1] | 0, 1);
  target += normal_lpdf(z_Condition_muRecommendations[1] | 0, 1);
  target += normal_lpdf(z_Group_muRecommendations[1] | 0, 1);
  // likelihood including all constants
  target += multinomial_logit_lpmf(Coded[N] | [0, muBuilding[N], muCaregiver[N], muPatient[N],
                                                   muPatientpsych[N], muSDOH[N], muCanmeds[N],
                                                   muExpertise[N], muRecommendations[N]]');
    }
generated quantities {
  // actual population-level intercept
  real b_muBuilding_Intercept = Intercept_muBuilding - dot_product(means_X, b_muBuilding);
  // actual population-level intercept
  real b_muCaregiver_Intercept = Intercept_muCaregiver - dot_product(means_X, b_muCaregiver);
  // actual population-level intercept
  real b_muPatient_Intercept = Intercept_muPatient - dot_product(means_X, b_muPatient);
  // actual population-level intercept
  real b_muPatientpsych_Intercept = Intercept_muPatientpsych - dot_product(means_X, b_muPatientpsych);
  // actual population-level intercept
  real b_muSDOH_Intercept = Intercept_muSDOH - dot_product(means_X, b_muSDOH);
  // actual population-level intercept
  real b_muCanmeds_Intercept = Intercept_muCanmeds - dot_product(means_X, b_muCanmeds);
  // actual population-level intercept
  real b_muExpertise_Intercept = Intercept_muExpertise - dot_product(means_X, b_muExpertise);
  // actual population-level intercept
  real b_muRecommendations_Intercept = Intercept_muRecommendations - dot_product(means_X, b_muRecommendations);
}

Here is my script for running the model on the attached data:
Data.csv (319.4 KB)

# List of needed packages
Pkgs <- c("tidyverse", "parallel", "rstan")

# Load packages
lapply(Pkgs, require, c = T)

## Set computing options
ncores = detectCores()
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

# rstan options
rstan_options("auto_write" = TRUE) 

## Load data
Data <- read.csv("Data.csv") 

## Set up data for Stan
Stan_Data <- list(
  N = nrow(Data),
  Coded = with(
    Data,
    cbind(
      NoCode,
      Building,
      Caregiver,
      Patient,
      Patientpsych,
      SDOH,
      Canmeds,
      Expertise,
      Recommendations
    )
  ),
  trials = Data$Count,
  X = data.frame(
    Intercept = rep(1, nrow(Data)),
    ConditionIntervention = ifelse(Data$Condition == "Intervention", 1, 0),
    Session_TypeInitial_Homecare = ifelse(Data$Session_Type == "Initial_Homecare", 1, 0),
    Session_TypeInitial_Debrief = ifelse(Data$Session_Type == "Initial_Debrief", 1, 0),
    Session_TypeFollowUp_Homecare = ifelse(Data$Session_Type == "Follow-up_Homecare", 1, 0),
    Session_TypeFollowUp_Debrief = ifelse(Data$Session_Type == "Follow-up_Debrief", 1, 0),
    FacilitatorFacilitator2 = ifelse(Data$Facilitator == "Facilitator 2", 1, 0),
    `ConditionIntervention:Session_TypeInitial_Homecare` = ifelse(
      Data$Condition == "Intervention" &
        Data$Session_Type == "Initial_Homecare",
      1,
      0
    ),
    `ConditionIntervention:Session_TypeInitial_Debrief` = ifelse(
      Data$Condition == "Intervention" &
        Data$Session_Type == "Initial_Debrief",
      1,
      0
    ),
    `ConditionIntervention:Session_TypeFollowUp_Homecare` = ifelse(
      Data$Condition == "Intervention" &
        Data$Session_Type == "Follow-up_Homecare",
      1,
      0
    ),
    `ConditionIntervention:Session_TypeFollowUp_Debrief` = ifelse(
      Data$Condition == "Intervention" &
        Data$Session_Type == "Follow-up_Debrief",
      1,
      0
    ),
    `Session_TypeInitial_Homecare:FacilitatorFacilitator2` = ifelse(
      Data$Session_Type == "Initial_Homecare" &
        Data$Facilitator == "Facilitator2",
      1,
      0
    ),
    `Session_TypeInitial_Debrief:FacilitatorFacilitator2` = ifelse(
      Data$Session_Type == "Initial_Debrief" &
        Data$Facilitator == "Facilitator2",
      1,
      0
    ),
    `Session_TypeInitial_Homecare:FacilitatorFacilitator2` = ifelse(
      Data$Session_Type == "Follow-up_Homecare" &
        Data$Facilitator == "Facilitator2",
      1,
      0
    ),
    `Session_TypeInitial_Debrief:FacilitatorFacilitator2` = ifelse(
      Data$Session_Type == "Follow-up_Debrief" &
        Data$Facilitator == "Facilitator2",
      1,
      0
    )
  ),
  n_Condition = length(unique(Data$Condition)),
  n_Group = length(unique(Data$Group)),
  M_Condition = 1,
  M_Group = 1,
  J_1 = as.integer(Data$Condition), # Condition in each observation
  J_2 = as.integer(Data$Group), # Group in each observation
  prior_only = 0
) %>%
  prepend(list(
    ncat = ncol(.$Coded),
    K = ncol(.$X),
    Z_Condition = rep(1, nrow(.$X)),
    Z_Group = rep(1, nrow(.$X))
  ))

# Compile Stan How model
Stan_Model <- stan_model(file = "Stan_Final_Model.stan", # Name of the existing Stan model specification
                             model_name = "Final Model", # Name of the resulting compiled model
                             auto_write = rstan_options("auto_write"))

# Sample from model
Stan_Fit <- sampling(Stan_Model,         # Model to sample from
                         data = Stan_Data,   # named list of data
                         iter = 5000,            # total number of iterations per chain
                         warmup = 3500,          # number of warmup iterations per chain
                         init = 0,               # set intial values to 0
                         chains = 2,             # number of Markov chains
                         cores = ncores,         # number of cores to use (1 per chain)
                         control=list(max_treedepth = 12, 
                                      adapt_delta = 0.99999999)
)

andrjohns · May 4, 2020, 3:44pm

The main source of your divergences is probably this part:

  target += multinomial_logit_lpmf(Coded[N] | [0, muBuilding[N], muCaregiver[N], muPatient[N],
                                                   muPatientpsych[N], muSDOH[N], muCanmeds[N],
                                                   muExpertise[N], muRecommendations[N]]');
    }

Because N is just the total number of individuals, you’re only specifying the multinomial_logit_lpmf prior for the very last individual. You’ll need to have this as a loop (which is what I think it used to be?) and that should be a bit nicer with the divergences

JLC · May 7, 2020, 12:14pm

Here’s the final, final code with the For loops for mu back in:

																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																																functions {

  /* multinomial-logit log-PMF
   * Args: 
   *   y: array of integer response values
   *   mu: vector of category logit probabilities
   * Returns:  
   *   a scalar to be added to the log posterior 
   */ 
   real multinomial_logit_lpmf(int[] y, vector mu) {
     return multinomial_lpmf(y | softmax(mu));
   }
}
data {
  int<lower=1> N;  // number of observations
  int<lower=2> ncat;  // number of categories
  int Coded[N, ncat];  // response array
  int trials[N];  // number of trials
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1 (Condition)
  int<lower=1> n_Condition;  // number of grouping levels for Condition
  int<lower=1> M_Condition;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values for Condition
  vector[N] Z_Condition;
  // data for group-level effects of ID 2 (Group)
  int<lower=1> n_Group;  // number of grouping levels for Group
  int<lower=1> M_Group;  // number of coefficients per level
  int<lower=1> J_2[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_Group;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  real exp1 = exp(1);
  real Condition_cauchy_lccdf_exp1_8 = 8 * cauchy_lccdf(0 | 0,exp1);
  real Group_cauchy_lccdf_2_8 = 8 * cauchy_lccdf(0 | 0,2);
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X_muBuilding without an intercept
  vector[Kc] means_X;  // column means of X_muBuilding before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b_muBuilding;  // population-level effects
  real Intercept_muBuilding;  // temporary intercept for centered predictors
  vector[Kc] b_muCaregiver;  // population-level effects
  real Intercept_muCaregiver;  // temporary intercept for centered predictors
  vector[Kc] b_muPatient;  // population-level effects
  real Intercept_muPatient;  // temporary intercept for centered predictors
  vector[Kc] b_muPatientpsych;  // population-level effects
  real Intercept_muPatientpsych;  // temporary intercept for centered predictors
  vector[Kc] b_muSDOH;  // population-level effects
  real Intercept_muSDOH;  // temporary intercept for centered predictors
  vector[Kc] b_muCanmeds;  // population-level effects
  real Intercept_muCanmeds;  // temporary intercept for centered predictors
  vector[Kc] b_muExpertise;  // population-level effects
  real Intercept_muExpertise;  // temporary intercept for centered predictors
  vector[Kc] b_muRecommendations;  // population-level effects
  real Intercept_muRecommendations;  // temporary intercept for centered predictors
  vector<lower=0>[M_Condition] sd_Condition_muBuilding;  // group-level standard deviations
  vector[n_Condition] z_Condition_muBuilding[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muBuilding;  // group-level standard deviations
  vector[n_Group] z_Group_muBuilding[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muCaregiver;  // group-level standard deviations
  vector[n_Condition] z_Condition_muCaregiver[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muCaregiver;  // group-level standard deviations
  vector[n_Group] z_Group_muCaregiver[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muPatient;  // group-level standard deviations
  vector[n_Condition] z_Condition_muPatient[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muPatient;  // group-level standard deviations
  vector[n_Group] z_Group_muPatient[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muPatientpsych;  // group-level standard deviations
  vector[n_Condition] z_Condition_muPatientpsych[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muPatientpsych;  // group-level standard deviations
  vector[n_Group] z_Group_muPatientpsych[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muSDOH;  // group-level standard deviations
  vector[n_Condition] z_Condition_muSDOH[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muSDOH;  // group-level standard deviations
  vector[n_Group] z_Group_muSDOH[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muCanmeds;  // group-level standard deviations
  vector[n_Condition] z_Condition_muCanmeds[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muCanmeds;  // group-level standard deviations
  vector[n_Group] z_Group_muCanmeds[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muExpertise;  // group-level standard deviations
  vector[n_Condition] z_Condition_muExpertise[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muExpertise;  // group-level standard deviations
  vector[n_Group] z_Group_muExpertise[M_Group];  // standardized group-level effects
  vector<lower=0>[M_Condition] sd_Condition_muRecommendations;  // group-level standard deviations
  vector[n_Condition] z_Condition_muRecommendations[M_Condition];  // standardized group-level effects
  vector<lower=0>[M_Group] sd_Group_muRecommendations;  // group-level standard deviations
  vector[n_Group] z_Group_muRecommendations[M_Group];  // standardized group-level effects
}
transformed parameters {
  vector[n_Condition] r_Condition_muBuilding;  // actual group-level effects
  vector[n_Group] r_Group_muBuilding;  // actual group-level effects
  vector[n_Condition] r_Condition_muCaregiver;  // actual group-level effects
  vector[n_Group] r_Group_muCaregiver;  // actual group-level effects
  vector[n_Condition] r_Condition_muPatient;  // actual group-level effects
  vector[n_Group] r_Group_muPatient;  // actual group-level effects
  vector[n_Condition] r_Condition_muPatientpsych;  // actual group-level effects
  vector[n_Group] r_Group_muPatientpsych;  // actual group-level effects
  vector[n_Condition] r_Condition_muSDOH;  // actual group-level effects
  vector[n_Group] r_Group_muSDOH;  // actual group-level effects
  vector[n_Condition] r_Condition_muCanmeds;  // actual group-level effects
  vector[n_Group] r_Group_muCanmeds;  // actual group-level effects
  vector[n_Condition] r_Condition_muExpertise;  // actual group-level effects
  vector[n_Group] r_Group_muExpertise;  // actual group-level effects
  vector[n_Condition] r_Condition_muRecommendations;  // actual group-level effects
  vector[n_Group] r_Group_muRecommendations;  // actual group-level effects
  r_Condition_muBuilding = (sd_Condition_muBuilding[1] * (z_Condition_muBuilding[1]));
  r_Group_muBuilding = (sd_Group_muBuilding[1] * (z_Group_muBuilding[1]));
  r_Condition_muCaregiver = (sd_Condition_muCaregiver[1] * (z_Condition_muCaregiver[1]));
  r_Group_muCaregiver = (sd_Group_muCaregiver[1] * (z_Group_muCaregiver[1]));
  r_Condition_muPatient = (sd_Condition_muPatient[1] * (z_Condition_muPatient[1]));
  r_Group_muPatient = (sd_Group_muPatient[1] * (z_Group_muPatient[1]));
  r_Condition_muPatientpsych = (sd_Condition_muPatientpsych[1] * (z_Condition_muPatientpsych[1]));
  r_Group_muPatientpsych = (sd_Group_muPatientpsych[1] * (z_Group_muPatientpsych[1]));
  r_Condition_muSDOH = (sd_Condition_muSDOH[1] * (z_Condition_muSDOH[1]));
  r_Group_muSDOH = (sd_Group_muSDOH[1] * (z_Group_muSDOH[1]));
  r_Condition_muCanmeds = (sd_Condition_muCanmeds[1] * (z_Condition_muCanmeds[1]));
  r_Group_muCanmeds = (sd_Group_muCanmeds[1] * (z_Group_muCanmeds[1]));
  r_Condition_muExpertise = (sd_Condition_muExpertise[1] * (z_Condition_muExpertise[1]));
  r_Group_muExpertise = (sd_Group_muExpertise[1] * (z_Group_muExpertise[1]));
  r_Condition_muRecommendations = (sd_Condition_muRecommendations[1] * (z_Condition_muRecommendations[1]));
  r_Group_muRecommendations = (sd_Group_muRecommendations[1] * (z_Group_muRecommendations[1]));
}
model {
// Declare linear predictor terms
  vector[ncat] mu[N];
  vector[N] muBuilding;
  vector[N] muCaregiver;
  vector[N] muPatient;
  vector[N] muPatientpsych;
  vector[N] muSDOH;
  vector[N] muCanmeds;
  vector[N] muExpertise;
  vector[N] muRecommendations;
// Accumulate linear predictor terms  
  muBuilding = Intercept_muBuilding + Xc * b_muBuilding
                + r_Condition_muBuilding[J_1] .* Z_Condition
                + r_Group_muBuilding[J_2] .* Z_Group;
  muCaregiver = Intercept_muCaregiver + Xc * b_muCaregiver
                + r_Condition_muCaregiver[J_1] .* Z_Condition
                + r_Group_muCaregiver[J_2] .* Z_Group;
  muPatient = Intercept_muPatient + Xc * b_muPatient
                + r_Condition_muPatient[J_1] .* Z_Condition
                + r_Group_muPatient[J_2] .* Z_Group;
  muPatientpsych = Intercept_muPatientpsych + Xc * b_muPatientpsych
                + r_Condition_muPatientpsych[J_1] .* Z_Condition
                + r_Group_muPatientpsych[J_2] .* Z_Group;
  muSDOH = Intercept_muSDOH + Xc * b_muSDOH																																						
                + r_Condition_muBuilding[J_1] .* Z_Condition
                + r_Group_muSDOH[J_2] .* Z_Group;
  muCanmeds = Intercept_muCanmeds + Xc * b_muCanmeds
                + r_Condition_muCanmeds[J_1] .* Z_Condition
                + r_Group_muCanmeds[J_2] .* Z_Group;
  muExpertise = Intercept_muExpertise + Xc * b_muExpertise
                + r_Condition_muExpertise[J_1] .* Z_Condition
                + r_Group_muExpertise[J_2] .* Z_Group;
  muRecommendations = Intercept_muRecommendations + Xc * b_muRecommendations
                + r_Condition_muRecommendations[J_1] .* Z_Condition
                + r_Group_muRecommendations[J_2] .* Z_Group;
  target += -Condition_cauchy_lccdf_exp1_8;
  target += -Group_cauchy_lccdf_2_8;
  for (n in 1:N) {
      mu[n] = [0, muBuilding[n], muCaregiver[n], muPatient[n], muPatientpsych[n], muSDOH[n], muCanmeds[n], muExpertise[n], muRecommendations[n]]';
    }
// priors including all constants
  b_muBuilding ~ std_normal();
  Intercept_muBuilding ~ std_normal();
  b_muCaregiver ~ std_normal();
  Intercept_muCaregiver ~ std_normal();
  b_muPatient ~ std_normal();
  Intercept_muPatient ~ std_normal();
  b_muPatientpsych ~ std_normal();
  Intercept_muPatientpsych ~ std_normal();
  b_muSDOH ~ std_normal();
  Intercept_muSDOH ~ std_normal();
  b_muCanmeds ~ std_normal();
  Intercept_muCanmeds ~ std_normal();
  b_muExpertise ~ std_normal();
  Intercept_muExpertise ~ std_normal(); 
  b_muRecommendations ~ std_normal();
  Intercept_muRecommendations ~ std_normal();
  target += cauchy_lpdf(sd_Condition_muBuilding[1] | 0,exp1);
  target += cauchy_lpdf(sd_Group_muBuilding[1] | 0,2);
  target += cauchy_lpdf(sd_Condition_muCaregiver[1] | 0,exp1);
  target += cauchy_lpdf(sd_Group_muCaregiver[1] | 0,2);  
  target += cauchy_lpdf(sd_Condition_muPatient[1] | 0,exp1);
  target += cauchy_lpdf(sd_Group_muPatient[1] | 0,2);
  target += cauchy_lpdf(sd_Condition_muPatientpsych[1] | 0,exp1);
  target += cauchy_lpdf(sd_Group_muPatientpsych[1] | 0,2);
  target += cauchy_lpdf(sd_Condition_muSDOH[1] | 0,exp1);
  target += cauchy_lpdf(sd_Group_muSDOH[1] | 0,2);
  target += cauchy_lpdf(sd_Condition_muCanmeds[1] | 0,exp1);
  target += cauchy_lpdf(sd_Group_muCanmeds[1] | 0,2);
  target += cauchy_lpdf(sd_Condition_muExpertise[1] | 0,exp1);
  target += cauchy_lpdf(sd_Group_muExpertise[1] | 0,2);
  target += cauchy_lpdf(sd_Condition_muRecommendations[1] | 0,exp1);
  target += cauchy_lpdf(sd_Group_muRecommendations[1] | 0,2);  
  target += normal_lpdf(z_Condition_muBuilding[1] | 0, 1);
  target += normal_lpdf(z_Group_muBuilding[1] | 0, 1);
  target += normal_lpdf(z_Condition_muCaregiver[1] | 0, 1);
  target += normal_lpdf(z_Group_muCaregiver[1] | 0, 1);
  target += normal_lpdf(z_Group_muPatient[1] | 0, 1);
  target += normal_lpdf(z_Condition_muPatient[1] | 0, 1);
  target += normal_lpdf(z_Condition_muPatientpsych[1] | 0, 1);
  target += normal_lpdf(z_Group_muPatientpsych[1] | 0, 1);
  target += normal_lpdf(z_Condition_muSDOH[1] | 0, 1);
  target += normal_lpdf(z_Group_muSDOH[1] | 0, 1);
  target += normal_lpdf(z_Condition_muCanmeds[1] | 0, 1);
  target += normal_lpdf(z_Group_muCanmeds[1] | 0, 1);
  target += normal_lpdf(z_Condition_muExpertise[1] | 0, 1);
  target += normal_lpdf(z_Group_muExpertise[1] | 0, 1);
  target += normal_lpdf(z_Condition_muRecommendations[1] | 0, 1);
  target += normal_lpdf(z_Group_muRecommendations[1] | 0, 1);
  // likelihood including all constants
  if (!prior_only) {
    for (n in 1:N) {
      target += multinomial_logit_lpmf(Coded[n] | mu[n]);
    }
  }
    }
generated quantities {
  // actual population-level intercept
  real b_muBuilding_Intercept = Intercept_muBuilding - dot_product(means_X, b_muBuilding);
  // actual population-level intercept
  real b_muCaregiver_Intercept = Intercept_muCaregiver - dot_product(means_X, b_muCaregiver);
  // actual population-level intercept
  real b_muPatient_Intercept = Intercept_muPatient - dot_product(means_X, b_muPatient);
  // actual population-level intercept
  real b_muPatientpsych_Intercept = Intercept_muPatientpsych - dot_product(means_X, b_muPatientpsych);
  // actual population-level intercept
  real b_muSDOH_Intercept = Intercept_muSDOH - dot_product(means_X, b_muSDOH);
  // actual population-level intercept
  real b_muCanmeds_Intercept = Intercept_muCanmeds - dot_product(means_X, b_muCanmeds);
  // actual population-level intercept
  real b_muExpertise_Intercept = Intercept_muExpertise - dot_product(means_X, b_muExpertise);
  // actual population-level intercept
  real b_muRecommendations_Intercept = Intercept_muRecommendations - dot_product(means_X, b_muRecommendations);
}

The model converges with following in ~12hrs. Much faster than the 9 days it was taking previously!

What_Stan_Fit <- sampling(What_Stan_Model,         # Model to sample from
                         data = What_Stan_Data,   # named list of data
                         iter = 2000,            # total number of iterations per chain
                         warmup = 1000,          # number of warmup iterations per chain
                         init = 0,               # set intial values to 0
                         chains = 2,             # number of Markov chains
                         cores = ncores,         # number of cores to use (1 per chain)
                         control=list(max_treedepth = 14, 
                                      adapt_delta = 0.9999)
)

Topic		Replies	Views
Hierarchical Multinomial Model taking days to run brms performance , multinomial-response , hierarchical-model	14	1590	April 9, 2020
Speeding up a hierarchical multinomial logit model Modeling	27	7745	October 18, 2022
Improving Hierarchical Model Efficiency Modeling rstan , fitting-issues , specification	7	536	February 14, 2023
Troubleshooting a "MaxDiff" hierarchical multinomial regression RStan	4	1877	April 17, 2020
Simple Multinomial Logistic Regression Performance Modeling performance	5	1009	September 30, 2020

Hierarchical Multinomial Model Optimization/Specification

Related topics