Brms sampling in multi-level multinomial models

Dear all,
Thanks for your quick replies. @JLC, thanks for sharing this amazing thread and @storopoli thanks for showing many possible improvements :-) Sorry for not getting back to you earlier, but I wanted to try your suggestions before replying. I will go through the generated Stan code block-wise and indicate what I changed/want to change. The code was generated with:

make_stancode(
  emotion ~ 0 + RC1 + RC2 + RC3 + RC4 + RC5 + RC6 + RC7 + (0 + RC1 + RC2 + RC3 + RC4 + RC5 + RC6 + RC7 | sex + country:language + speaker),
  data = dat,
  family = categorical(link = "logit"),
  prior = set_prior("normal(0,1)", class = "b")
)

Functions

// generated with brms 2.14.0
functions {
  /* turn a vector into a matrix of defined dimension
   * stores elements in row major order
   * Args:
   *   X: a vector
   *   N: first dimension of the desired matrix
   *   K: second dimension of the desired matrix
   * Returns:
   *   a matrix of dimension N x K
   */
  matrix as_matrix(vector X, int N, int K) {
    matrix[N, K] Y;
    for (i in 1:N) {
      Y[i] = to_row_vector(X[((i - 1) * K + 1):(i * K)]);
    }
    return Y;
  }
 /* compute correlated group-level effects
  * Args:
  *   z: matrix of unscaled group-level effects
  *   SD: vector of standard deviation parameters
  *   L: cholesky factor correlation matrix
  * Returns:
  *   matrix of scaled group-level effects
  */
  matrix scale_r_cor(matrix z, vector SD, matrix L) {
    // r is stored in another dimension order than z
    return transpose(diag_pre_multiply(SD, L) * z);
  }
}

It surprised me that we need scale_r_cor whereas this does not occur in @JLC 's thread. @JLC and @storopoli, do you know why correlated group-level effects are computed in this model, but not in the model in the linked thread? The reason is that in this model we compute group-level coefficients and not only group-level intercepts (when replacing the group-level coefficients with an intercept in make_stancode, the correlation matrixes disappear).

Data

data {
  int<lower=1> N;  // total number of observations
  int<lower=2> ncat;  // number of categories
  int Y[N];  // response variable
  int<lower=1> K_muANG;  // number of population-level effects
  matrix[N, K_muANG] X_muANG;  // population-level design matrix
  int<lower=1> K_muDIS;  // number of population-level effects
  matrix[N, K_muDIS] X_muDIS;  // population-level design matrix
  int<lower=1> K_muFER;  // number of population-level effects
  matrix[N, K_muFER] X_muFER;  // population-level design matrix
  int<lower=1> K_muHAP;  // number of population-level effects
  matrix[N, K_muHAP] X_muHAP;  // population-level design matrix
  int<lower=1> K_muSAD;  // number of population-level effects
  matrix[N, K_muSAD] X_muSAD;  // population-level design matrix
  int<lower=1> K_muSUR;  // number of population-level effects
  matrix[N, K_muSUR] X_muSUR;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_muANG_1;
  vector[N] Z_1_muANG_2;
  vector[N] Z_1_muANG_3;
  vector[N] Z_1_muANG_4;
  vector[N] Z_1_muANG_5;
  vector[N] Z_1_muANG_6;
  vector[N] Z_1_muANG_7;
  int<lower=1> NC_1;  // number of group-level correlations
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  int<lower=1> J_2[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_2_muANG_1;
  vector[N] Z_2_muANG_2;
  vector[N] Z_2_muANG_3;
  vector[N] Z_2_muANG_4;
  vector[N] Z_2_muANG_5;
  vector[N] Z_2_muANG_6;
  vector[N] Z_2_muANG_7;
  int<lower=1> NC_2;  // number of group-level correlations
  // data for group-level effects of ID 3
  int<lower=1> N_3;  // number of grouping levels
  int<lower=1> M_3;  // number of coefficients per level
  int<lower=1> J_3[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_3_muANG_1;
  vector[N] Z_3_muANG_2;
  vector[N] Z_3_muANG_3;
  vector[N] Z_3_muANG_4;
  vector[N] Z_3_muANG_5;
  vector[N] Z_3_muANG_6;
  vector[N] Z_3_muANG_7;
  int<lower=1> NC_3;  // number of group-level correlations
  // data for group-level effects of ID 4
  int<lower=1> N_4;  // number of grouping levels
  int<lower=1> M_4;  // number of coefficients per level
  int<lower=1> J_4[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_4_muDIS_1;
  vector[N] Z_4_muDIS_2;
  vector[N] Z_4_muDIS_3;
  vector[N] Z_4_muDIS_4;
  vector[N] Z_4_muDIS_5;
  vector[N] Z_4_muDIS_6;
  vector[N] Z_4_muDIS_7;
  int<lower=1> NC_4;  // number of group-level correlations
  // data for group-level effects of ID 5
  int<lower=1> N_5;  // number of grouping levels
  int<lower=1> M_5;  // number of coefficients per level
  int<lower=1> J_5[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_5_muDIS_1;
  vector[N] Z_5_muDIS_2;
  vector[N] Z_5_muDIS_3;
  vector[N] Z_5_muDIS_4;
  vector[N] Z_5_muDIS_5;
  vector[N] Z_5_muDIS_6;
  vector[N] Z_5_muDIS_7;
  int<lower=1> NC_5;  // number of group-level correlations
  // data for group-level effects of ID 6
  int<lower=1> N_6;  // number of grouping levels
  int<lower=1> M_6;  // number of coefficients per level
  int<lower=1> J_6[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_6_muDIS_1;
  vector[N] Z_6_muDIS_2;
  vector[N] Z_6_muDIS_3;
  vector[N] Z_6_muDIS_4;
  vector[N] Z_6_muDIS_5;
  vector[N] Z_6_muDIS_6;
  vector[N] Z_6_muDIS_7;
  int<lower=1> NC_6;  // number of group-level correlations
  // data for group-level effects of ID 7
  int<lower=1> N_7;  // number of grouping levels
  int<lower=1> M_7;  // number of coefficients per level
  int<lower=1> J_7[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_7_muFER_1;
  vector[N] Z_7_muFER_2;
  vector[N] Z_7_muFER_3;
  vector[N] Z_7_muFER_4;
  vector[N] Z_7_muFER_5;
  vector[N] Z_7_muFER_6;
  vector[N] Z_7_muFER_7;
  int<lower=1> NC_7;  // number of group-level correlations
  // data for group-level effects of ID 8
  int<lower=1> N_8;  // number of grouping levels
  int<lower=1> M_8;  // number of coefficients per level
  int<lower=1> J_8[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_8_muFER_1;
  vector[N] Z_8_muFER_2;
  vector[N] Z_8_muFER_3;
  vector[N] Z_8_muFER_4;
  vector[N] Z_8_muFER_5;
  vector[N] Z_8_muFER_6;
  vector[N] Z_8_muFER_7;
  int<lower=1> NC_8;  // number of group-level correlations
  // data for group-level effects of ID 9
  int<lower=1> N_9;  // number of grouping levels
  int<lower=1> M_9;  // number of coefficients per level
  int<lower=1> J_9[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_9_muFER_1;
  vector[N] Z_9_muFER_2;
  vector[N] Z_9_muFER_3;
  vector[N] Z_9_muFER_4;
  vector[N] Z_9_muFER_5;
  vector[N] Z_9_muFER_6;
  vector[N] Z_9_muFER_7;
  int<lower=1> NC_9;  // number of group-level correlations
  // data for group-level effects of ID 10
  int<lower=1> N_10;  // number of grouping levels
  int<lower=1> M_10;  // number of coefficients per level
  int<lower=1> J_10[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_10_muHAP_1;
  vector[N] Z_10_muHAP_2;
  vector[N] Z_10_muHAP_3;
  vector[N] Z_10_muHAP_4;
  vector[N] Z_10_muHAP_5;
  vector[N] Z_10_muHAP_6;
  vector[N] Z_10_muHAP_7;
  int<lower=1> NC_10;  // number of group-level correlations
  // data for group-level effects of ID 11
  int<lower=1> N_11;  // number of grouping levels
  int<lower=1> M_11;  // number of coefficients per level
  int<lower=1> J_11[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_11_muHAP_1;
  vector[N] Z_11_muHAP_2;
  vector[N] Z_11_muHAP_3;
  vector[N] Z_11_muHAP_4;
  vector[N] Z_11_muHAP_5;
  vector[N] Z_11_muHAP_6;
  vector[N] Z_11_muHAP_7;
  int<lower=1> NC_11;  // number of group-level correlations
  // data for group-level effects of ID 12
  int<lower=1> N_12;  // number of grouping levels
  int<lower=1> M_12;  // number of coefficients per level
  int<lower=1> J_12[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_12_muHAP_1;
  vector[N] Z_12_muHAP_2;
  vector[N] Z_12_muHAP_3;
  vector[N] Z_12_muHAP_4;
  vector[N] Z_12_muHAP_5;
  vector[N] Z_12_muHAP_6;
  vector[N] Z_12_muHAP_7;
  int<lower=1> NC_12;  // number of group-level correlations
  // data for group-level effects of ID 13
  int<lower=1> N_13;  // number of grouping levels
  int<lower=1> M_13;  // number of coefficients per level
  int<lower=1> J_13[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_13_muSAD_1;
  vector[N] Z_13_muSAD_2;
  vector[N] Z_13_muSAD_3;
  vector[N] Z_13_muSAD_4;
  vector[N] Z_13_muSAD_5;
  vector[N] Z_13_muSAD_6;
  vector[N] Z_13_muSAD_7;
  int<lower=1> NC_13;  // number of group-level correlations
  // data for group-level effects of ID 14
  int<lower=1> N_14;  // number of grouping levels
  int<lower=1> M_14;  // number of coefficients per level
  int<lower=1> J_14[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_14_muSAD_1;
  vector[N] Z_14_muSAD_2;
  vector[N] Z_14_muSAD_3;
  vector[N] Z_14_muSAD_4;
  vector[N] Z_14_muSAD_5;
  vector[N] Z_14_muSAD_6;
  vector[N] Z_14_muSAD_7;
  int<lower=1> NC_14;  // number of group-level correlations
  // data for group-level effects of ID 15
  int<lower=1> N_15;  // number of grouping levels
  int<lower=1> M_15;  // number of coefficients per level
  int<lower=1> J_15[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_15_muSAD_1;
  vector[N] Z_15_muSAD_2;
  vector[N] Z_15_muSAD_3;
  vector[N] Z_15_muSAD_4;
  vector[N] Z_15_muSAD_5;
  vector[N] Z_15_muSAD_6;
  vector[N] Z_15_muSAD_7;
  int<lower=1> NC_15;  // number of group-level correlations
  // data for group-level effects of ID 16
  int<lower=1> N_16;  // number of grouping levels
  int<lower=1> M_16;  // number of coefficients per level
  int<lower=1> J_16[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_16_muSUR_1;
  vector[N] Z_16_muSUR_2;
  vector[N] Z_16_muSUR_3;
  vector[N] Z_16_muSUR_4;
  vector[N] Z_16_muSUR_5;
  vector[N] Z_16_muSUR_6;
  vector[N] Z_16_muSUR_7;
  int<lower=1> NC_16;  // number of group-level correlations
  // data for group-level effects of ID 17
  int<lower=1> N_17;  // number of grouping levels
  int<lower=1> M_17;  // number of coefficients per level
  int<lower=1> J_17[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_17_muSUR_1;
  vector[N] Z_17_muSUR_2;
  vector[N] Z_17_muSUR_3;
  vector[N] Z_17_muSUR_4;
  vector[N] Z_17_muSUR_5;
  vector[N] Z_17_muSUR_6;
  vector[N] Z_17_muSUR_7;
  int<lower=1> NC_17;  // number of group-level correlations
  // data for group-level effects of ID 18
  int<lower=1> N_18;  // number of grouping levels
  int<lower=1> M_18;  // number of coefficients per level
  int<lower=1> J_18[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_18_muSUR_1;
  vector[N] Z_18_muSUR_2;
  vector[N] Z_18_muSUR_3;
  vector[N] Z_18_muSUR_4;
  vector[N] Z_18_muSUR_5;
  vector[N] Z_18_muSUR_6;
  vector[N] Z_18_muSUR_7;
  int<lower=1> NC_18;  // number of group-level correlations
  int prior_only;  // should the likelihood be ignored?

Simplifying X

Replace

  int<lower=1> K_muANG;  // number of population-level effects
  matrix[N, K_muANG] X_muANG;  // population-level design matrix
  int<lower=1> K_muDIS;  // number of population-level effects
  matrix[N, K_muDIS] X_muDIS;  // population-level design matrix
  int<lower=1> K_muFER;  // number of population-level effects
  matrix[N, K_muFER] X_muFER;  // population-level design matrix
  int<lower=1> K_muHAP;  // number of population-level effects
  matrix[N, K_muHAP] X_muHAP;  // population-level design matrix
  int<lower=1> K_muSAD;  // number of population-level effects
  matrix[N, K_muSAD] X_muSAD;  // population-level design matrix
  int<lower=1> K_muSUR;  // number of population-level effects
  matrix[N, K_muSUR] X_muSUR;  // population-level design matrix

with

  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix

Since K will always be 7 (seven factors) and the design matrix (X) will be the same for all emotions.

Simplifying group-level effects

As described in the thread, we only need one Z per group-level effect and not for every emotion separately

  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_muANG_1;
  vector[N] Z_1_muANG_2;
  vector[N] Z_1_muANG_3;
  vector[N] Z_1_muANG_4;
  vector[N] Z_1_muANG_5;
  vector[N] Z_1_muANG_6;
  vector[N] Z_1_muANG_7;

ID 1 refers to the interaction between countries and languages (for simplicity, I’ll call it ‘culture’ here), ID 2 sex and ID 3 single speakers.

Since the group-level effects are computed for the same coefficients as the population effects, we can omit M_1, M_2 and M_3 since it will always be 7 (number of factors). So this part can be rewritten to:

  // data for group-level effects of culture
  int<lower=1> N_culture;  // number of cultures
  int<lower=1> J_culture[N];  // culture grouping indicator per observation
  // culture group-level predictor values
  vector[N] Z_culture_RC1;
  vector[N] Z_culture_RC2;
  vector[N] Z_culture_RC3;
  vector[N] Z_culture_RC4;
  vector[N] Z_culture_RC5;
  vector[N] Z_culture_RC6;
  vector[N] Z_culture_RC7;
  
  // data for group-level effects of sex
  int<lower=1> N_sex;  // number of sexes
  int<lower=1> J_sex[N];  // sex grouping indicator per observation
  // sex group-level predictor values
  vector[N] Z_sex_RC1;
  vector[N] Z_sex_RC2;
  vector[N] Z_sex_RC3;
  vector[N] Z_sex_RC4;
  vector[N] Z_sex_RC5;
  vector[N] Z_sex_RC6;
  vector[N] Z_sex_RC7;

  // data for group-level effects of speaker
  int<lower=1> N_speaker;  // number of speakers
  int<lower=1> J_speaker[N];  // speaker grouping indicator per observation
  // speaker group-level predictor values
  vector[N] Z_speaker_RC1;
  vector[N] Z_speaker_RC2;
  vector[N] Z_speaker_RC3;
  vector[N] Z_speaker_RC4;
  vector[N] Z_speaker_RC5;
  vector[N] Z_speaker_RC6;
  vector[N] Z_speaker_RC7;

@JLC and @storopoli, why do we need to define Z anyway? Isn’t this information already contained in X?

Simplifying NC

NC_1 to NC_18 is always 21 because this is the number of items in the upper diagonal of a correlation matrix of the size 7x7, so one parameter NC would suffice.