Understand how `brms` parametrizes random effects


#1

I want to improve my Stan skills for more advanced modelling. I am now trying to “reverse engineer” the Stan code generated by the fabulous brms package for a linear mixed-effects model. I am mostly a Python user, so I am trying to feed the data into this model from PyStan.

For example, this is the generated code for the sleep study, with a random intercept and slope for each subject (Reaction ~ Days + (Days | Subject)):

// generated with brms 2.3.0
functions { 
} 
data { 
  int<lower=1> N;  // total number of observations 
  vector[N] Y;  // response variable 
  int<lower=1> K;  // number of population-level effects 
  matrix[N, K] X;  // population-level design matrix 
  // data for group-level effects of ID 1
  int<lower=1> J_1[N];
  int<lower=1> N_1;
  int<lower=1> M_1;
  vector[N] Z_1_1;
  vector[N] Z_1_2;
  int<lower=1> NC_1;
  int prior_only;  // should the likelihood be ignored? 
} 
transformed data { 
  int Kc = K - 1; 
  matrix[N, K - 1] Xc;  // centered version of X 
  vector[K - 1] means_X;  // column means of X before centering 
  for (i in 2:K) { 
    means_X[i - 1] = mean(X[, i]); 
    Xc[, i - 1] = X[, i] - means_X[i - 1]; 
  } 
} 
parameters { 
  vector[Kc] b;  // population-level effects 
  real temp_Intercept;  // temporary intercept 
  real<lower=0> sigma;  // residual SD 
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  matrix[M_1, N_1] z_1;  // unscaled group-level effects
  // cholesky factor of correlation matrix
  cholesky_factor_corr[M_1] L_1;
} 
transformed parameters { 
  // group-level effects 
  matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)';
  vector[N_1] r_1_1 = r_1[, 1];
  vector[N_1] r_1_2 = r_1[, 2];
} 
model { 
  vector[N] mu = Xc * b + temp_Intercept; 
  for (n in 1:N) { 
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n];
  } 
  // priors including all constants 
  target += student_t_lpdf(temp_Intercept | 3, 289, 59); 
  target += student_t_lpdf(sigma | 3, 0, 59)
    - 1 * student_t_lccdf(0 | 3, 0, 59); 
  target += student_t_lpdf(sd_1 | 3, 0, 59)
    - 2 * student_t_lccdf(0 | 3, 0, 59); 
  target += normal_lpdf(to_vector(z_1) | 0, 1);
  target += lkj_corr_cholesky_lpdf(L_1 | 1); 
  // likelihood including all constants 
  if (!prior_only) { 
    target += normal_lpdf(Y | mu, sigma); 
  } 
} 
generated quantities { 
  // actual population-level intercept 
  real b_Intercept = temp_Intercept - dot_product(means_X, b); 
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1,upper=1>[NC_1] cor_1;
  // take only relevant parts of correlation matrix
  cor_1[1] = Cor_1[1,2]; 
} 

The code is well commented, but what I do not fully understand is the part related to // data for group-level effects of ID 1 in the data block. I think I understand the following:

  • N is the number of observations (size of Reaction)
  • Y is the vector of observations (Reaction)
  • K is the number of fixed effects (2 because I have intercept and slope)
  • X is the design matrix (of size (N, K)), where the first column is basically filled with 1s

But I am unsure about the rest:

  • J_1: I guess this is the vector of size N that maps each observation to its subject (something like [1, 1, 1, ..., 2, 2, 2, ...])
  • M_1: I guess this is the number of random effects (2, because I have a deviation on the intercept and on the slope for each subject)
  • Z_1_1 and Z_1_2: I thought these were parts of the design matrix Z for the random effects, each of size (n_observations, n_subjects). Instead, they are defined as vectors of size n_observations. How are they defined?
  • NC_1: I am not sure about this one.

Could you please enlighten me? :)
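
For reference, this is roughly what I am passing from Python at the moment (just a sketch: a tiny placeholder data frame stands in for the real sleepstudy data, the dictionary keys follow the names in the data block, and the entries I am unsure about are left as a comment):

import numpy as np
import pandas as pd

# Placeholder for the sleepstudy data (columns Reaction, Days, Subject);
# the numbers are made up so the snippet runs on its own.
df = pd.DataFrame({
    "Reaction": [250.0, 260.0, 270.0, 220.0, 230.0, 240.0],
    "Days":     [0, 1, 2, 0, 1, 2],
    "Subject":  [308, 308, 308, 309, 309, 309],
})

subject_idx = df["Subject"].astype("category").cat.codes.to_numpy() + 1  # 1-based subject index
X = np.column_stack([np.ones(len(df)), df["Days"].to_numpy()])           # intercept column + Days

stan_data = {
    "N": len(df),                    # number of observations
    "Y": df["Reaction"].to_numpy(),  # response (Reaction)
    "K": X.shape[1],                 # number of population-level effects
    "X": X,                          # population-level design matrix
    "J_1": subject_idx,              # my guess: observation -> subject mapping
    "N_1": int(subject_idx.max()),   # number of subjects
    "M_1": 2,                        # my guess: number of group-level effects
    # "Z_1_1": ?, "Z_1_2": ?, "NC_1": ?   <- the parts I do not understand
    "prior_only": 0,
}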


#2

Take a look at str(brms::make_standata(Reaction ~ Days + (Days | Subject), data = sleepstudy)) to check your understanding of what all the things in the data block are.


#3

Thank you, that helped a lot! Just a note, the command to type is str(brms::make_standata(Reaction ~ Days + (Days | Subject), data = sleepstudy)). And my intuition was almost right: the Z_1_1 and Z_1_2 vectors are basically the columns of the design matrix X. I ran the code and the estimation is correct.
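
In case it helps anyone later, the missing pieces of the data dictionary from my first post end up being (a sketch, continuing the same Python snippet):

stan_data["Z_1_1"] = X[:, 0]  # column of ones -> "design" for the group-level intercepts
stan_data["Z_1_2"] = X[:, 1]  # Days           -> "design" for the group-level slopes
stan_data["NC_1"]  = 1        # number of group-level correlations: M_1 * (M_1 - 1) / 2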


#4

This is probably a trivial question, but I did some googling and I don’t understand why there is a temp_Intercept and why the actual intercept is then recovered as real b_Intercept = temp_Intercept - dot_product(means_X, b);.

Why do you take the dot product of the vector of X column-means and the vector of b coefficients?
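
To make the question concrete, this is the relationship I am puzzled by (a small numeric sketch with made-up values, not anything taken from the model):

import numpy as np

rng = np.random.default_rng(0)
n = 10
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept column + one predictor
b = np.array([0.7])                                    # non-intercept coefficients, like b in the Stan code
temp_Intercept = 2.0

means_X = X[:, 1:].mean(axis=0)  # column means of X, skipping the intercept column
Xc = X[:, 1:] - means_X          # centered predictors, like Xc in transformed data

mu_centered = temp_Intercept + Xc @ b        # what the model block computes
b_Intercept = temp_Intercept - means_X @ b   # the generated quantities line
mu_original = b_Intercept + X[:, 1:] @ b     # the same mu on the original (uncentered) scale

print(np.allclose(mu_centered, mu_original))  # True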


#5

This has nothing to do with random effects and you don’t need to worry about it. I explain this transformation in ?brmsformula.


#6

Thanks, Paul! Sorry, I should have asked a new question / posted on Stack Overflow instead.