Understand how `brms` parametrizes random effects

I want to improve my Stan skills for more advanced modelling. I am now trying to “reverse engineering” the generated Stan code for a linear mixed effect model by the fabulous brms package. I am mostly a Python user, so I am trying to feed in the data to this model from PyStan.

For example, this is the generated code for the sleep study, with random intercept and slope for each subject ( Reaction ~ Days + (Days | Subject) ):

// generated with brms 2.3.0
functions { 
} 
data { 
  int<lower=1> N;  // total number of observations 
  vector[N] Y;  // response variable 
  int<lower=1> K;  // number of population-level effects 
  matrix[N, K] X;  // population-level design matrix 
  // data for group-level effects of ID 1
  int<lower=1> J_1[N];
  int<lower=1> N_1;
  int<lower=1> M_1;
  vector[N] Z_1_1;
  vector[N] Z_1_2;
  int<lower=1> NC_1;
  int prior_only;  // should the likelihood be ignored? 
} 
transformed data { 
  int Kc = K - 1; 
  matrix[N, K - 1] Xc;  // centered version of X 
  vector[K - 1] means_X;  // column means of X before centering 
  for (i in 2:K) { 
    means_X[i - 1] = mean(X[, i]); 
    Xc[, i - 1] = X[, i] - means_X[i - 1]; 
  } 
} 
parameters { 
  vector[Kc] b;  // population-level effects 
  real temp_Intercept;  // temporary intercept 
  real<lower=0> sigma;  // residual SD 
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  matrix[M_1, N_1] z_1;  // unscaled group-level effects
  // cholesky factor of correlation matrix
  cholesky_factor_corr[M_1] L_1;
} 
transformed parameters { 
  // group-level effects 
  matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)';
  vector[N_1] r_1_1 = r_1[, 1];
  vector[N_1] r_1_2 = r_1[, 2];
} 
model { 
  vector[N] mu = Xc * b + temp_Intercept; 
  for (n in 1:N) { 
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n];
  } 
  // priors including all constants 
  target += student_t_lpdf(temp_Intercept | 3, 289, 59); 
  target += student_t_lpdf(sigma | 3, 0, 59)
    - 1 * student_t_lccdf(0 | 3, 0, 59); 
  target += student_t_lpdf(sd_1 | 3, 0, 59)
    - 2 * student_t_lccdf(0 | 3, 0, 59); 
  target += normal_lpdf(to_vector(z_1) | 0, 1);
  target += lkj_corr_cholesky_lpdf(L_1 | 1); 
  // likelihood including all constants 
  if (!prior_only) { 
    target += normal_lpdf(Y | mu, sigma); 
  } 
} 
generated quantities { 
  // actual population-level intercept 
  real b_Intercept = temp_Intercept - dot_product(means_X, b); 
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1,upper=1>[NC_1] cor_1;
  // take only relevant parts of correlation matrix
  cor_1[1] = Cor_1[1,2]; 
} 

The code is well commented, but what I do not fully understand is the part related to // data for group-level effects of ID 1 in the data block. I think I understand the following:

  • N is the number of observations (size of Reaction)
  • Y is the vector of observations (Reaction)
  • K is the number of fixed effects (2 because I have intercept and slope)
  • X is the design matrix (of size (N, K)), where the first column is basically filled with 1s

Bu I am unsure about the rest:

  • J_1 I guess it is the the vector of size N which maps each observation with the subject (something like [1, 1, 1, ..., 2, 2, 2, ...])
  • M_1 I guess it is the number of random effects (2, because I have the deviation on intercept and slope for each subject)
  • Z_1_1 and Z_1_2 I thought they were part of the design matrix Z for the random effects, each with size (n_{observations}, n_{subjects}). Instead, they are defined as vector of size n_{observations}. How are these defined?
  • NC_1, not sure about this.

Could you please enlighten me? :)

2 Likes

Take a look at str(brms::make_standata(Reaction ~ Days + (Days | Subject), data = sleepstudy)) to check your understanding of what all the things in the data block are.

4 Likes

Thank you, that helped a lot! Just a note, the command to type is str(brms::make_standata(Reaction ~ Days + (Days | Subject), data = sleepstudy)) . And my intuition was almost right. The Z_1_1 and Z_1_2 vectors are basically the columns of the design matrix X. I ran the code and the estimation is correct.

This is probably a trivial question but I did some googling and I don’t understand why there is a temp_intercept
and then the real_intercept is: real b_Intercept = temp_Intercept - dot_product(means_X, b);.

Why do you take the dot product of the vector of X column-means and the vector of b coefficients?

This has nothing to do with random effects and you don’t need to worry about it. I explain this transformation in ?brmsformula.

Thanks Paul! Sorry I should have asked a new question / posted on stack overflow instead.

Since I feel that my question is very much related to this thread’s title, I’ll take another try on posting another question here. I have a feeling I am missing something super basic.

Specifically, what I am confused about:
In a model like
red ~ 0 + (1|fspc)

what we would get with get_prior would be:
Screenshot%20from%202019-01-22%2022-25-23

I understood via this post that line 1 is a header and tells us that lines 2 and 3 have the same prior specified in line 1.
But then I can’t seem to figure out what line 2 here is specifying - I would only expect lines 3 and 4 to pop up.

Given that priors are not changed from default, I would use the following notation for this model:

red_n ~ Normal(mu_n,sigma)
sigma ~ t(3,0,10)
mu_n ~ a_i[n]
a_i[n] ~ Normal(0, sigma_a)
sigma_a ~ t(3,0,10)

Here, sigma_a corresponds to line 3.
But where in this formula would we find what is in line 2?
And whatever it is that is in line 2, if I see things correctly, it doesn’t pop up in the summary of that model once it is run.

I would be very thankful for a hint.
Is it simply that it starts with line 1 to declare the student_t for all sd parameters and then it goes through the levels until it finally reaches the one where it is concretely applied? I.e., I could change it only in line 2 which would propagate to line 3, or I could change it in line 3 specifically?

The way brms allows to specify priors is “hierarchical”. If nothing is specified on the lower levels, the upper level prior is used. That is if nothing is specified in the 3rd line, we look in the 2nd. If nothing is specified in the 2nd line, we look at the 1st. See also ?set_prior.