I want to improve my Stan skills for more advanced modelling, so I am trying to "reverse engineer" the Stan code generated for a linear mixed-effects model by the fabulous brms package. I am mostly a Python user, so I am feeding the data into this model from PyStan.
For example, this is the generated code for the sleep study, with a random intercept and slope for each subject (Reaction ~ Days + (Days | Subject)):
// generated with brms 2.3.0
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> J_1[N];
  int<lower=1> N_1;
  int<lower=1> M_1;
  vector[N] Z_1_1;
  vector[N] Z_1_2;
  int<lower=1> NC_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Kc = K - 1;
  matrix[N, K - 1] Xc;  // centered version of X
  vector[K - 1] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // population-level effects
  real temp_Intercept;  // temporary intercept
  real<lower=0> sigma;  // residual SD
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  matrix[M_1, N_1] z_1;  // unscaled group-level effects
  // cholesky factor of correlation matrix
  cholesky_factor_corr[M_1] L_1;
}
transformed parameters {
  // group-level effects
  matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)';
  vector[N_1] r_1_1 = r_1[, 1];
  vector[N_1] r_1_2 = r_1[, 2];
}
model {
  vector[N] mu = Xc * b + temp_Intercept;
  for (n in 1:N) {
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n];
  }
  // priors including all constants
  target += student_t_lpdf(temp_Intercept | 3, 289, 59);
  target += student_t_lpdf(sigma | 3, 0, 59)
    - 1 * student_t_lccdf(0 | 3, 0, 59);
  target += student_t_lpdf(sd_1 | 3, 0, 59)
    - 2 * student_t_lccdf(0 | 3, 0, 59);
  target += normal_lpdf(to_vector(z_1) | 0, 1);
  target += lkj_corr_cholesky_lpdf(L_1 | 1);
  // likelihood including all constants
  if (!prior_only) {
    target += normal_lpdf(Y | mu, sigma);
  }
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = temp_Intercept - dot_product(means_X, b);
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1,upper=1>[NC_1] cor_1;
  // take only relevant parts of correlation matrix
  cor_1[1] = Cor_1[1,2];
}
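For context, this is roughly how I call the model from PyStan (a sketch: sleep_model.stan is just the file where I saved the code above, and stan_data is the data dictionary I try to assemble below):

import pystan

# compile the brms-generated Stan program
sm = pystan.StanModel(file="sleep_model.stan")

# stan_data must provide every entry declared in the data block
fit = sm.sampling(data=stan_data, iter=2000, chains=4)
print(fit)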
The code is well commented, but what I do not fully understand is the part related to // data for group-level effects of ID 1 in the data block. I think I understand the following:
- N is the number of observations (the size of Reaction)
- Y is the vector of observations (Reaction)
- K is the number of fixed effects (2, because I have intercept and slope)
- X is the design matrix (of size (N, K)), where the first column is basically filled with 1s (how I build these in Python is sketched just below)
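This is how I am currently building those four entries (a sketch; I assume the sleepstudy data sits in a pandas DataFrame with columns Reaction, Days and Subject, loaded from a CSV file I named myself):

import numpy as np
import pandas as pd

sleep = pd.read_csv("sleepstudy.csv")  # columns: Reaction, Days, Subject

N = len(sleep)
Y = sleep["Reaction"].values
X = np.column_stack([np.ones(N), sleep["Days"].values])  # intercept + Days
K = X.shape[1]  # 2

stan_data = {"N": N, "Y": Y, "K": K, "X": X}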
But I am unsure about the rest:
- J_1: I guess this is the vector of size N which maps each observation to its subject (something like [1, 1, 1, ..., 2, 2, 2, ...])
- M_1: I guess this is the number of random effects (2, because I have a deviation on intercept and slope for each subject)
- Z_1_1 and Z_1_2: I thought these were part of the design matrix Z for the random effects, each of size (n_observations, n_subjects). Instead, they are defined as vectors of size n_observations. How are these defined?
- NC_1: not sure about this one (see my guess in the sketch below)
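To make my guesses concrete, here is how I am currently filling in the rest of the data dictionary (the Z_1_* and NC_1 lines are pure guesses on my part, so they may well be wrong):

# Stan indexes from 1, so map each Subject to an integer in 1..N_1
subjects, J_1 = np.unique(sleep["Subject"].values, return_inverse=True)
J_1 = J_1 + 1

N_1 = len(subjects)  # number of subjects (18 in the sleep study)
M_1 = 2              # number of random effects per subject (my guess)

# my guess: one vector per group-level effect instead of a full Z matrix
Z_1_1 = np.ones(N)            # column for the random intercept?
Z_1_2 = sleep["Days"].values  # column for the random slope?

NC_1 = 1  # no idea; maybe the number of correlations, M_1 * (M_1 - 1) / 2?

stan_data.update({
    "J_1": J_1, "N_1": N_1, "M_1": M_1,
    "Z_1_1": Z_1_1, "Z_1_2": Z_1_2, "NC_1": NC_1,
    "prior_only": 0,  # actually use the likelihood
})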
Could you please enlighten me? :)