I’ve been trying to think about how you could rephrase models to use normal_id_glm_lpdf, since I’ve seen it’s quite a bit faster now (and we can use the GPU stuff with it).
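As far as I understand it, normal_id_glm_lpdf(y | X, alpha, beta, sigma) is just a fused normal likelihood with an identity-link linear predictor, so with placeholder names (y, X, alpha, beta, sigma here aren’t from the brms code below)

target += normal_id_glm_lpdf(y | X, alpha, beta, sigma);

should give the same log density as

target += normal_lpdf(y | alpha + X * beta, sigma);

just computed in one shot, which is what the speedups and the GPU support apply to.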
For brms, the following formula outputs the Stan code below:
fake_data = data.frame(blah = 1:10, goo = rep(c("A", "B"), 5), shoo = rep(1:2, 5))
make_stancode(blah ~ shoo + (shoo | goo), fake_data)
// generated with brms 2.10.3
functions {
}
data {
int<lower=1> N; // number of observations
vector[N] Y; // response variable
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
int<lower=1> J_1[N]; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
vector[N] Z_1_2;
int<lower=1> NC_1; // number of group-level correlations
int prior_only; // should the likelihood be ignored?
}
transformed data {
int Kc = K - 1;
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b; // population-level effects
// temporary intercept for centered predictors
real Intercept;
real<lower=0> sigma; // residual SD
vector<lower=0>[M_1] sd_1; // group-level standard deviations
matrix[M_1, N_1] z_1; // standardized group-level effects
// cholesky factor of correlation matrix
cholesky_factor_corr[M_1] L_1;
}
transformed parameters {
// actual group-level effects
matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)';
// using vectors speeds up indexing in loops
vector[N_1] r_1_1 = r_1[, 1];
vector[N_1] r_1_2 = r_1[, 2];
}
model {
// initialize linear predictor term
vector[N] mu = Intercept + Xc * b;
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n];
}
// priors including all constants
target += student_t_lpdf(Intercept | 3, 6, 10);
target += student_t_lpdf(sigma | 3, 0, 10)
- 1 * student_t_lccdf(0 | 3, 0, 10);
target += student_t_lpdf(sd_1 | 3, 0, 10)
- 2 * student_t_lccdf(0 | 3, 0, 10);
target += normal_lpdf(to_vector(z_1) | 0, 1);
target += lkj_corr_cholesky_lpdf(L_1 | 1);
// likelihood including all constants
if (!prior_only) {
target += normal_lpdf(Y | mu, sigma);
}
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept - dot_product(means_X, b);
// group-level correlations
corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
vector<lower=-1,upper=1>[NC_1] cor_1;
// extract upper diagonal of correlation matrix
for (k in 1:M_1) {
for (j in 1:(k - 1)) {
cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
}
}
}
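So the thing the GLM call needs to reproduce is the full linear predictor from the model block above, which for each observation n is

mu[n] = Intercept + Xc[n] * b + Z_1_1[n] * r_1_1[J_1[n]] + Z_1_2[n] * r_1_2[J_1[n]]

i.e. the group-level terms have to get absorbed into the design matrix and coefficient vector somehow.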
Below I’ve taken this and tried to smoosh things around so that everything goes through normal_id_glm_lpdf, and I was wondering if it looks valid. If anyone can confirm “this probably looks fine,” I’ll sit down this week and check that it recovers the same parameters as the current brms code.
Everywhere I’ve changed something I’ve put a // DIFF comment noting what’s different between the two versions.
// generated with brms 2.10.3
functions {
}
data {
int<lower=1> N; // number of observations
vector[N] Y; // response variable
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
int<lower=1> J_1[N]; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
vector[N] Z_1_2;
int<lower=1> NC_1; // number of group-level correlations
int prior_only; // should the likelihood be ignored?
}
transformed data {
int Kc = K - 1;
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
// DIFF: one wide design matrix: Xc, then N_1 columns for Z_1_1, then N_1 columns for Z_1_2
matrix[N, Kc + 2 * N_1] all_x_stuff = rep_matrix(0, N, Kc + 2 * N_1);
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
// DIFF: Dump everything into one matrix, scattering each Z value into the
// column for its group level so the columns line up with r_1 in all_b_stuff
all_x_stuff[, 1:Kc] = Xc;
for (n in 1:N) {
all_x_stuff[n, Kc + J_1[n]] = Z_1_1[n];
all_x_stuff[n, Kc + N_1 + J_1[n]] = Z_1_2[n];
}
}
parameters {
vector[Kc] b; // population-level effects
// temporary intercept for centered predictors
real Intercept;
real<lower=0> sigma; // residual SD
vector<lower=0>[M_1] sd_1; // group-level standard deviations
matrix[M_1, N_1] z_1; // standardized group-level effects
// cholesky factor of correlation matrix
cholesky_factor_corr[M_1] L_1;
}
transformed parameters {
// actual group-level effects
matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)';
// using vectors speeds up indexing in loops
vector[N_1] r_1_1 = r_1[, 1];
vector[N_1] r_1_2 = r_1[, 2];
// DIFF: Dump all the above into one big vector
vector[Kc + N_1 + N_1] all_b_stuff;
for (i in 1:Kc) {
all_b_stuff[i] = b[i];
}
for (i in 1:N_1) {
all_b_stuff[Kc + i] = r_1[i, 1];
}
for (i in 1:N_1) {
all_b_stuff[Kc + N_1 + i] = r_1[i, 2];
}
}
model {
// DIFF: Don't do the loop here
// priors including all constants
target += student_t_lpdf(Intercept | 3, 6, 10);
target += student_t_lpdf(sigma | 3, 0, 10)
- 1 * student_t_lccdf(0 | 3, 0, 10);
target += student_t_lpdf(sd_1 | 3, 0, 10)
- 2 * student_t_lccdf(0 | 3, 0, 10);
target += normal_lpdf(to_vector(z_1) | 0, 1);
target += lkj_corr_cholesky_lpdf(L_1 | 1);
// likelihood including all constants
if (!prior_only) {
// DIFF: Just use normal_id_glm_lpdf instead of normal_lpdf
target += normal_id_glm_lpdf(Y | all_x_stuff, Intercept, all_b_stuff, sigma);
}
}
generated quantities {
// actual population-level intercept
// DIFF: use the first Kc entries of all_b_stuff (these are just b)
real b_Intercept = Intercept - dot_product(means_X, all_b_stuff[1:Kc]);
// group-level correlations
corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
vector<lower=-1,upper=1>[NC_1] cor_1;
// extract upper diagonal of correlation matrix
for (k in 1:M_1) {
for (j in 1:(k - 1)) {
cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
}
}
}
It feels like I’m only shuffling the data around into the form I want (which was the intent), but I’m not sure whether this is mathematically wonky. Models written like this do tend to be pretty fast for large N and M, although I think the r_1 handling is the wonky part, since the design matrix ends up with Kc + 2 * N_1 mostly-zero columns. I have another scheme where just the fixed effects use normal_id_glm_lpdf
and the random effects are still done the way brms does them right now.
For the halfway scheme, you can do the transformed parameters like this:
transformed parameters {
vector[N] r_1_sorted;
vector[N] r_2_sorted;
// actual group-level effects
matrix[N_1, M_1] r_1 = (diag_pre_multiply(sd_1, L_1) * z_1)';
// using vectors speeds up indexing in loops
vector[N_1] r_1_1 = r_1[, 1];
vector[N_1] r_1_2 = r_1[, 2];
// Gather ("sort") the group-level effects so there's one value per observation
for (n in 1:N) {
r_1_sorted[n] = r_1_1[J_1[n]];
r_2_sorted[n] = r_1_2[J_1[n]];
}
}
Then in the model block you can use normal_id_glm_lpdf for the fixed effects and normal_lpdf for the random effects:
model {
vector[N] mu = r_1_sorted .* Z_1_1 + r_2_sorted .* Z_1_2;
// priors including all constants
target += student_t_lpdf(Intercept | 3, 6, 10);
target += student_t_lpdf(sigma | 3, 0, 10) - 1 * student_t_lccdf(0 | 3, 0, 10);
target += student_t_lpdf(sd_1 | 3, 0, 10) - 2 * student_t_lccdf(0 | 3, 0, 10);
target += normal_lpdf(to_vector(z_1) | 0, 1);
target += lkj_corr_cholesky_lpdf(L_1 | 1);
// likelihood including all constants
target += normal_id_glm_lpdf(Y | XQ, Intercept, bQ, sigma);
target += normal_lpdf(Y | mu, sigma);
}
I think that makes sense?
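The part I’m least sure about there is that Y shows up in two separate lpdf statements, and I don’t think adding two normal lpdfs is the same as one normal likelihood whose mean is the sum of the two linear predictors. If that’s a problem, a single-statement version (just a sketch, with the same XQ and bQ as above) would be to move the group-level part over to the outcome side:

model {
vector[N] mu = r_1_sorted .* Z_1_1 + r_2_sorted .* Z_1_2;
// priors exactly as above ...
// single likelihood statement: subtracting the group-level contribution from Y
// should be equivalent to adding it to the mean, since sigma is unchanged
target += normal_id_glm_lpdf(Y - mu | XQ, Intercept, bQ, sigma);
}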
- Operating System: Ubuntu 18.04
- brms Version: 2.10.3