Understanding multilevel modelling Stan code

Hello Everyone,

I am trying to understand the Stan code that the brms R package generates for a mixed-effects model.

Here is the Stan code:

// generated with brms 2.20.4
functions {
  /* compute correlated group-level effects
   * Args:
   *   z: matrix of unscaled group-level effects
   *   SD: vector of standard deviation parameters
   *   L: cholesky factor correlation matrix
   * Returns:
   *   matrix of scaled group-level effects
   */
  matrix scale_r_cor(matrix z, vector SD, matrix L) {
    // r is stored in another dimension order than z
    return transpose(diag_pre_multiply(SD, L) * z);
  }
}

data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int<lower=1> Kc;  // number of population-level effects after centering

  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  array[N] int<lower=1> J_1;  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  vector[N] Z_1_2;
  int<lower=1> NC_1;  // number of group-level correlations
  int prior_only;  // should the likelihood be ignored?
}

transformed data {
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}

parameters {
  vector[Kc] b;  // regression coefficients
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> sigma;  // dispersion parameter
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  matrix[M_1, N_1] z_1;  // standardized group-level effects
  cholesky_factor_corr[M_1] L_1;  // cholesky factor of correlation matrix
}

transformed parameters {
  matrix[N_1, M_1] r_1;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_1] r_1_1;
  vector[N_1] r_1_2;
  real lprior = 0;  // prior contributions to the log posterior

  // compute actual group-level effects
  r_1 = scale_r_cor(z_1, sd_1, L_1);
  r_1_1 = r_1[, 1];
  r_1_2 = r_1[, 2];
  lprior += student_t_lpdf(Intercept | 3, 288.7, 59.3);
  lprior += student_t_lpdf(sigma | 3, 0, 59.3) - 1 * student_t_lccdf(0 | 3, 0, 59.3);
  lprior += student_t_lpdf(sd_1 | 3, 0, 59.3) - 2 * student_t_lccdf(0 | 3, 0, 59.3);
  lprior += lkj_corr_cholesky_lpdf(L_1 | 1);
}

model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = rep_vector(0.0, N);
    mu += Intercept;
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n];
    }
    target += normal_id_glm_lpdf(Y | Xc, mu, b, sigma);
  }
  // priors including constants
  target += lprior;
  target += std_normal_lpdf(to_vector(z_1));
}

generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
  // compute group-level correlations
  corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
  vector<lower=-1,upper=1>[NC_1] cor_1;
  // extract upper diagonal of correlation matrix
  for (k in 1:M_1) {
    for (j in 1:(k - 1)) {
      cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
    }
  }
}

I would really appreciate it if anyone could explain the code below and provide a reference if you have one.

/* compute correlated group-level effects
 * Args:
 *   z: matrix of unscaled group-level effects
 *   SD: vector of standard deviation parameters
 *   L: cholesky factor correlation matrix
 * Returns:
 *   matrix of scaled group-level effects
 */
matrix scale_r_cor(matrix z, vector SD, matrix L) {
  // r is stored in another dimension order than z
  return transpose(diag_pre_multiply(SD, L) * z);
}

Many thanks for your help and precious time.

Marimuthu

The following expression

transpose(diag_pre_multiply(SD, L) * z)

The top-level story is that we’re working with covariance matrices parameterized by the Cholesky factor of their correlation matrix and a vector of standard deviations. The expression above converts standard normal variates z into variates whose covariance has Cholesky factor diag_matrix(SD) * L = diag_pre_multiply(SD, L). When you diagonally pre-multiply a Cholesky factor for correlation by a diagonal matrix of scales, you get the Cholesky factor for covariance. When you multiply that factor by the standard normals z and transpose, you get a variable y = transpose(diag_pre_multiply(SD, L) * z) whose rows are distributed as multi_normal(0, (diag_matrix(SD) * L) * (diag_matrix(SD) * L)'). That is, y has the covariance defined by the Cholesky factor.
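
As a quick numerical check, here is a minimal R sketch (my own illustration, not part of the brms output; the names just mirror the Stan program) showing that scaling standard normals this way produces the intended covariance:

# Sketch: diag(SD) %*% L is the Cholesky factor of the covariance implied by
# standard deviations SD and the correlation matrix L %*% t(L).
set.seed(1)
M <- 2                                 # coefficients per group (M_1 in the Stan code)
N <- 1e5                               # number of simulated group levels (N_1)

SD <- c(2, 0.5)                        # group-level standard deviations (sd_1)
C  <- matrix(c(1, 0.6, 0.6, 1), 2, 2)  # a correlation matrix
L  <- t(chol(C))                       # its lower-triangular Cholesky factor (L_1)

z <- matrix(rnorm(M * N), nrow = M)    # standardized effects, as in z_1 ~ std_normal()
r <- t(diag(SD) %*% L %*% z)           # same operation as scale_r_cor(z, SD, L)

cov(r)                                 # close to diag(SD) %*% C %*% diag(SD)
diag(SD) %*% C %*% diag(SD)            # the covariance the model intends

Running this, the two matrices agree up to Monte Carlo error, which is all that scale_r_cor is doing for the group-level effects.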

For reference, I’d suggest the Stan User’s Guide chapter on regression, which goes over this parameterization in detail. It also covers the LKJ prior we use on correlation matrices, which concentrates toward a unit correlation matrix for shape values greater than 1; the shape of 1 used here is uniform over correlation matrices.
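
To make the shape parameter concrete, here is a tiny illustration of my own (just the unnormalized LKJ kernel, which is proportional to det(C)^(eta - 1)): the determinant of a correlation matrix is at most 1 and equals 1 only for the unit matrix, so eta > 1 favors near-unit matrices while eta = 1 is flat.

# Unnormalized LKJ density kernel for a correlation matrix C: det(C)^(eta - 1).
lkj_kernel <- function(C, eta) det(C)^(eta - 1)

C_unit   <- diag(2)                          # unit (identity) correlation matrix
C_strong <- matrix(c(1, 0.9, 0.9, 1), 2, 2)  # strongly correlated pair

lkj_kernel(C_unit, eta = 1)    # 1
lkj_kernel(C_strong, eta = 1)  # 1       -> eta = 1 (used in the code above) is flat
lkj_kernel(C_unit, eta = 4)    # 1
lkj_kernel(C_strong, eta = 4)  # ~0.0069 -> eta > 1 pulls toward the unit matrix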

In terms of the Cholesky factor form of generating correlated variables, Wikipedia covers that as well.
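
For completeness, the one-line derivation behind that construction (standard linear algebra, written in the same informal notation as above rather than quoting any particular source): if z has identity covariance and y = A * z with A = diag_matrix(SD) * L, then

cov(y) = A * cov(z) * A' = A * A' = diag_matrix(SD) * (L * L') * diag_matrix(SD)

and L * L' is exactly the correlation matrix, so y ends up with standard deviations SD and the correlations encoded by L.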

For more on the LKJ prior, see this paper of @bgoodri’s:

http://www.stat.columbia.edu/~gelman/research/unpublished/Visualization.pdf

Thank you, Bob. Your explanation has been very enlightening.

Marimuthu.