Non-centered parameterization of a hierarchical linear model

Hi, I’ve been trying to implement a non-centered parameterization of a hierarchical linear model, starting with simulated data. When I have a simple/intercept-only (i.e., no covariate) model, the non-centered parameterization runs fine. However, as soon as I introduce covariates I have thousands of divergences.

Here is my implementation of the non-centered parameterization of a hierarchical model:

// Index values, observations, and covariates.
data {
  int<lower = 1> N;               // Number of observations.
  int<lower = 1> K;               // Number of groups.
  int<lower = 1> I;               // Number of observation-level covariates.

  vector[N] y;                    // Vector of observations.
  int<lower = 1, upper = K> g[N]; // Vector of group assignments.
  matrix[N, I] X;                 // Matrix of observation-level covariates.
}

// Parameters and hyperparameters.
parameters {
  real mu;                        // Mean of the population model.
  real<lower = 0> tau;            // Variance of the population model.
  matrix[K, I] Delta;             // Matrix of non-centered observation-level coefficients.
  real<lower = 0> sigma;          // Variance of the likelihood.
}

// Deterministic transformation.
transformed parameters {
  // Matrix of centered observation-level coefficients.
  matrix[K, I] Beta;

  // Non-centered parameterization.
  for (k in 1:K) {
    Beta[k,] = mu + tau * Delta[k,];
  }
}

// Hierarchical regression.
model {
  // Hyperpriors.
  mu ~ normal(0, 5);
  tau ~ normal(0, 5);

  // Prior.
  sigma ~ normal(0, 5);

  // Non-centered population model and likelihood.
  for (k in 1:K) {
    Delta[k,] ~ normal(0, 1);
  }
  for (n in 1:N) {
    y[n] ~ normal(X[n,] * Beta[g[n],]', sigma);
  }
}

// Quantities conditioned on parameter draws.
generated quantities {
  // Log likelihood to estimate loo.
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = normal_lpdf(y[n] | X[n,] * Beta[g[n],]', sigma);
  }
} 

Here is how I’ve been generating data:

// Index values, covariates, and hyperparameter values.
data {
  int<lower = 1> N;               // Number of observations.
  int<lower = 1> K;               // Number of groups.
  int<lower = 1> I;               // Number of observation-level covariates.

  int<lower = 1, upper = K> g[N]; // Vector of group assignments.
  matrix[N, I] X;                 // Matrix of observation-level covariates.
  real mu;                        // Mean of the population model.
  real<lower = 0> tau;            // Variance of the population model.
  real<lower = 0> sigma;          // Variance of the likelihood.
}

// Generate data according to the hierarchical regression.
generated quantities {
  vector[N] y;                    // Vector of observations.
  matrix[K, I] Beta;              // Matrix of group-level coefficients.

  // Draw parameter values and generate data.
  for (k in 1:K) {
    for (i in 1:I) {
      Beta[k, i] = normal_rng(mu, tau);
    }
  }
  for (n in 1:N) {
    y[n] = normal_rng(X[n,] * Beta[g[n],]', sigma);
  }
}

where:

# Specify data and hyperparameter values.
sim_values <- list(
  N = 500,                            # Number of observations.
  K = 5,                              # Number of groups.
  I = 7,                              # Number of observation-level covariates.
  g = sample(5, 500, replace = TRUE), # Vector of group assignments.

  # Matrix of observation-level covariates.
  X = cbind(
    rep(1, 500),
    matrix(runif(500 * (7 - 1), min = 1, max = 10), nrow = 500)
  ),

  mu = 8,
  tau = 3,                            # Variance of the population model.
  sigma = 10                          # Variance of the likelihood.
)

Any help would be greatly appreciated. Thanks!

Hi Marc! Welcome to the Stan forums! :)

From quickly glancing over your model it seems that you’d need to change the prior on Beta to a multivariate normal, i.e. with a \mu and \tau for each covariate and a correlation structure. In a non-centered parameterization this could look like this:

data{...same...}
parameters {
  vector[I] mu;                        // Mean of the population model.
  vector<lower = 0>[I] tau;            // Variance of the population model.
  cholesky_factor_corr[I] L_Omega;     // Cholesky factor of correlation matrix
  vector[I] Delta[K];                  // array of vectors is easier to deal with here, I think
  real<lower = 0> sigma;               // Variance of the likelihood.
}
transformed parameters {
  // Matrix of centered observation-level coefficients.
  vector[I] Beta[K];

  // Non-centered parameterization.
  for (k in 1:K) {
    Beta[k] = mu + diag_pre_multiply(tau, L_Omega) * Delta[k]; // diag_pre_multiply is more efficient
  }

}
model {
  vector[N] mu; // pre-compute mu

  for (n in 1:N){
    mu[n] = X[n,] * Beta[g[n]]; // g[n] indexes the column vector Beta[g[n]]
  }

  // Hyperpriors.
  mu ~ normal(0, 5); // these are vectorized
  tau ~ normal(0, 5); // these are vectorized
  L_Omega ~ lkj_corr_cholesky(4); // add prior for L_Omega

  // Prior.
  sigma ~ normal(0, 5);

  // Non-centered population model and likelihood.
  for (k in 1:K){
    Delta[k] ~ std_normal();
  }
  y ~ normal(mu, sigma); // now vectorized, should be a bit faster
}

I didn’t run this and only typed it together here, so you definitely have to double check. Let me know if it works or not or if you have a question about what’s going on.

Cheers!
Max

2 Likes

Thanks, Max. I finally got back around to it. I had planned on including a multivariate upper-level model at some point, but I hadn’t thought about not having a multivariate upper-level model being related to the non-centered parameterization failing. That was the missing piece!

I’ve finished documenting my walkthrough, if you’re interested. Thanks again for your help!

1 Like

Cool! Very nice blog post!

1 Like