# Non-centered parameterization of a hierarchical linear model

Hi, I’ve been trying to implement a non-centered parameterization of a hierarchical linear model, starting with simulated data. When I have a simple/intercept-only (i.e., no covariate) model, the non-centered parameterization runs fine. However, as soon as I introduce covariates I have thousands of divergences.

Here is my implementation of the non-centered parameterization of a hierarchical model:

// Index values, observations, and covariates.
data {
int<lower = 1> N;               // Number of observations.
int<lower = 1> K;               // Number of groups.
int<lower = 1> I;               // Number of observation-level covariates.

vector[N] y;                    // Vector of observations.
int<lower = 1, upper = K> g[N]; // Vector of group assignments.
matrix[N, I] X;                 // Matrix of observation-level covariates.
}

// Parameters and hyperparameters.
parameters {
real mu;                        // Mean of the population model.
real<lower = 0> tau;            // Variance of the population model.
matrix[K, I] Delta;             // Matrix of non-centered observation-level coefficients.
real<lower = 0> sigma;          // Variance of the likelihood.
}

// Deterministic transformation.
transformed parameters {
// Matrix of centered observation-level coefficients.
matrix[K, I] Beta;

// Non-centered parameterization.
for (k in 1:K) {
Beta[k,] = mu + tau * Delta[k,];
}
}

// Hierarchical regression.
model {
// Hyperpriors.
mu ~ normal(0, 5);
tau ~ normal(0, 5);

// Prior.
sigma ~ normal(0, 5);

// Non-centered population model and likelihood.
for (k in 1:K) {
Delta[k,] ~ normal(0, 1);
}
for (n in 1:N) {
y[n] ~ normal(X[n,] * Beta[g[n],]', sigma);
}
}

// Quantities conditioned on parameter draws.
generated quantities {
// Log likelihood to estimate loo.
vector[N] log_lik;
for (n in 1:N) {
log_lik[n] = normal_lpdf(y[n] | X[n,] * Beta[g[n],]', sigma);
}
}


Here is how I’ve been generating data:

// Index values, covariates, and hyperparameter values.
data {
int<lower = 1> N;               // Number of observations.
int<lower = 1> K;               // Number of groups.
int<lower = 1> I;               // Number of observation-level covariates.

int<lower = 1, upper = K> g[N]; // Vector of group assignments.
matrix[N, I] X;                 // Matrix of observation-level covariates.
real mu;                        // Mean of the population model.
real<lower = 0> tau;            // Variance of the population model.
real<lower = 0> sigma;          // Variance of the likelihood.
}

// Generate data according to the hierarchical regression.
generated quantities {
vector[N] y;                    // Vector of observations.
matrix[K, I] Beta;              // Matrix of group-level coefficients.

// Draw parameter values and generate data.
for (k in 1:K) {
for (i in 1:I) {
Beta[k, i] = normal_rng(mu, tau);
}
}
for (n in 1:N) {
y[n] = normal_rng(X[n,] * Beta[g[n],]', sigma);
}
}


where:

# Specify data and hyperparameter values.
sim_values <- list(
N = 500,                            # Number of observations.
K = 5,                              # Number of groups.
I = 7,                              # Number of observation-level covariates.
g = sample(5, 500, replace = TRUE), # Vector of group assignments.

# Matrix of observation-level covariates.
X = cbind(
rep(1, 500),
matrix(runif(500 * (7 - 1), min = 1, max = 10), nrow = 500)
),

mu = 8,
tau = 3,                            # Variance of the population model.
sigma = 10                          # Variance of the likelihood.
)


Any help would be greatly appreciated. Thanks!

Hi Marc! Welcome to the Stan forums! :)

From quickly glancing over your model it seems that you’d need to change the prior on Beta to a multivariate normal, i.e. with a \mu and \tau for each covariate and a correlation structure. In a non-centered parameterization this could look like this:

data{...same...}
parameters {
vector[I] mu;                        // Mean of the population model.
vector<lower = 0>[I] tau;            // Variance of the population model.
cholesky_factor_corr[I] L_Omega;     // Cholesky factor of correlation matrix
vector[I] Delta[K];                  // array of vectors is easier to deal with here, I think
real<lower = 0> sigma;               // Variance of the likelihood.
}
transformed parameters {
// Matrix of centered observation-level coefficients.
vector[I] Beta[K];

// Non-centered parameterization.
for (k in 1:K) {
Beta[k] = mu + diag_pre_multiply(tau, L_Omega) * Delta[k]; // diag_pre_multiply is more efficient
}

}
model {
vector[N] mu; // pre-compute mu

for (n in 1:N){
mu[n] = X[n,] * Beta[g[n]]; // g[n] indexes the column vector Beta[g[n]]
}

// Hyperpriors.
mu ~ normal(0, 5); // these are vectorized
tau ~ normal(0, 5); // these are vectorized
L_Omega ~ lkj_corr_cholesky(4); // add prior for L_Omega

// Prior.
sigma ~ normal(0, 5);

// Non-centered population model and likelihood.
for (k in 1:K){
Delta[k] ~ std_normal();
}
y ~ normal(mu, sigma); // now vectorized, should be a bit faster
}


I didn’t run this and only typed it together here, so you definitely have to double check. Let me know if it works or not or if you have a question about what’s going on.

Cheers!
Max

2 Likes

Thanks, Max. I finally got back around to it. I had planned on including a multivariate upper-level model at some point, but I hadn’t thought about not having a multivariate upper-level model being related to the non-centered parameterization failing. That was the missing piece!

I’ve finished documenting my walkthrough, if you’re interested. Thanks again for your help!

1 Like

Cool! Very nice blog post!

1 Like