# Two-level correlation matrix reparametrization

I have the two level data, where variables have some group independent correlations, and, additionally, group means are also correlated. Below you will find my R code to simulate such kind of the data:

When including Stan code in your post it really helps if you make it as readable as possible by using Stan code chunks (```stan) with clear spacing and indentation. For example, use

``````set.seed(666)
Omega1 <- rbind(c( 1, 0.3, 0.2), # correlations within groups
c(0.3,   1, 0.1),
c(0.2, 0.1, 1))
Omega2 <- rbind(c(   1, -0.3, 0.5), # correlations between groups
c(-0.3,    1, 0.2),
c( 0.5,  0.2, 1))
sigma1 <- c(0.8, 1, 1.3) # sigma within groups
sigma2 <- c(2, 3, 4) # sigma between groups

Sigma1 <- diag(sigma1) %*% Omega1 %*% diag(sigma1) # covariance within groups
Sigma2 <- diag(sigma2) %*% Omega2 %*% diag(sigma2) # covariance between groups

mu2 <-c(1,2,3) # average between groups
# Simulate means for each group
nGroups <- 6
mu1<-mvtnorm::rmvnorm(nGroups, mu2, Sigma2)

# Initialize arrays for simulated input
y <- c()
groupID <-c()
for (iGroup in 1:nGroups) {
t <- mvtnorm::rmvnorm(sample(4:6, 1), mu1[iGroup, ], Sigma1)
y <- rbind(y, t)
groupID <- c(groupID, rep(iGroup, nrow(t)))
}
# Scale y

y <- scale(y)
# Prepare data for STAN
mod.data <- list(
nRow = nrow(y),
nCol = ncol(y),
nGroups = nGroups,
y = y,
groupID = groupID
)
``````

At the beginning I try to estimate only within-group correlation matrix. To do so I use the following STAN program:

``````data {
int<lower=0> nRow;     // number of observations
int<lower=0> nCol;     // dimension of observations
int<lower=0> nGroups; // number of groups

vector[nCol] y[nRow];  // Vector of values
int groupID[nRow];     // groupping index
}
parameters {
// Multivariate normal distribution parameters
vector[nCol] mu1[nGroups];  // means for each group
vector[nCol] mu2;           // overall means

cholesky_factor_corr[nCol] L_Omega1; // correlations within groups
vector<lower=0>[nCol] sigma1; // sigma within groups
vector<lower=0>[nCol] sigma2; // sigma between groups
}
transformed parameters {
matrix[nCol,nCol] Sigma1; // covariance matrix
Sigma1 = diag_pre_multiply(sigma1,L_Omega1); // covariance within groups
}
model {
for (i in 1:nRow){
y[i] ~ multi_normal_cholesky(mu1[groupID[i]], Sigma1);
}
// Priors
for (i in 1:nGroups){
mu1[i] ~ normal(mu2,sigma2);
}
mu2 ~ normal(0, 1);

sigma1 ~ normal(0, 1);
sigma2 ~ normal(0, 1);

L_Omega1 ~ lkj_corr_cholesky(1);
}
``````

Everything works fine till now. But when I try to implement multivariate normal for group mean I have divergent transitions. My STAN code for second model:

``````data {
int<lower=0> nRow;     // number of observations
int<lower=0> nCol;     // dimension of observations
int<lower=0> nGroups; // number of groups

vector[nCol] y[nRow];  // Vector of values
int groupID[nRow];     // groupping index
}
parameters {
// Multivariate normal distribution parameters
vector[nCol] mu1[nGroups];  // means for each group
vector[nCol] mu2;           // overall means

cholesky_factor_corr[nCol] L_Omega1; // correlations within groups
cholesky_factor_corr[nCol] L_Omega2; // correlations between groups
vector<lower=0>[nCol] sigma1; // sigma within groups
vector<lower=0>[nCol] sigma2; // sigma between groups
}
transformed parameters {
matrix[nCol,nCol] Sigma1; // covariance matrix within groups
matrix[nCol,nCol] Sigma2; // covariance matrix between groups
Sigma1 = diag_pre_multiply(sigma1,L_Omega1); // covariance within groups
Sigma2 = diag_pre_multiply(sigma2,L_Omega2); // covariance between groups
}
model {
for (i in 1:nRow){
y[i] ~ multi_normal_cholesky(mu1[groupID[i]], Sigma1);
}
// Priors
for (i in 1:nGroups){
mu1[i] ~ multi_normal_cholesky(mu2, Sigma2);
}
mu2 ~ normal(0, 1);

sigma1 ~ normal(0, 1);
sigma2 ~ normal(0, 1);

L_Omega1 ~ lkj_corr_cholesky(1);
L_Omega2 ~ lkj_corr_cholesky(1);
}
``````

As I understand their are two reasons: 1) I have too many variables and too few observations in each group, and 2) my parameterization is not well done.

Iâ€™ve Google for reparameterizations for multivariate normal distribution, but all examples, which I found, were for regressionsâ€¦

Could you help me to change my STAN code?

I tried to create simulated data, but in case it will be important - the example of my real data is here.

Hierarchical models like these can run into divergences pretty easily, try using the multivariate non-centered parameterisation for `Sigma2`, following the guide here: 24.7 Reparameterization | Stan Userâ€™s Guide

2 Likes