I have the two level data, where variables have some group independent correlations, and, additionally, group means are also correlated. Below you will find my R code to simulate such kind of the data:
When including Stan code in your post it really helps if you make it as readable as possible by using Stan code chunks (```stan) with clear spacing and indentation. For example, use
set.seed(666)
Omega1 <- rbind(c( 1, 0.3, 0.2), # correlations within groups
c(0.3, 1, 0.1),
c(0.2, 0.1, 1))
Omega2 <- rbind(c( 1, -0.3, 0.5), # correlations between groups
c(-0.3, 1, 0.2),
c( 0.5, 0.2, 1))
sigma1 <- c(0.8, 1, 1.3) # sigma within groups
sigma2 <- c(2, 3, 4) # sigma between groups
Sigma1 <- diag(sigma1) %*% Omega1 %*% diag(sigma1) # covariance within groups
Sigma2 <- diag(sigma2) %*% Omega2 %*% diag(sigma2) # covariance between groups
mu2 <-c(1,2,3) # average between groups
# Simulate means for each group
nGroups <- 6
mu1<-mvtnorm::rmvnorm(nGroups, mu2, Sigma2)
# Initialize arrays for simulated input
y <- c()
groupID <-c()
for (iGroup in 1:nGroups) {
t <- mvtnorm::rmvnorm(sample(4:6, 1), mu1[iGroup, ], Sigma1)
y <- rbind(y, t)
groupID <- c(groupID, rep(iGroup, nrow(t)))
}
# Scale y
y <- scale(y)
# Prepare data for STAN
mod.data <- list(
nRow = nrow(y),
nCol = ncol(y),
nGroups = nGroups,
y = y,
groupID = groupID
)
At the beginning I try to estimate only within-group correlation matrix. To do so I use the following STAN program:
data {
int<lower=0> nRow; // number of observations
int<lower=0> nCol; // dimension of observations
int<lower=0> nGroups; // number of groups
vector[nCol] y[nRow]; // Vector of values
int groupID[nRow]; // groupping index
}
parameters {
// Multivariate normal distribution parameters
vector[nCol] mu1[nGroups]; // means for each group
vector[nCol] mu2; // overall means
cholesky_factor_corr[nCol] L_Omega1; // correlations within groups
vector<lower=0>[nCol] sigma1; // sigma within groups
vector<lower=0>[nCol] sigma2; // sigma between groups
}
transformed parameters {
matrix[nCol,nCol] Sigma1; // covariance matrix
Sigma1 = diag_pre_multiply(sigma1,L_Omega1); // covariance within groups
}
model {
for (i in 1:nRow){
y[i] ~ multi_normal_cholesky(mu1[groupID[i]], Sigma1);
}
// Priors
for (i in 1:nGroups){
mu1[i] ~ normal(mu2,sigma2);
}
mu2 ~ normal(0, 1);
sigma1 ~ normal(0, 1);
sigma2 ~ normal(0, 1);
L_Omega1 ~ lkj_corr_cholesky(1);
}
Everything works fine till now. But when I try to implement multivariate normal for group mean I have divergent transitions. My STAN code for second model:
data {
int<lower=0> nRow; // number of observations
int<lower=0> nCol; // dimension of observations
int<lower=0> nGroups; // number of groups
vector[nCol] y[nRow]; // Vector of values
int groupID[nRow]; // groupping index
}
parameters {
// Multivariate normal distribution parameters
vector[nCol] mu1[nGroups]; // means for each group
vector[nCol] mu2; // overall means
cholesky_factor_corr[nCol] L_Omega1; // correlations within groups
cholesky_factor_corr[nCol] L_Omega2; // correlations between groups
vector<lower=0>[nCol] sigma1; // sigma within groups
vector<lower=0>[nCol] sigma2; // sigma between groups
}
transformed parameters {
matrix[nCol,nCol] Sigma1; // covariance matrix within groups
matrix[nCol,nCol] Sigma2; // covariance matrix between groups
Sigma1 = diag_pre_multiply(sigma1,L_Omega1); // covariance within groups
Sigma2 = diag_pre_multiply(sigma2,L_Omega2); // covariance between groups
}
model {
for (i in 1:nRow){
y[i] ~ multi_normal_cholesky(mu1[groupID[i]], Sigma1);
}
// Priors
for (i in 1:nGroups){
mu1[i] ~ multi_normal_cholesky(mu2, Sigma2);
}
mu2 ~ normal(0, 1);
sigma1 ~ normal(0, 1);
sigma2 ~ normal(0, 1);
L_Omega1 ~ lkj_corr_cholesky(1);
L_Omega2 ~ lkj_corr_cholesky(1);
}
As I understand their are two reasons: 1) I have too many variables and too few observations in each group, and 2) my parameterization is not well done.
I’ve Google for reparameterizations for multivariate normal distribution, but all examples, which I found, were for regressions…
Could you help me to change my STAN code?
I tried to create simulated data, but in case it will be important - the example of my real data is here.