Parameterization of covariance matrices

Hi all,

I am using rstan to fit a bivariate multilevel VAR model. Since my model has two random intercepts, two random autoregressive parameters, and two random cross-regression parameters, the six random effects follow a multivariate normal distribution, and the random-effect covariance matrix is 6 x 6.
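Concretely, the person-specific VAR(1) implied by the code below is

$$
\begin{aligned}
\mu_{p,t,1} &= b_{p,1} + b_{p,3}\,(y_{p,t-1,1} - b_{p,1}) + b_{p,4}\,(y_{p,t-1,2} - b_{p,2}),\\
\mu_{p,t,2} &= b_{p,2} + b_{p,5}\,(y_{p,t-1,1} - b_{p,1}) + b_{p,6}\,(y_{p,t-1,2} - b_{p,2}),
\end{aligned}
$$

with $y_{p,t} \sim \mathcal{N}_2(\mu_{p,t}, \Sigma_p)$ for $t > 1$ and $b_p = (b_{p,1}, \ldots, b_{p,6})^\top \sim \mathcal{N}_6(\mathrm{bmu}_p, \mathrm{bcov})$.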

I understand that the preferred way of parameterizing a covariance matrix in Stan is via a Cholesky decomposition, since it is more efficient and numerically stable. With that parameterization I obtained good parameter recovery on simulated data. However, when I instead declared the covariance matrix as “cov_matrix[6] bcov”, passed it to “multi_normal(mu, bcov)”, and assigned an inverse Wishart prior to bcov, the random-effect variances were overestimated. I checked the mean, median, and mode of the posterior samples, and they were similar, so the posterior was not noticeably skewed. The model converged and the effective sample sizes were large (around 10k).

So I am curious why this is the case. I can provide more details and/or code if necessary.

Thanks in advance.


Can you share example code? @bgoodri might know what is going on.

Sure. The relevant code is as follows.

data {
  int<lower=1> P;         // number of persons
  int<lower=1> T;         // number of time points
  vector[2] Y[P,T];       // two dependent variables
  matrix[6,6] W;          // scale matrix of the IW prior
}

parameters {
  matrix[P,6] b;          // random parameters
  cov_matrix[6] bcov;     // random-effect covariance matrix
}

transformed parameters {
  matrix[P,6] bmu;              // means of random parameters (computation omitted; see note below)
  matrix[2,2] Sigma_VAR[P];     // person-specific process noise covariance matrices (computation omitted)
}

model {
  vector[2] mus[P,T];

  // IW prior on the random-effect covariance matrix
  bcov ~ inv_wishart(6, W);

  for (pp in 1:P) {
    b[pp,1:6] ~ multi_normal(bmu[pp,1:6], bcov);

    // 1st observation: centered at the person-specific intercepts
    Y[pp,1,1:2] ~ multi_normal(b[pp,1:2], Sigma_VAR[pp]);

    // likelihood for the remaining time points
    for (tt in 2:T) {
      mus[pp,tt,1] = b[pp,1] + b[pp,3] * (Y[pp,tt-1,1] - b[pp,1]) + b[pp,4] * (Y[pp,tt-1,2] - b[pp,2]);
      mus[pp,tt,2] = b[pp,2] + b[pp,5] * (Y[pp,tt-1,1] - b[pp,1]) + b[pp,6] * (Y[pp,tt-1,2] - b[pp,2]);
      Y[pp,tt] ~ multi_normal(mus[pp,tt], Sigma_VAR[pp]);
    } // close loop over time points
  } // close loop over persons
}
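For the recovery check, the random-effect standard deviations can be read off directly in generated quantities — a minimal sketch (the name b_sd is just illustrative):

generated quantities {
  // standard deviations implied by bcov (square root of its diagonal)
  vector[6] b_sd = sqrt(diagonal(bcov));
}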

If I use an LKJ prior instead, the code is as follows.

parameters {
  matrix[P,6] b;                 // random parameters
  vector<lower=0>[6] sigma;      // random-effect standard deviations
  cholesky_factor_corr[6] L;     // Cholesky factor of the correlation matrix
}

model {
  vector[2] mus[P,T];

  // LKJ prior on the correlation Cholesky factor, half-Cauchy on the scales
  L ~ lkj_corr_cholesky(6);
  sigma ~ cauchy(0, 10);

  for (pp in 1:P) {
    b[pp,1:6] ~ multi_normal_cholesky(bmu[pp,1:6], diag_pre_multiply(sigma, L));

    // 1st observation: centered at the person-specific intercepts
    Y[pp,1,1:2] ~ multi_normal(b[pp,1:2], Sigma_VAR[pp]);

    // likelihood for the remaining time points
    for (tt in 2:T) {
      mus[pp,tt,1] = b[pp,1] + b[pp,3] * (Y[pp,tt-1,1] - b[pp,1]) + b[pp,4] * (Y[pp,tt-1,2] - b[pp,2]);
      mus[pp,tt,2] = b[pp,2] + b[pp,5] * (Y[pp,tt-1,1] - b[pp,1]) + b[pp,6] * (Y[pp,tt-1,2] - b[pp,2]);
      Y[pp,tt] ~ multi_normal(mus[pp,tt], Sigma_VAR[pp]);
    } // close loop over time points
  } // close loop over persons
}
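To compare the two parameterizations on the same scale, the covariance matrix implied by sigma and L can be reconstructed in generated quantities — again a minimal sketch (bcov_implied is an illustrative name):

generated quantities {
  // covariance implied by the Cholesky parameterization:
  // diag(sigma) * L * L' * diag(sigma)
  matrix[6,6] bcov_implied
    = multiply_lower_tri_self_transpose(diag_pre_multiply(sigma, L));
}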

Since our model is quite complex, I did not include the code that computes the transformed parameters (e.g., bmu, Sigma_VAR), but I think the code above illustrates my question.

Thanks!