# Hierarchical prior (for partial pooling) on correlation matrices?

Does this work?

```stan
data {
  int N;
  row_vector[9] bfi[4, N];
}

parameters {
  real<lower=0, upper=1> alpha; // pooling coefficient
  cholesky_factor_corr[9] R_mu_L;
  cholesky_factor_corr[9] R_diff_L[4];
}

transformed parameters {
  corr_matrix[9] R_mu = multiply_lower_tri_self_transpose(R_mu_L);
  corr_matrix[9] R_group[4];
  for (i in 1:9) R_mu[i, i] = 1; // numerical correction
  for (g in 1:4) {
    R_group[g] = (1 - alpha) * R_mu
               + alpha * multiply_lower_tri_self_transpose(R_diff_L[g]);
  }
}

model {
  R_mu_L ~ lkj_corr_cholesky(3);
  alpha ~ beta(2, 2);
  for (g in 1:4) {
    R_diff_L[g] ~ lkj_corr_cholesky(1 / alpha);
    bfi[g] ~ multi_normal(rep_vector(0.0, 9), R_group[g]);
  }
}
```
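A quick way to convince yourself the convex combination is valid: the set of correlation matrices is convex, so any weighted average of two of them keeps a unit diagonal and stays positive definite. A minimal NumPy check (toy 3x3 values, not from the model):

```python
import numpy as np

# Hypothetical 3x3 stand-ins for R_mu and one group's R_diff
# (any two valid correlation matrices work here).
R_mu = np.array([[1.0, 0.3, 0.1],
                 [0.3, 1.0, 0.2],
                 [0.1, 0.2, 1.0]])
R_diff = np.array([[1.0, -0.4, 0.0],
                   [-0.4, 1.0, 0.5],
                   [0.0, 0.5, 1.0]])
alpha = 0.3

# Convex combination: the unit diagonal is preserved automatically, and the
# result stays positive definite because the PD cone is convex.
R_group = (1 - alpha) * R_mu + alpha * R_diff

assert np.allclose(np.diag(R_group), 1.0)
assert np.all(np.linalg.eigvalsh(R_group) > 0)  # positive definite
```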


I just tried something similar, but more specific to my case (where there are unequal numbers of correlations across groups/samples).

I estimated R_diff_L (three samples/groups) as a 9x9, even though two of the three groups will only use an inner 6x6. That worked.

So the issue comes when you want to augment a matrix.

You CAN:

1. Estimate 9x9 R_mu
2. Estimate three 9x9 R_diff matrices
3. Convex combine each of three R_diff with 9x9 R_mu
4. For 2 of the 3 samples, use the inner 6x6 matrix in multi_normal

You CANNOT:

1. Estimate 9x9 R_mu
2. Estimate 3 6x6 R_diff matrices
3. Convex combine each of the three R_diff with the inner 6x6 R_mu matrix
4. Use that 6x6 for the first 2 samples
5. Augment the 3rd sample’s 6x6 matrix to a 9x9 by using respective R_mu values; e.g., R_group_nine[7,8] = R_mu[7,8]

The latter produces a matrix that is not positive definite.
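The failure mode is easy to reproduce outside Stan. In this toy 3x3/2x2 analogue of the 9x9/6x6 situation (illustrative numbers only), both inputs are valid correlation matrices, but convex-combining only the inner block and then copying the border back from R_mu yields a matrix that fails a Cholesky decomposition:

```python
import numpy as np

def is_pd(m):
    """True if m is symmetric positive definite (i.e., Cholesky succeeds)."""
    try:
        np.linalg.cholesky(m)
        return True
    except np.linalg.LinAlgError:
        return False

# Toy stand-ins: 3x3 "global" R_mu, 2x2 "local" R_diff.
R_mu = np.full((3, 3), 0.9)
np.fill_diagonal(R_mu, 1.0)
R_diff = np.array([[1.0, -0.9],
                   [-0.9, 1.0]])
alpha = 0.9

# Convex-combine only the inner 2x2 block...
inner = (1 - alpha) * R_mu[:2, :2] + alpha * R_diff

# ...then "augment" back to 3x3 by copying the border from R_mu.
aug = R_mu.copy()
aug[:2, :2] = inner

print(is_pd(R_mu), is_pd(inner), is_pd(aug))  # True True False
```

Both inputs are fine on their own; the inconsistency between the strongly positive border correlations and the strongly negative inner correlation is what breaks positive definiteness.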

I’ll check to see whether the results make sense. Is it a problem that the correlations in rows/columns 7-9 are estimated from R_mu and the three R_group matrices, even though only group 3 makes use of those correlations?

So, this revision does not work?

```stan
data {
  int N;
  row_vector[9] bfi[4, N];
}

parameters {
  real<lower=0, upper=1> alpha; // pooling coefficient
  cholesky_factor_corr[9] R_mu_L;
  cholesky_factor_corr[6] R_diff_L[3];
  cholesky_factor_corr[9] R_diff_L2;
}

transformed parameters {
  corr_matrix[9] R_mu = multiply_lower_tri_self_transpose(R_mu_L);
  corr_matrix[6] R_group[3];
  corr_matrix[9] R_group2 = (1 - alpha) * R_mu
                          + alpha * multiply_lower_tri_self_transpose(R_diff_L2);
  for (g in 1:3) {
    R_group[g] = (1 - alpha) * R_mu[1:6, 1:6]
               + alpha * multiply_lower_tri_self_transpose(R_diff_L[g]);
  }
}
```


That worked.

OK, so adding to the CAN list, for future self:

CAN +=

1. Estimate a 9x9 R_mu
2. Estimate one 9x9 R_diff matrix and two 6x6 R_diff matrices
3. Convex-combine the 9x9 R_diff with R_mu
4. Convex-combine the 6x6 R_diff matrices with R_mu[1:6,1:6]

If I did estimate all 9x9 R_diff matrices, even though for two, only the inner 6x6 is used, does that affect the estimates?
I know it’s inefficient, because it’s needless sampling.
However, it would be /cleaner/, for post-processing reasons.

With proper priors, it is fine.


Thank you for all your input - Seriously, a life saver.

Also for future me (or anyone who wants to do this simply), I wrote two functions that help compute the per-group correlation matrices (or their Cholesky factors).
Each takes a global matrix, a local diff matrix, and alpha.
If the local matrix is smaller than the global matrix, it uses the innermost subset of the global matrix.
E.g., if you have a 9x9 global matrix but a 6x6 local matrix, it will use global[1:6,1:6]; this means the function requires that the local matrix correspond to the upper-left block of the global matrix.

```stan
matrix convex_combine_cholesky(matrix global_chol_cor, matrix local_chol_cor, real alpha) {
  int dim = rows(local_chol_cor);
  int global_dim = rows(global_chol_cor);
  matrix[global_dim, global_dim] global_cor = multiply_lower_tri_self_transpose(global_chol_cor);
  matrix[dim, dim] local_cor = multiply_lower_tri_self_transpose(local_chol_cor);
  matrix[dim, dim] global_cor_use;
  matrix[dim, dim] combined_chol_cor;

  if (dim < global_dim) {
    global_cor_use = global_cor[1:dim, 1:dim];
  } else {
    global_cor_use = global_cor;
  }
  combined_chol_cor = cholesky_decompose((1 - alpha) * global_cor_use + alpha * local_cor);
  return combined_chol_cor;
}

matrix convex_combine(matrix global_cor, matrix local_cor, real alpha) {
  int dim = rows(local_cor);
  matrix[dim, dim] global_cor_use;
  matrix[dim, dim] combined_cor;

  if (dim < rows(global_cor)) {
    global_cor_use = global_cor[1:dim, 1:dim];
  } else {
    global_cor_use = global_cor;
  }
  combined_cor = (1 - alpha) * global_cor_use + alpha * local_cor;
  return combined_cor;
}
```
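For checking results in post-processing, here is a NumPy port of the first function. It follows the same upper-left-block convention as the Stan version; the small matrices below are just stand-ins for the 9x9/6x6 case:

```python
import numpy as np

def convex_combine_cholesky(global_chol, local_chol, alpha):
    """NumPy port of the Stan convex_combine_cholesky function.

    Assumes the local matrix corresponds to the upper-left block of the
    global one, exactly as in the Stan version."""
    dim = local_chol.shape[0]
    global_cor = global_chol @ global_chol.T
    local_cor = local_chol @ local_chol.T
    combined = (1 - alpha) * global_cor[:dim, :dim] + alpha * local_cor
    return np.linalg.cholesky(combined)

# Example: 3x3 global, 2x2 local.
L_global = np.linalg.cholesky(np.array([[1.0, 0.5, 0.2],
                                        [0.5, 1.0, 0.3],
                                        [0.2, 0.3, 1.0]]))
L_local = np.linalg.cholesky(np.array([[1.0, -0.4],
                                       [-0.4, 1.0]]))
L_group = convex_combine_cholesky(L_global, L_local, 0.5)
R_group = L_group @ L_group.T  # valid 2x2 correlation matrix
```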


To use it, you need to declare a global matrix, which should be the same dimension as the largest group matrix; group-specific ‘diff’ matrices; and alpha.
It looks like this:

```stan
parameters {
  cholesky_factor_corr[9] R_global_L;
  cholesky_factor_corr[9] R_diff_L[4];
  real<lower=0, upper=1> alpha;
}
transformed parameters {
  matrix[9, 9] R_group_L[4];
  for (g in 1:4) {
    R_group_L[g] = convex_combine_cholesky(R_global_L, R_diff_L[g], alpha);
  }
}
model {
  alpha ~ beta(2, 2);
  R_global_L ~ lkj_corr_cholesky(3);
  for (g in 1:4) {
    R_diff_L[g] ~ lkj_corr_cholesky(1 / alpha);
    // y and muVector declared in the data block; note that `data` itself
    // is a reserved word in Stan and cannot be used as a variable name
    y[g] ~ multi_normal_cholesky(muVector, R_group_L[g]);
  }
}
```

The problem is that you were using \BoldSymbol or something like that. I switched to \mathbf so it’d render.

Hi there,

I think this pooling coefficient idea is very interesting. I'm wondering if I can adapt it when parameterizing a hierarchical prior on the covariance matrix. See the following Stan code:

```stan
data {
  int<lower=1> J; // number of subjects
  int<lower=1> K; // number of classes
}

parameters {
  real<lower=0, upper=1> alpha; // pooling coefficient
  vector<lower=0>[2] sigma_u_mu; // class-mean random-effect SDs
  vector<lower=0>[2] sigma_u_diff[K]; // class deviations from the mean SDs
  cholesky_factor_corr[2] L_Omega_mu; // mean Cholesky factor of the correlation matrix
  cholesky_factor_corr[2] L_Omega_diff[K]; // class deviations from the mean Cholesky factor
  matrix[2, J] z_u[K]; // uncorrelated (standardized) random effects
}

transformed parameters {
  matrix[2, J] u[K]; // correlated (true) random effects
  for (k in 1:K) {
    u[k] = ((1 - alpha) * diag_pre_multiply(sigma_u_mu, L_Omega_mu)
           + alpha * diag_pre_multiply(sigma_u_diff[k], L_Omega_diff[k])) * z_u[k]; // can I do this?
  }
}
```
}


Specifically, can I adapt the idea of the pooling coefficient to the Cholesky decomposition of the covariance matrix, and then use a non-centered parameterization to get valid random effects?
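One caveat worth checking numerically: a convex combination of two Cholesky factors is still lower triangular with a positive diagonal, so the u[k] above do have a valid implied covariance, but that covariance is not the convex combination of the two covariances. A toy NumPy sketch (illustrative 2x2 values, not from the model):

```python
import numpy as np

# Two hypothetical 2x2 covariance matrices and their Cholesky factors.
L_mu = np.linalg.cholesky(np.array([[1.0, 0.5],
                                    [0.5, 2.0]]))
L_k = np.linalg.cholesky(np.array([[1.5, -0.3],
                                   [-0.3, 1.0]]))
alpha = 0.4

# Convex-combining the factors gives a valid Cholesky factor of *some*
# positive-definite covariance...
L_comb = (1 - alpha) * L_mu + alpha * L_k
implied_cov = L_comb @ L_comb.T  # covariance the model actually uses

# ...but not of the convex combination of the two covariances.
mixed_cov = (1 - alpha) * (L_mu @ L_mu.T) + alpha * (L_k @ L_k.T)

print(np.all(np.linalg.eigvalsh(implied_cov) > 0))  # True: still PD
print(np.allclose(implied_cov, mixed_cov))          # False: different model
```

So the construction is legal, but the two parameterizations define different priors, and it is worth deciding which one you actually intend.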

Thank you!!!