# Hierarchical prior (for partial pooling) on correlation matrices?

Oh wait; Is there a way to force positive definiteness?
Say I had a cholesky factor of a correlation matrix.
At [1,1], I simply put 1, 0s on the upper diagonal.

Hierarchically pool lower triangle. Constrain the diagonal to be sqrt(1 - sum(rowValues)^2).
Would that then produce a valid correlation matrix?

Not if sum(rowValues^2) > 1

1 Like

Right; I see (Just tried it; estimated fine, but had divergences - The parameters were recovered well though).

I can’t see exactly what you wrote above, because it says “Undefined control sequence \boldsymbol”

If you could remove the \boldsymbol, I’d be grateful, so I can better understand what you wrote.

I take it this is a sort of non-centered parameterization for cor matrices? Estimate a common matrix, estimate ‘difference’ matrices, and combine them to produce a group-specific matrix?

Ah I see; is \alpha a parameter, or is it fixed? I guess it’s basically just the scaling parameter, correct? As \alpha \rightarrow 1, the LKJ distribution approaches uniformity, and less pooling occurs. As \alpha \rightarrow 0, the LKJ prior is sharply peaked around identity, and full pooling basically occurs (Because all the weight is on \Lambda_0 ? Is that right?

And \Lambda_j is essentially a difference matrix? So it’s a bit like the non-centered random effect approach?

You may have to enable Javascript or something to see the \LaTeX, but yes it was just Lambda_j. You would declare real<lower=0,upper=1> alpha; in the parameters block and estimate it, perhaps with a Beta prior.

I’ll give it a shot. Thank you so much for your help. Stan discourse is the ultimate applied stats journal.

1 Like

Thank you @bgoodri
That worked strikingly well on a test case of mine.
It was also quite fast. With a corr_matrix \Lambda_0 and 4 corr_matrix \Lambda_j, it took a minute.
The code used:
BFI is just a list of 4 100x9 subsets from the psych::bfi dataset in R.

Are there any efficiency improvements available here? I assume cholesky correlation matrices do not have the convex combination property; I could still sample as cholesky, but I assume the act of transforming it to a correlation matrix, then perhaps back to a cholesky factor for multi_normal_cholesky would undo the efficiency benefits of using cholesky in the first place.

data {
int N;
row_vector bfi[4,N];
}

parameters {
real<lower=0,upper=1> alpha; //Pooling coefficient
corr_matrix R_mu;
corr_matrix R_diff;
}

transformed parameters {
corr_matrix R_group;
for(g in 1:4){
R_group[g] = (1 - alpha)*R_mu + alpha*R_diff[g];
}
}

model {
R_mu ~ lkj_corr(3);
alpha ~ beta(2,2);
for(g in 1:4){
R_diff[g] ~ lkj_corr(1/alpha);
bfi[g] ~ multi_normal(rep_vector(0.0,9),R_group[g]);
}

}


The convex combination thing has to be done on correlation matrices, but you can get it slightly faster and more numerically stable by declaring things as cholesky_factor_corr and forming the correlation matrices yourself with multiply_lower_tri_self_transpose. That doesn’t help with the multivariate normal part, but it does allow you to do ~ lkj_corr_cholesky(shape) rather than ~lkj_corr(shape) to avoid having to do a Cholesky decomposition in order to evaluate the determinant in the LKJ PDF (when shape is not 1).

One more question:

Again, 2 samples have a 6x6 cor matrix; one sample has 9x9 matrix, with the 6x6 embedded. I want the embedded 6x6 to be pooled along with the other 6x6 matrices toward an overall \Lambda_0. Does the convex combination property hold for subsets of correlation matrices as well?

E.g.,
Should there be a 9x9 \Lambda_0 (global matrix), and R_{j,1:6,1:6} = (1 - \alpha)\Lambda_{0,1:6,1:6} + \alpha \Lambda_j, where \Lambda_j is a 6x6 matrix? Then for sample 3, the last 3 columns/rows should just be whatever is in those cells for \Lambda_0?

Yes it holds for all submatrices of a correlation matrix.

I can’t get the approach to work. It throws positive definite errors for R_group_nine, which is the 9x9 correlation matrix for sample 3.

Essentially, I estimate a 9x9 global correlation matrix (cholesky), and 3 6x6 correlation matrices (cholesky; one for each sample).
I combine them using the function you mentioned, to produce R_group correlation matrices - The 6x6 pooled correlation matrix for each sample.
For samples 1 and 2, these are the correlations used in multi_normal().
For sample 3, I embed those values into an overall 9x9 matrix; the remaining 3 rows/columns are filled in with respective values from the 9x9 global matrix.
The positive definite matrix is failing on that 9x9 correlation matrix for sample 3.

Below is the test code I am using.

functions {

matrix convex_combine_cholesky(matrix global_chol_cor, matrix local_chol_cor, real alpha){
int dim = rows(global_chol_cor);
matrix[dim,dim] global_cor = multiply_lower_tri_self_transpose(global_chol_cor);
matrix[dim,dim] local_cor = multiply_lower_tri_self_transpose(local_chol_cor);
matrix[dim,dim] combined_chol_cor = cholesky_decompose((1 - alpha)*global_cor + (alpha)*local_cor);
return(combined_chol_cor);
}

}

data {
int N;
int G;
row_vector bfi[G-1,N];
row_vector bfi_nine[N];
}

parameters {
real<lower=0,upper=1> alpha; //Pooling coefficient; hi alpha, independence; lo alpha, total dependence
cholesky_factor_corr R_global; //Overall corr matrix
cholesky_factor_corr R_diff[G]; // "Difference" matrices of sorts
}

transformed parameters {
corr_matrix R_group[G]; // Construct group-specific correlation matrices
corr_matrix R_group_nine; // The 9x9 matrix in which sample 3's 6x6 is embedded
corr_matrix R_global_cor = multiply_lower_tri_self_transpose(R_global);
for(g in 1:G){
R_group[g] = multiply_lower_tri_self_transpose(convex_combine_cholesky(R_global[1:6,1:6],R_diff[g],alpha));
}
// Fill in sample 3's 9x9 matrix with the 6x6 values + the remaining rows/columns from R_global_cor
for(c in 1:6){
for(r in c:6){
R_group_nine[r,c] = R_group[3,r,c];
R_group_nine[c,r] = R_group_nine[r,c];
}
}
for(r in 7:9){
for(c in 1:r){
R_group_nine[r,c] = R_global_cor[r,c];
R_group_nine[c,r] = R_group_nine[r,c];
}
}

}

model {
R_global ~ lkj_corr(3);
alpha ~ beta(2,2); //Can be tweaked
for(g in 1:2){
R_diff[g] ~ lkj_corr_cholesky(1/alpha); //1/alpha keeps differences large when alpha high; small when alpha low
bfi[g] ~ multi_normal(rep_vector(0.0,6),R_group[g]);
}
R_diff ~ lkj_corr_cholesky(1/alpha);
bfi_nine ~ multi_normal(rep_vector(0.0,9),R_group_nine);
}

Does this work?

data {
int N;
row_vector bfi[4,N];
}

parameters {
real<lower=0,upper=1> alpha; //Pooling coefficient
cholesky_corr_matrix R_mu_L;
cholesky_corr_matrix R_diff_L;
}

transformed parameters {
corr_matrix R_mu = multiply_lower_tri_self_transpose(R_mu_L);
corr_matrix R_group;
for (i in 1:9) R_mu[i,i] = 1; // numerical correction
for(g in 1:4){
R_group[g] = (1 - alpha)*R_mu
+ alpha*multiply_lower_tri_self_transpose(R_diff_L[g]);
}
}

model {
R_mu_L ~ lkj_corr_cholesky(3);
alpha ~ beta(2,2);
for(g in 1:4){
R_diff_L[g] ~ lkj_corr_cholesky(1/alpha);
bfi[g] ~ multi_normal(rep_vector(0.0,9),R_group[g]);
}
}


I just tried something similar, but more specific to my case (where there are unequal numbers of correlations across groups/samples).

I estimated R_diff_L (three samples/groups) as a 9x9, even though 2 of the three groups will only use an inner 6x6. That worked.

So the issue comes when you want to augment a matrix.

You CAN:

1. Estimate 9x9 R_mu
2. Estimate three 9x9 R_diff matrices
3. Convex combine each of three R_diff with 9x9 R_mu
4. For 2 of the 3 samples, use the inner 6x6 matrix in multi_normal

You CANNOT:

1. Estimate 9x9 R_mu
2. Estimate 3 6x6 R_diff matrices
3. Convex combine each of the three R_diff with the inner 6x6 R_mu matrix
4. Use that 6x6 for the first 2 samples
5. Augment the 3rd sample’s 6x6 matrix to a 9x9 by using respective R_mu values; e.g., R_group_nine[7,8] = R_mu[7,8]

The latter produces non-positive definiteness.

I’ll check to see whether the results make sense. Is it a problem that the correlations in rows/columns 7-9 are estimated from R_mu and the three R_group matrices, even though only group 3 makes use of those correlations?

So, this revision does not work?

data {
int N;
row_vector bfi[4,N];
}

parameters {
real<lower=0,upper=1> alpha; //Pooling coefficient
cholesky_corr_matrix R_mu_L;
cholesky_corr_matrix R_diff_L;
cholesky_corr_matrix R_diff_L2;
}

transformed parameters {
corr_matrix R_mu = multiply_lower_tri_self_transpose(R_mu_L);
corr_matrix R_group;
corr_matrix R_group2 = (1 - alpha) * R_mu + alpha *
multiply_lower_tri_self_transpose(R_diff_L2);
for(g in 1:3){
R_group[g] = (1 - alpha)*R_mu[1:6,1:6]
+ alpha*multiply_lower_tri_self_transpose(R_diff_L[g]);
}

}


That worked.

Ok, so adding to the can list; for future self:

CAN +=

1. Estimate 9x9 R_mu
2. Estimate 1 9x9 R_diff matrices; and 2 6x6 R_diff matrices
3. Convex combine 9x9 with R_mu
4. Convex combine 6x6 matrices with R_mu[1:6,1:6]

If I did estimate all 9x9 R_diff matrices, even though for two, only the inner 6x6 is used, does that affect the estimates?
I know it’s inefficient, because it’s needless sampling.
However, it would be /cleaner/, for post-processing reasons.

With proper priors, it is fine.

1 Like

Thank you for all your input - Seriously, a life saver.

Also for future me (or anyone who wants to do this simply), I wrote two functions that helps compute the per-group correlation matrices (or choleskies).
It takes a global matrix, a local diff matrix, and alpha.
If the local matrix is smaller than the global matrix, then it uses the inner-most subset from the global matrix.
E.g., if you have a 9x9 global matrix, but a 6x6 local matrix, then it will use global[1:6,1:6]; this means the function requires that the local matrix is literally a top-right-corner subset of the global matrix.

	matrix convex_combine_cholesky(matrix global_chol_cor, matrix local_chol_cor, real alpha){
int dim = rows(local_chol_cor);
int global_dim = rows(global_chol_cor);
matrix[global_dim,global_dim] global_cor = multiply_lower_tri_self_transpose(global_chol_cor);
matrix[dim,dim] local_cor = multiply_lower_tri_self_transpose(local_chol_cor);
matrix[dim,dim] global_cor_use;
matrix[dim,dim] combined_chol_cor;

if(dim < rows(global_cor)){
global_cor_use = global_cor[1:dim,1:dim];
} else {
global_cor_use = global_cor;
}
combined_chol_cor = cholesky_decompose((1 - alpha)*global_cor_use + (alpha)*local_cor);
return(combined_chol_cor);
}

matrix convex_combine(matrix global_cor, matrix local_cor, real alpha){
int dim = rows(local_cor);
matrix[dim,dim] global_cor_use;
matrix[dim,dim] combined_cor;
if(dim < rows(global_cor)){
global_cor_use = global_cor[1:dim,1:dim];
} else {
global_cor_use = global_cor;
}
combined_cor = (1 - alpha)*global_cor_use + alpha*local_cor;
return(combined_cor);
}


To use it, you need to declare a global matrix, which should be the same dimension as the largest group matrix; group-specific ‘diff’ matrices; and alpha.
It looks like this:

parameters {
cholesky_factor_corr R_global_L;
cholesky_factor_corr R_diff_L;
real<lower=0,upper=1> alpha;
}
transformed parameters {
cholesky_factor_corr R_group_L;
for(g in 1:4){
R_group_L[g] = convex_combine_cholesky(R_global_L,R_diff_L[g],alpha);
}
}
model {
alpha ~ beta(2,2)
R_global_L ~ lkj_corr_cholesky(3);
for(g in 1:4){
R_diff_L[g] ~ lkj_corr_cholesky(1/alpha);
data[g] ~ multi_normal_cholesky(muVector, R_group_L[g]);
}

}
4 Likes

The problem is that you were using \BoldSymbol or something like that. I switched to \mathbf so it’d render.

Hi there,

I think this pooling coefficient thing is very interesting. I’m wondering if I can adapt it when parameterizing hierarchical prior on the covariance matrix. See the following Stan code:

data {
int<lower=1> J; // number of subjects
int<lower=1> K; // number of classes
}

parameters {
real<lower=0, upper=1> alpha; // pooling coefficient
vector<lower=0> sigma_u_mu;  // class mean random effects
vector<lower=0> sigma_u_diff[K]; // class deviations from mean random effect means
cholesky_factor_corr L_Omega_mu; // Mean Choleskey decomposition of correlation matrix
cholesky_factor_corr L_Omega_diff[K]; // class deviations from mean Choleskey decomposition of correlation matrix
corr_matrix Omega_mu; // class mean correlation matrix
corr_matrix Omega_diff[K]; // class deviations from mean correlation matrix
matrix [2, J] z_u[K]; // uncorrelated (standardized) random effects
matrix [2, 2] L_mu; // class mean decomposition of covariance matrix
matrix [2, 2] L_diff[K]; // class deviations from mean decomposition of covariance matrix
}

transformed parameters {
matrix[2, J] u[K]; // correlated (true) random effects
for (k in 1:K) {
u[k] = ((1-alpha) * diag_pre_multiply(sigma_u_mu,  L_Omega_mu)
+ alpha * diag_pre_multiply(sigma_u_diff[k],  L_Omega_diff[k])) * z_u[k]; // can I do this?
}
}


Specifically, can I adapt the idea of pooling coefficient to the Cholesky decomposition of the covariance matrix, and then use non-centered parameterization to get valid random effects.

Thank you!!!