Hierarchical prior (for partial pooling) on correlation matrices?

Oh wait; is there a way to force positive definiteness?
Say I had a Cholesky factor of a correlation matrix.
At [1,1] I simply put 1, with 0s above the diagonal.

Hierarchically pool the lower triangle, and constrain each diagonal element to be sqrt(1 - sum(rowValues^2)).
Would that then produce a valid correlation matrix?
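
Something like this is what I have in mind (a rough 3x3 sketch with made-up names; note the sqrt() is only defined while each row's sum of squares stays below 1):

parameters {
	vector[3] offdiag; // unconstrained lower-triangle entries
}
transformed parameters {
	matrix[3,3] L = rep_matrix(0, 3, 3); // upper triangle stays 0
	L[1,1] = 1;
	L[2,1] = offdiag[1];
	L[3,1] = offdiag[2];
	L[3,2] = offdiag[3];
	L[2,2] = sqrt(1 - square(L[2,1]));
	L[3,3] = sqrt(1 - square(L[3,1]) - square(L[3,2]));
}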

Not if sum(rowValues^2) > 1


Right, I see. (I just tried it; it estimated fine but had divergences, although the parameters were recovered well.)

I can’t see exactly what you wrote above, because it says “Undefined control sequence \boldsymbol”

If you could remove the \boldsymbol, I’d be grateful, so I can better understand what you wrote.

I take it this is a sort of non-centered parameterization for cor matrices? Estimate a common matrix, estimate ‘difference’ matrices, and combine them to produce a group-specific matrix?

Ah, I see; is \alpha a parameter, or is it fixed? I guess it’s basically just a scaling parameter, correct? As \alpha \rightarrow 1, the LKJ prior approaches uniformity and less pooling occurs. As \alpha \rightarrow 0, the LKJ prior is sharply peaked around the identity and essentially full pooling occurs (because all the weight is on \Lambda_0). Is that right?

And \Lambda_j is essentially a difference matrix? So it’s a bit like the non-centered random effect approach?
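
(To make sure I follow, I take the combination to be something like R_j = (1 - \alpha)\Lambda_0 + \alpha\Lambda_j with 0 \le \alpha \le 1, where \Lambda_0 is the common matrix and \Lambda_j the group-specific ‘difference’ matrix.)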

You may have to enable JavaScript or something to see the \LaTeX, but yes, it was just \Lambda_j. You would declare real<lower=0,upper=1> alpha; in the parameters block and estimate it, perhaps with a Beta prior.

I’ll give it a shot. Thank you so much for your help. Stan discourse is the ultimate applied stats journal.


Thank you @bgoodri
That worked strikingly well on a test case of mine.
It was also quite fast: with a corr_matrix[9] \Lambda_0 and four corr_matrix[9] \Lambda_j, it took a minute.
The code I used is below; bfi is just a list of four 100x9 subsets of the psych::bfi dataset in R.

Are there any efficiency improvements available here? I assume Cholesky factors of correlation matrices do not have the convex-combination property; I could still sample on the Cholesky scale, but I assume that transforming to a correlation matrix, and then perhaps back to a Cholesky factor for multi_normal_cholesky, would undo the efficiency benefit of using Cholesky factors in the first place.

data {
	int N; // observations per group
	row_vector[9] bfi[4,N]; // 4 groups of N 9-dimensional observations
}

parameters {
	real<lower=0,upper=1> alpha; // Pooling coefficient
	corr_matrix[9] R_mu; // Common (pooled) correlation matrix
	corr_matrix[9] R_diff[4]; // Group-specific "difference" matrices
}

transformed parameters {
	corr_matrix[9] R_group[4]; // Group-specific (partially pooled) correlation matrices
	for(g in 1:4){
		R_group[g] = (1 - alpha)*R_mu + alpha*R_diff[g];
	}
}

model {
	R_mu ~ lkj_corr(3);
	alpha ~ beta(2,2);
	for(g in 1:4){
		R_diff[g] ~ lkj_corr(1/alpha);
		bfi[g] ~ multi_normal(rep_vector(0.0,9), R_group[g]);
	}
}

The convex combination does have to be done on the correlation matrices themselves, but you can make it slightly faster and more numerically stable by declaring things as cholesky_factor_corr and forming the correlation matrices yourself with multiply_lower_tri_self_transpose. That doesn’t help with the multivariate normal part, but it does let you write ~ lkj_corr_cholesky(shape) rather than ~ lkj_corr(shape), which avoids doing a Cholesky decomposition just to evaluate the determinant in the LKJ PDF (when shape is not 1).
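
In outline, the declaration pattern would be something like this (just a sketch, with assumed names):

parameters {
	cholesky_factor_corr[9] R_mu_L;
}
transformed parameters {
	corr_matrix[9] R_mu = multiply_lower_tri_self_transpose(R_mu_L);
}
model {
	R_mu_L ~ lkj_corr_cholesky(3);
}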

One more question:

Again, two samples have a 6x6 correlation matrix; one sample has a 9x9 matrix with the 6x6 embedded in it. I want the embedded 6x6 to be pooled, along with the other 6x6 matrices, toward an overall \Lambda_0. Does the convex-combination property hold for submatrices of correlation matrices as well?

E.g.,
Should there be a 9x9 \Lambda_0 (global matrix), with R_{j,1:6,1:6} = (1 - \alpha)\Lambda_{0,1:6,1:6} + \alpha\Lambda_j, where \Lambda_j is a 6x6 matrix? Then, for sample 3, should the last three rows/columns just be whatever is in those cells of \Lambda_0?

Yes, it holds for any principal submatrix of a correlation matrix.

I can’t get the approach to work. It throws positive-definiteness errors for R_group_nine, which is the 9x9 correlation matrix for sample 3.

Essentially, I estimate a 9x9 global correlation matrix (as a Cholesky factor) and three 6x6 correlation matrices (also Cholesky factors; one per sample).
I combine them using the function you mentioned, producing the R_group correlation matrices (the 6x6 pooled correlation matrix for each sample).
For samples 1 and 2, these are the correlations used in multi_normal().
For sample 3, I embed those values into an overall 9x9 matrix; the remaining three rows/columns are filled in with the corresponding values from the 9x9 global matrix.
The positive-definiteness check is failing on that 9x9 correlation matrix for sample 3.

Below is the test code I am using.

functions {
	
	// Convex combination of a global and a local correlation matrix, given their
	// Cholesky factors; returns the Cholesky factor of the combined matrix.
	matrix convex_combine_cholesky(matrix global_chol_cor, matrix local_chol_cor, real alpha){
		int dim = rows(global_chol_cor);
		matrix[dim,dim] global_cor = multiply_lower_tri_self_transpose(global_chol_cor);
		matrix[dim,dim] local_cor = multiply_lower_tri_self_transpose(local_chol_cor);
		matrix[dim,dim] combined_chol_cor = cholesky_decompose((1 - alpha)*global_cor + (alpha)*local_cor);
		return(combined_chol_cor);
	}
	
}

data {
	int N;
	int G;
	row_vector[6] bfi[G-1,N];
	row_vector[9] bfi_nine[N];
}

parameters {
	real<lower=0,upper=1> alpha; // Pooling coefficient: high alpha = little pooling, low alpha = complete pooling
	cholesky_factor_corr[9] R_global; //Overall corr matrix
	cholesky_factor_corr[6] R_diff[G]; // "Difference" matrices of sorts
}

transformed parameters {
	corr_matrix[6] R_group[G]; // Construct group-specific correlation matrices
	corr_matrix[9] R_group_nine; // The 9x9 matrix in which sample 3's 6x6 is embedded
	corr_matrix[9] R_global_cor = multiply_lower_tri_self_transpose(R_global);
	for(g in 1:G){
		R_group[g] = multiply_lower_tri_self_transpose(convex_combine_cholesky(R_global[1:6,1:6],R_diff[g],alpha));
	}
	// Fill in sample 3's 9x9 matrix with the 6x6 values + the remaining rows/columns from R_global_cor
	for(c in 1:6){
		for(r in c:6){
			R_group_nine[r,c] = R_group[3,r,c];
			R_group_nine[c,r] = R_group_nine[r,c];
		}
	}
	for(r in 7:9){
		for(c in 1:r){
			R_group_nine[r,c] = R_global_cor[r,c];
			R_group_nine[c,r] = R_group_nine[r,c];
		}
	}
	
}

model {
	R_global ~ lkj_corr_cholesky(3); // R_global is a Cholesky factor, so use the _cholesky version
	alpha ~ beta(2,2); //Can be tweaked
	for(g in 1:2){
		R_diff[g] ~ lkj_corr_cholesky(1/alpha); //1/alpha keeps differences large when alpha high; small when alpha low
		bfi[g] ~ multi_normal(rep_vector(0.0,6),R_group[g]);
	}
	R_diff[3] ~ lkj_corr_cholesky(1/alpha);
	bfi_nine ~ multi_normal(rep_vector(0.0,9),R_group_nine);
}

Does this work?

data {
	int N;
	row_vector[9] bfi[4,N];
}

parameters {
	real<lower=0,upper=1> alpha; //Pooling coefficient
	cholesky_factor_corr[9] R_mu_L;
	cholesky_factor_corr[9] R_diff_L[4];
}

transformed parameters {
	corr_matrix[9] R_mu = multiply_lower_tri_self_transpose(R_mu_L);
	corr_matrix[9] R_group[4];
	for (i in 1:9) R_mu[i,i] = 1; // numerical correction
	for(g in 1:4){
		R_group[g] = (1 - alpha)*R_mu
		             + alpha*multiply_lower_tri_self_transpose(R_diff_L[g]);
	}
}

model {
	R_mu_L ~ lkj_corr_cholesky(3);
	alpha ~ beta(2,2);
	for(g in 1:4){
		R_diff_L[g] ~ lkj_corr_cholesky(1/alpha);
		bfi[g] ~ multi_normal(rep_vector(0.0,9),R_group[g]);
	}
}

I just tried something similar, but more specific to my case (where there are unequal numbers of correlations across groups/samples).

I estimated R_diff_L as three 9x9 matrices (one per sample/group), even though two of the three groups only use an inner 6x6. That worked.

So the issue comes when you want to augment a matrix.

You CAN:

  1. Estimate 9x9 R_mu
  2. Estimate three 9x9 R_diff matrices
  3. Convex-combine each of the three R_diff with the 9x9 R_mu
  4. For two of the three samples, use only the inner 6x6 of the result in multi_normal

You CANNOT:

  1. Estimate 9x9 R_mu
  2. Estimate three 6x6 R_diff matrices
  3. Convex-combine each of the three R_diff with the inner 6x6 of R_mu
  4. Use that 6x6 for the first two samples
  5. Augment the third sample’s 6x6 matrix into a 9x9 by filling in the corresponding R_mu values; e.g., R_group_nine[7,8] = R_mu[7,8]

The latter produces non-positive-definite matrices, presumably because the patched 9x9 is no longer a convex combination of two full correlation matrices, so positive definiteness is not guaranteed.
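
For future me, a sketch of the "can" version that worked (all three R_diff estimated as 9x9, with only the inner 6x6 used for the first two samples; same data block as the test code above, other names assumed):

parameters {
	real<lower=0,upper=1> alpha;
	cholesky_factor_corr[9] R_mu_L;
	cholesky_factor_corr[9] R_diff_L[3];
}

transformed parameters {
	corr_matrix[9] R_mu = multiply_lower_tri_self_transpose(R_mu_L);
	corr_matrix[9] R_group[3];
	for (i in 1:9) R_mu[i,i] = 1; // numerical correction, as above
	for(g in 1:3){
		R_group[g] = (1 - alpha)*R_mu + alpha*multiply_lower_tri_self_transpose(R_diff_L[g]);
	}
}

model {
	R_mu_L ~ lkj_corr_cholesky(3);
	alpha ~ beta(2,2);
	for(g in 1:3){
		R_diff_L[g] ~ lkj_corr_cholesky(1/alpha);
	}
	for(g in 1:2){
		bfi[g] ~ multi_normal(rep_vector(0.0,6), R_group[g][1:6,1:6]); // only the inner 6x6 is used
	}
	bfi_nine ~ multi_normal(rep_vector(0.0,9), R_group[3]);
}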

I’ll check to see whether the results make sense. Is it a problem that the correlations in rows/columns 7-9 are estimated from R_mu and the three R_group matrices, even though only group 3 makes use of those correlations?

So, this revision does not work?

data {
	int N;
	row_vector[9] bfi[4,N];
}

parameters {
	real<lower=0,upper=1> alpha; //Pooling coefficient
	cholesky_factor_corr[9] R_mu_L;
	cholesky_factor_corr[6] R_diff_L[3];
	cholesky_factor_corr[9] R_diff_L2;
}

transformed parameters {
	corr_matrix[9] R_mu = multiply_lower_tri_self_transpose(R_mu_L);
	corr_matrix[6] R_group[3];
	corr_matrix[9] R_group2 = (1 - alpha)*R_mu
	                          + alpha*multiply_lower_tri_self_transpose(R_diff_L2);
	for(g in 1:3){
		R_group[g] = (1 - alpha)*R_mu[1:6,1:6]
		             + alpha*multiply_lower_tri_self_transpose(R_diff_L[g]);
	}
}

That worked.

OK, so adding to the "can" list, for future self:

CAN +=

  1. Estimate 9x9 R_mu
  2. Estimate one 9x9 R_diff matrix and two 6x6 R_diff matrices
  3. Convex-combine the 9x9 R_diff with R_mu
  4. Convex-combine the 6x6 R_diff matrices with R_mu[1:6,1:6]

If I estimated all the R_diff matrices as 9x9, even though only the inner 6x6 is used for two of them, would that affect the estimates?
I know it’s inefficient, because it’s needless sampling.
However, it would be /cleaner/ for post-processing reasons.

With proper priors, it is fine.


Thank you for all your input - seriously, a lifesaver.

Also for future me (or anyone who wants to do this simply), I wrote two functions that help compute the per-group correlation matrices (or their Cholesky factors).
Each takes a global matrix, a local 'diff' matrix, and alpha.
If the local matrix is smaller than the global matrix, it uses the leading (upper-left) submatrix of the global matrix.
E.g., if you have a 9x9 global matrix but a 6x6 local matrix, it will use global[1:6,1:6]; this means the function requires that the local matrix correspond to the upper-left corner of the global matrix.

	// Convex-combine a global and a local correlation matrix given their Cholesky
	// factors; returns the Cholesky factor of the combined correlation matrix.
	matrix convex_combine_cholesky(matrix global_chol_cor, matrix local_chol_cor, real alpha){
		int dim = rows(local_chol_cor);
		int global_dim = rows(global_chol_cor);
		matrix[global_dim,global_dim] global_cor = multiply_lower_tri_self_transpose(global_chol_cor);
		matrix[dim,dim] local_cor = multiply_lower_tri_self_transpose(local_chol_cor);
		matrix[dim,dim] global_cor_use;
		matrix[dim,dim] combined_chol_cor;
		
		if(dim < rows(global_cor)){
			global_cor_use = global_cor[1:dim,1:dim];
		} else {
			global_cor_use = global_cor;
		}
		combined_chol_cor = cholesky_decompose((1 - alpha)*global_cor_use + (alpha)*local_cor);
		return(combined_chol_cor);
	}
	
	// Same convex combination, but operating directly on correlation matrices.
	matrix convex_combine(matrix global_cor, matrix local_cor, real alpha){
		int dim = rows(local_cor);
		matrix[dim,dim] global_cor_use;
		matrix[dim,dim] combined_cor;
		if(dim < rows(global_cor)){
			global_cor_use = global_cor[1:dim,1:dim];
		} else {
			global_cor_use = global_cor;
		}
		combined_cor = (1 - alpha)*global_cor_use + alpha*local_cor;
		return(combined_cor);
	}

To use them, you need to declare a global matrix (the same dimension as the largest group matrix), group-specific ‘diff’ matrices, and alpha.
It looks like this:

parameters {
    cholesky_factor_corr[9] R_global_L;
    cholesky_factor_corr[9] R_diff_L[4];
    real<lower=0,upper=1> alpha;
}
transformed parameters {
    cholesky_factor_corr[9] R_group_L[4];
    for(g in 1:4){
        R_group_L[g] = convex_combine_cholesky(R_global_L,R_diff_L[g],alpha);
    }
}
model {
    alpha ~ beta(2,2);
    R_global_L ~ lkj_corr_cholesky(3);
    for(g in 1:4){
        R_diff_L[g] ~ lkj_corr_cholesky(1/alpha);
        y[g] ~ multi_normal_cholesky(muVector, R_group_L[g]); // y[g] and muVector stand in for your data and mean vector
    }
}

The problem is that you were using \boldsymbol or something like that. I switched it to \mathbf so it would render.

Hi there,

I think this pooling-coefficient idea is very interesting. I’m wondering if I can adapt it when parameterizing a hierarchical prior on the covariance matrix. See the following Stan code:

data {
  int<lower=1> J; // number of subjects
  int<lower=1> K; // number of classes
}

parameters {
  real<lower=0, upper=1> alpha; // pooling coefficient
  vector<lower=0>[2] sigma_u_mu;  // mean (pooled) random-effect SDs
  vector<lower=0>[2] sigma_u_diff[K]; // class-specific random-effect SDs
  cholesky_factor_corr[2] L_Omega_mu; // mean Cholesky factor of the correlation matrix
  cholesky_factor_corr[2] L_Omega_diff[K]; // class-specific Cholesky factors of the correlation matrix
  matrix[2, J] z_u[K]; // uncorrelated (standardized) random effects
}

transformed parameters {
  matrix[2, J] u[K]; // correlated (true) random effects
  for (k in 1:K) {
    u[k] = ((1-alpha) * diag_pre_multiply(sigma_u_mu,  L_Omega_mu) 
    + alpha * diag_pre_multiply(sigma_u_diff[k],  L_Omega_diff[k])) * z_u[k]; // can I do this?
  }
}

Specifically, can I adapt the idea of a pooling coefficient to the Cholesky factor of the covariance matrix, and then use a non-centered parameterization to get valid random effects?

Thank you!!!