Hierarchical prior (for partial pooling) on correlation matrices?

Does this work?

data {
	int N;
	row_vector[9] bfi[4,N];
}

parameters {
	real<lower=0,upper=1> alpha; //Pooling coefficient
	cholesky_factor_corr[9] R_mu_L;
	cholesky_factor_corr[9] R_diff_L[4];
}

transformed parameters {
	corr_matrix[9] R_mu = multiply_lower_tri_self_transpose(R_mu_L);
	corr_matrix[9] R_group[4];
	for (i in 1:9) R_mu[i,i] = 1; // numerical correction: keep the unit diagonal exact
	for(g in 1:4){
		R_group[g] = (1 - alpha)*R_mu
                           + alpha*multiply_lower_tri_self_transpose(R_diff_L[g]);
	}
}

model {
	R_mu_L ~ lkj_corr_cholesky(3);
	alpha ~ beta(2,2);
	for(g in 1:4){
		R_diff_L[g] ~ lkj_corr_cholesky(1/alpha); // shape grows as alpha -> 0, concentrating the diffs near identity
		bfi[g] ~ multi_normal(rep_vector(0.0,9),R_group[g]);
	}
}
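(Why this should give a valid correlation matrix: for any vector \(x\) and \(\alpha \in [0,1]\),

$$
x^\top\big[(1-\alpha)R_\mu + \alpha R_g\big]x = (1-\alpha)\,x^\top R_\mu x + \alpha\,x^\top R_g x \ge 0,
$$

and every diagonal entry is \((1-\alpha)\cdot 1 + \alpha\cdot 1 = 1\), so a convex combination of correlation matrices is again a correlation matrix.)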

I just tried something similar, but more specific to my case, where the groups/samples have unequal numbers of correlations.

I estimated R_diff_L[3] (three samples/groups) as 9x9 matrices, even though two of the three groups only use the inner 6x6. That worked, as sketched below.
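Concretely, the likelihood for a 6-variable group just slices the combined matrix (a sketch; names follow the model above):

	// sketch: likelihood for a 6-variable group g, using the inner block
	for(n in 1:N)
		bfi[g, n, 1:6] ~ multi_normal(rep_vector(0.0, 6), R_group[g][1:6, 1:6]);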

So the issue comes when you want to augment a matrix.

You CAN:

  1. Estimate a 9x9 R_mu
  2. Estimate three 9x9 R_diff matrices
  3. Convex-combine each of the three R_diff matrices with the 9x9 R_mu
  4. For 2 of the 3 samples, use the inner 6x6 of the combined matrix in multi_normal

You CANNOT:

  1. Estimate a 9x9 R_mu
  2. Estimate three 6x6 R_diff matrices
  3. Convex-combine each of the three R_diff matrices with the inner 6x6 of R_mu
  4. Use that 6x6 for the first 2 samples
  5. Augment the 3rd sample’s 6x6 matrix to a 9x9 by using the respective R_mu values; e.g., R_group_nine[7,8] = R_mu[7,8]

The latter produces non-positive-definite matrices.
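For intuition, here is a toy 3x3 case with made-up numbers (not from the model above): the combined inner correlation can drift far from R_mu, so patching in R_mu’s outer values can break positive definiteness. If the combined inner correlation is -0.8 while both patched-in R_mu entries are 0.9, the augmented matrix is

$$
\begin{pmatrix} 1 & -0.8 & 0.9 \\ -0.8 & 1 & 0.9 \\ 0.9 & 0.9 & 1 \end{pmatrix},
$$

whose determinant is about -2.56 < 0, so it is not positive definite.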

I’ll check to see whether the results make sense. Is it a problem that the correlations in rows/columns 7-9 are estimated from R_mu and the three R_group matrices, even though only group 3 makes use of those correlations?

So, this revision does not work?

data {
	int N;
	row_vector[9] bfi[4,N];
}

parameters {
	real<lower=0,upper=1> alpha; //Pooling coefficient
	cholesky_factor_corr[9] R_mu_L;
	cholesky_factor_corr[6] R_diff_L[3];
	cholesky_factor_corr[9] R_diff_L2;
}

transformed parameters {
	corr_matrix[9] R_mu = multiply_lower_tri_self_transpose(R_mu_L);
	corr_matrix[6] R_group[3];
	corr_matrix[9] R_group2 = (1 - alpha)*R_mu
	                          + alpha*multiply_lower_tri_self_transpose(R_diff_L2);
	for(g in 1:3){
		R_group[g] = (1 - alpha)*R_mu[1:6,1:6]
		             + alpha*multiply_lower_tri_self_transpose(R_diff_L[g]);
	}
}
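The model block would be as before, with the likelihood sliced per group (a sketch; I’m assuming groups 1-3 observe only the first six variables and group 4 observes all nine):

model {
	alpha ~ beta(2,2);
	R_mu_L ~ lkj_corr_cholesky(3);
	R_diff_L2 ~ lkj_corr_cholesky(1/alpha);
	for(g in 1:3){
		R_diff_L[g] ~ lkj_corr_cholesky(1/alpha);
		for(n in 1:N)
			bfi[g, n, 1:6] ~ multi_normal(rep_vector(0.0,6), R_group[g]);
	}
	for(n in 1:N)
		bfi[4, n] ~ multi_normal(rep_vector(0.0,9), R_group2);
}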

That worked.

OK, so adding to the CAN list, for future self:

CAN +=

  1. Estimate a 9x9 R_mu
  2. Estimate one 9x9 R_diff matrix and two 6x6 R_diff matrices
  3. Convex-combine the 9x9 R_diff with R_mu
  4. Convex-combine the 6x6 R_diff matrices with R_mu[1:6,1:6]

If I estimated all the R_diff matrices as 9x9, even though only the inner 6x6 is used for two of them, would that affect the estimates?
I know it’s inefficient, because it’s needless sampling.
However, it would be /cleaner/ for post-processing reasons.

With proper priors, it is fine: the parameters that only enter the unused rows/columns are not informed by the likelihood, so they simply follow their prior.


Thank you for all your input - Seriously, a life saver.

Also for future me (or anyone who wants to do this simply), I wrote two functions that help compute the per-group correlation matrices (or their Cholesky factors).
Each takes a global matrix, a local diff matrix, and alpha.
If the local matrix is smaller than the global matrix, the function uses the innermost subset of the global matrix.
E.g., if you have a 9x9 global matrix but a 6x6 local matrix, it will use global[1:6,1:6]; this means the function requires that the local matrix correspond to the upper-left-corner subset of the global matrix.

	matrix convex_combine_cholesky(matrix global_chol_cor, matrix local_chol_cor, real alpha){
		int dim = rows(local_chol_cor);
		int global_dim = rows(global_chol_cor);
		matrix[global_dim,global_dim] global_cor = multiply_lower_tri_self_transpose(global_chol_cor);
		matrix[dim,dim] local_cor = multiply_lower_tri_self_transpose(local_chol_cor);
		matrix[dim,dim] global_cor_use;
		matrix[dim,dim] combined_chol_cor;

		// if the local matrix is smaller, use the upper-left block of the global matrix
		if(dim < global_dim){
			global_cor_use = global_cor[1:dim,1:dim];
		} else {
			global_cor_use = global_cor;
		}
		// the convex combination of two correlation matrices is again a
		// correlation matrix, so the decomposition is always defined
		combined_chol_cor = cholesky_decompose((1 - alpha)*global_cor_use + alpha*local_cor);
		return combined_chol_cor;
	}

	matrix convex_combine(matrix global_cor, matrix local_cor, real alpha){
		int dim = rows(local_cor);
		matrix[dim,dim] global_cor_use;
		matrix[dim,dim] combined_cor;
		// if the local matrix is smaller, use the upper-left block of the global matrix
		if(dim < rows(global_cor)){
			global_cor_use = global_cor[1:dim,1:dim];
		} else {
			global_cor_use = global_cor;
		}
		combined_cor = (1 - alpha)*global_cor_use + alpha*local_cor;
		return combined_cor;
	}

To use them, you need to declare a global matrix (the same dimension as the largest group matrix), group-specific ‘diff’ matrices, and alpha.
It looks like this:

parameters {
    cholesky_factor_corr[9] R_global_L;
    cholesky_factor_corr[9] R_diff_L[4];
    real<lower=0,upper=1> alpha;
}
transformed parameters {
    cholesky_factor_corr[9] R_group_L[4];
    for(g in 1:4){
        R_group_L[g] = convex_combine_cholesky(R_global_L, R_diff_L[g], alpha);
    }
}
model {
    alpha ~ beta(2,2);
    R_global_L ~ lkj_corr_cholesky(3);
    for(g in 1:4){
        R_diff_L[g] ~ lkj_corr_cholesky(1/alpha);
        y[g] ~ multi_normal_cholesky(muVector, R_group_L[g]); // y, muVector declared elsewhere ("data" can't be used as a variable name)
    }
}
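Note that passing a correlation Cholesky factor to multi_normal_cholesky implies unit variances; if you need non-unit scales, pass diag_pre_multiply(sigma, R_group_L[g]) instead.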

The problem is that you were using \BoldSymbol or something like that. I switched to \mathbf so it’d render.

Hi there,

I think this pooling coefficient thing is very interesting. I’m wondering if I can adapt it when parameterizing a hierarchical prior on the covariance matrix. See the following Stan code:

data {
  int<lower=1> J; // number of subjects
  int<lower=1> K; // number of classes
}

parameters {
  real<lower=0, upper=1> alpha; // pooling coefficient
  vector<lower=0>[2] sigma_u_mu;  // mean random-effect scales
  vector<lower=0>[2] sigma_u_diff[K]; // class deviations from the mean random-effect scales
  cholesky_factor_corr[2] L_Omega_mu; // mean Cholesky factor of the correlation matrix
  cholesky_factor_corr[2] L_Omega_diff[K]; // class deviations from the mean Cholesky factor
  matrix[2, J] z_u[K]; // uncorrelated (standardized) random effects
}

transformed parameters {
  matrix[2, J] u[K]; // correlated (true) random effects
  for (k in 1:K) {
    u[k] = ((1-alpha) * diag_pre_multiply(sigma_u_mu,  L_Omega_mu) 
    + alpha * diag_pre_multiply(sigma_u_diff[k],  L_Omega_diff[k])) * z_u[k]; // can I do this?
  }
}

Specifically, can I adapt the idea of the pooling coefficient to the Cholesky decomposition of the covariance matrix, and then use a non-centered parameterization to get valid random effects?
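(For comparison: a convex combination of Cholesky factors is still lower triangular with a positive diagonal, so it yields a valid covariance factor, but it is not the Cholesky factor of the convex combination of the two covariance matrices. If the latter is what I want, I think I could mirror convex_combine_cholesky above; a sketch:)

transformed parameters {
  matrix[2, J] u[K]; // correlated (true) random effects
  for (k in 1:K) {
    // combine the full covariance matrices, then re-decompose
    matrix[2, 2] Sigma_mu = multiply_lower_tri_self_transpose(
        diag_pre_multiply(sigma_u_mu, L_Omega_mu));
    matrix[2, 2] Sigma_k = multiply_lower_tri_self_transpose(
        diag_pre_multiply(sigma_u_diff[k], L_Omega_diff[k]));
    u[k] = cholesky_decompose((1 - alpha) * Sigma_mu + alpha * Sigma_k) * z_u[k];
  }
}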

Thank you!!!