Hierarchical prior (for partial pooling) on correlation matrices?

Does this work?

data {
	int N;
	row_vector[9] bfi[4,N];
}

parameters {
	real<lower=0,upper=1> alpha; //Pooling coefficient
	cholesky_factor_corr[9] R_mu_L;
	cholesky_factor_corr[9] R_diff_L[4];
}

transformed parameters {
	corr_matrix[9] R_mu = multiply_lower_tri_self_transpose(R_mu_L);
	corr_matrix[9] R_group[4];
	for (i in 1:9) R_mu[i,i] = 1; // numerical correction: keep the unit diagonal exact
	for(g in 1:4){
		R_group[g] = (1 - alpha)*R_mu
                           + alpha*multiply_lower_tri_self_transpose(R_diff_L[g]);
	}
}

model {
	R_mu_L ~ lkj_corr_cholesky(3);
	alpha ~ beta(2,2);
	for(g in 1:4){
		R_diff_L[g] ~ lkj_corr_cholesky(1/alpha); // shape grows as alpha -> 0, concentrating the diffs near identity
		bfi[g] ~ multi_normal(rep_vector(0.0,9),R_group[g]);
	}
}
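(Why this should give a valid correlation matrix: for any vector \(x\) and \(\alpha \in [0,1]\),

$$
x^\top\big[(1-\alpha)R_\mu + \alpha R_g\big]x = (1-\alpha)\,x^\top R_\mu x + \alpha\,x^\top R_g x \ge 0,
$$

and every diagonal entry is \((1-\alpha)\cdot 1 + \alpha\cdot 1 = 1\), so a convex combination of correlation matrices is again a correlation matrix.)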

I just tried something similar, but more specific to my case, where the groups/samples have unequal numbers of correlations.

I estimated R_diff_L[3] (three samples/groups) as 9x9 matrices, even though two of the three groups only use the inner 6x6. That worked, as sketched below.
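Concretely, the likelihood for a 6-variable group just slices the combined matrix (a sketch; names follow the model above):

	// sketch: likelihood for a 6-variable group g, using the inner block
	for(n in 1:N)
		bfi[g, n, 1:6] ~ multi_normal(rep_vector(0.0, 6), R_group[g][1:6, 1:6]);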

So the issue comes when you want to augment a matrix.

You CAN:

  1. Estimate a 9x9 R_mu
  2. Estimate three 9x9 R_diff matrices
  3. Convex-combine each of the three R_diff matrices with the 9x9 R_mu
  4. For 2 of the 3 samples, use the inner 6x6 of the combined matrix in multi_normal

You CANNOT:

  1. Estimate a 9x9 R_mu
  2. Estimate three 6x6 R_diff matrices
  3. Convex-combine each of the three R_diff matrices with the inner 6x6 of R_mu
  4. Use that 6x6 for the first 2 samples
  5. Augment the 3rd sample’s 6x6 matrix to a 9x9 by using the respective R_mu values; e.g., R_group_nine[7,8] = R_mu[7,8]

The latter produces non-positive-definite matrices.
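For intuition, here is a toy 3x3 case with made-up numbers (not from the model above): the combined inner correlation can drift far from R_mu, so patching in R_mu’s outer values can break positive definiteness. If the combined inner correlation is -0.8 while both patched-in R_mu entries are 0.9, the augmented matrix is

$$
\begin{pmatrix} 1 & -0.8 & 0.9 \\ -0.8 & 1 & 0.9 \\ 0.9 & 0.9 & 1 \end{pmatrix},
$$

whose determinant is about -2.56 < 0, so it is not positive definite.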

I’ll check to see whether the results make sense. Is it a problem that the correlations in rows/columns 7-9 are estimated from R_mu and the three R_group matrices, even though only group 3 makes use of those correlations?

So, this revision does not work?

data {
	int N;
	row_vector[9] bfi[4,N];
}

parameters {
	real<lower=0,upper=1> alpha; //Pooling coefficient
	cholesky_factor_corr[9] R_mu_L;
	cholesky_factor_corr[6] R_diff_L[3];
	cholesky_factor_corr[9] R_diff_L2;
}

transformed parameters {
	corr_matrix[9] R_mu = multiply_lower_tri_self_transpose(R_mu_L);
	corr_matrix[6] R_group[3];
	corr_matrix[9] R_group2 = (1 - alpha)*R_mu
	                          + alpha*multiply_lower_tri_self_transpose(R_diff_L2);
	for(g in 1:3){
		R_group[g] = (1 - alpha)*R_mu[1:6,1:6]
		             + alpha*multiply_lower_tri_self_transpose(R_diff_L[g]);
	}
}
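The model block would be as before, with the likelihood sliced per group (a sketch; I’m assuming groups 1-3 observe only the first six variables and group 4 observes all nine):

model {
	alpha ~ beta(2,2);
	R_mu_L ~ lkj_corr_cholesky(3);
	R_diff_L2 ~ lkj_corr_cholesky(1/alpha);
	for(g in 1:3){
		R_diff_L[g] ~ lkj_corr_cholesky(1/alpha);
		for(n in 1:N)
			bfi[g, n, 1:6] ~ multi_normal(rep_vector(0.0,6), R_group[g]);
	}
	for(n in 1:N)
		bfi[4, n] ~ multi_normal(rep_vector(0.0,9), R_group2);
}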

That worked.

OK, so adding to the CAN list, for future self:

CAN +=

  1. Estimate a 9x9 R_mu
  2. Estimate one 9x9 R_diff matrix and two 6x6 R_diff matrices
  3. Convex-combine the 9x9 R_diff with R_mu
  4. Convex-combine the 6x6 R_diff matrices with R_mu[1:6,1:6]

If I estimated all the R_diff matrices as 9x9, even though only the inner 6x6 is used for two of them, would that affect the estimates?
I know it’s inefficient, because it’s needless sampling.
However, it would be /cleaner/ for post-processing reasons.

With proper priors, it is fine: the parameters that only enter the unused rows/columns are not informed by the likelihood, so they simply follow their prior.


Thank you for all your input - Seriously, a life saver.

Also for future me (or anyone who wants to do this simply), I wrote two functions that help compute the per-group correlation matrices (or their Cholesky factors).
Each takes a global matrix, a local diff matrix, and alpha.
If the local matrix is smaller than the global matrix, the function uses the innermost subset of the global matrix.
E.g., if you have a 9x9 global matrix but a 6x6 local matrix, it will use global[1:6,1:6]; this means the function requires that the local matrix correspond to the upper-left-corner subset of the global matrix.

	matrix convex_combine_cholesky(matrix global_chol_cor, matrix local_chol_cor, real alpha){
		int dim = rows(local_chol_cor);
		int global_dim = rows(global_chol_cor);
		matrix[global_dim,global_dim] global_cor = multiply_lower_tri_self_transpose(global_chol_cor);
		matrix[dim,dim] local_cor = multiply_lower_tri_self_transpose(local_chol_cor);
		matrix[dim,dim] global_cor_use;
		matrix[dim,dim] combined_chol_cor;

		// if the local matrix is smaller, use the upper-left block of the global matrix
		if(dim < global_dim){
			global_cor_use = global_cor[1:dim,1:dim];
		} else {
			global_cor_use = global_cor;
		}
		// the convex combination of two correlation matrices is again a
		// correlation matrix, so the decomposition is always defined
		combined_chol_cor = cholesky_decompose((1 - alpha)*global_cor_use + alpha*local_cor);
		return combined_chol_cor;
	}

	matrix convex_combine(matrix global_cor, matrix local_cor, real alpha){
		int dim = rows(local_cor);
		matrix[dim,dim] global_cor_use;
		matrix[dim,dim] combined_cor;
		// if the local matrix is smaller, use the upper-left block of the global matrix
		if(dim < rows(global_cor)){
			global_cor_use = global_cor[1:dim,1:dim];
		} else {
			global_cor_use = global_cor;
		}
		combined_cor = (1 - alpha)*global_cor_use + alpha*local_cor;
		return combined_cor;
	}

To use them, you need to declare a global matrix (the same dimension as the largest group matrix), group-specific ‘diff’ matrices, and alpha.
It looks like this:

parameters {
    cholesky_factor_corr[9] R_global_L;
    cholesky_factor_corr[9] R_diff_L[4];
    real<lower=0,upper=1> alpha;
}
transformed parameters {
    cholesky_factor_corr[9] R_group_L[4];
    for(g in 1:4){
        R_group_L[g] = convex_combine_cholesky(R_global_L, R_diff_L[g], alpha);
    }
}
model {
    alpha ~ beta(2,2);
    R_global_L ~ lkj_corr_cholesky(3);
    for(g in 1:4){
        R_diff_L[g] ~ lkj_corr_cholesky(1/alpha);
        y[g] ~ multi_normal_cholesky(muVector, R_group_L[g]); // y, muVector declared elsewhere ("data" can't be used as a variable name)
    }
}
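Note that passing a correlation Cholesky factor to multi_normal_cholesky implies unit variances; if you need non-unit scales, pass diag_pre_multiply(sigma, R_group_L[g]) instead.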

The problem is that you were using \BoldSymbol or something like that. I switched to \mathbf so it’d render.

Hi there,

I think this pooling coefficient thing is very interesting. I’m wondering if I can adapt it when parameterizing a hierarchical prior on the covariance matrix. See the following Stan code:

data {
  int<lower=1> J; // number of subjects
  int<lower=1> K; // number of classes
}

parameters {
  real<lower=0, upper=1> alpha; // pooling coefficient
  vector<lower=0>[2] sigma_u_mu;  // mean random-effect scales
  vector<lower=0>[2] sigma_u_diff[K]; // class deviations from the mean random-effect scales
  cholesky_factor_corr[2] L_Omega_mu; // mean Cholesky factor of the correlation matrix
  cholesky_factor_corr[2] L_Omega_diff[K]; // class deviations from the mean Cholesky factor
  matrix[2, J] z_u[K]; // uncorrelated (standardized) random effects
}

transformed parameters {
  matrix[2, J] u[K]; // correlated (true) random effects
  for (k in 1:K) {
    u[k] = ((1-alpha) * diag_pre_multiply(sigma_u_mu,  L_Omega_mu) 
    + alpha * diag_pre_multiply(sigma_u_diff[k],  L_Omega_diff[k])) * z_u[k]; // can I do this?
  }
}

Specifically, can I adapt the idea of the pooling coefficient to the Cholesky decomposition of the covariance matrix, and then use a non-centered parameterization to get valid random effects?
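(For comparison: a convex combination of Cholesky factors is still lower triangular with a positive diagonal, so it yields a valid covariance factor, but it is not the Cholesky factor of the convex combination of the two covariance matrices. If the latter is what I want, I think I could mirror convex_combine_cholesky above; a sketch:)

transformed parameters {
  matrix[2, J] u[K]; // correlated (true) random effects
  for (k in 1:K) {
    // combine the full covariance matrices, then re-decompose
    matrix[2, 2] Sigma_mu = multiply_lower_tri_self_transpose(
        diag_pre_multiply(sigma_u_mu, L_Omega_mu));
    matrix[2, 2] Sigma_k = multiply_lower_tri_self_transpose(
        diag_pre_multiply(sigma_u_diff[k], L_Omega_diff[k]));
    u[k] = cholesky_decompose((1 - alpha) * Sigma_mu + alpha * Sigma_k) * z_u[k];
  }
}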

Thank you!!!