Most of the models I estimate have parameters with a multivariate normal prior. The correlation matrix in the prior can be estimated via lkj_corr_cholesky. But as the dimension of the correlation matrix dimension increases (mine are often 100-200), the correlations generated from lkj_corr_cholesky also tend to get smaller (even with eta =1), and appear to underestimate correlations in my data.
I coded the Onion method for generating LKJ correlations based on this post. https://discourse.mc-stan.org/t/lkj-corr-lpdf-correlation-matrix-is-not-positive-definite/31995/7. The results I got from using that are nearly identical to lkj_corr_cholesky. But rather than coding the full cholesky via Onion, I tried to estimate a reduced version. This reduced version fixes the off-diagonal elements to 0 for columns above a threshold (bold cells below). For example, we can reduce this 7 dimension corr_cholesky to 3 columns estimated with fixed 0’s in cols 4-6:
| Col1 | Col2 | Col3 | Col4_F | Col5_F | Col6_F | Col7 |
|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0.3 | 0.9539 | 0 | 0 | 0 | 0 | 0 |
| 0.4 | 0.8 | 0.4472 | 0 | 0 | 0 | 0 |
| -0.4 | -0.3 | -0.5 | 0.7071 | 0 | 0 | 0 |
| 0.3 | 0.7 | 0.1 | 0 | 0.6403 | 0 | 0 |
| 0.3 | 0.5 | 0.2 | 0 | 0 | 0.7874 | 0 |
| 0.6 | 0.4 | 0.3 | 0 | 0 | 0 | 0.6245 |
The beta priors are the same for the R2 sumsq elements in rows 4-7 (where typically they change because we have increasing parameters). The resulting correlation matrices span a wider range of values. A correlation matrix of dimension 100, when “reduced” to just 10 columns gives the same distribution of correlations as a 10 dimension lkj_corr_cholesky matrix.
So far this appears to work well. Interestingly, despite fewer parameters, estimation time is not reduced, and in fact slightly longer than the full estimation. I am guessing this is because the beta priors in the reduced version force Stan to explore a wider range of correlations.
I’ve attached my code below. Please criticize if you think this is problematic, or test yourself. If P_red = P, then one gets the full standard Onion = lkj_corr_cholesky.
functions {
array[] vector beta_shapes(int P, int P_red, real eta) {
// sumsq of non-diagonal each row = R2 has beta(shape1, shape2) priors
real alpha = eta + (P_red - 2) / 2.0;
array[2] vector[P - 1] shape;
shape[1][1] = alpha;
shape[2][1] = alpha;
int k = 1;
for (i in 2 : (P - 1)) { // P-1 corresponds to final Pth diagonal
if (k < (P_red-1)) {
k +=1;
alpha = alpha - .5;
}
shape[1][i] = k / 2.0;
shape[2][i] = alpha;
}
return shape;
}
matrix lkj_onion(int P, int P_red, row_vector l, vector R2) {
matrix[P, P] L = rep_matrix(0, P, P); // cholesky_factor corr matrix
{
int start = 1;
int end = 2;
L[1, 1] = 1.0;
// Row 2
L[2, 1] = 2.0 * R2[1] - 1.0;
L[2, 2] = sqrt(1.0 - square(L[2, 1])); // Repeat for Rows 3 to P
int row_n = 2; // Number of entries for the row
for (krow in 3 : P) {
int r_i = krow-1; // R2 index: one less than row
row_vector[row_n] l_row = l[start:end]; // l is long vector, get the subset for this row
real scale = sqrt(R2[r_i] / dot_self(l_row)); // sum_sq(nondiag) = R2
L[krow, 1:row_n] = l_row * scale;
L[krow, krow] = sqrt(1.0 - R2[r_i]); // diagonal so sum_sq(all) = 1
if (krow <= P_red) row_n +=1; // Increase row_n until we hit P_red
start = end + 1;
end = start + row_n - 1;
}
}
return L;
}
}
data {
int<lower=0> P; // dim of corr mat
int<lower=0, upper = P> P_red; // non-diag cols 1:P_red are estimated, >P_red fixed at 0
real<lower=0> eta; // lkj param
}
transformed data {
array[2] vector[P - 1] diag_beta_shapes = beta_shapes(P, P_red, eta);
int n_offdiag = P_red*(P - P_red) + choose(P_red, 2); // off-diagonal parameters given reduced
}
parameters {
row_vector[n_offdiag - 1] l; // off diagonal elements (except [2,1]) put into one vector
vector<lower=0, upper=1>[P - 1] R2; // SumSquares of non-diagonal, 1st row has none, estimate 2:P
}
transformed parameters {
cholesky_factor_corr[P] L = lkj_onion(P, P_red, l, R2); // Make lower chol(LKJ)
}
model {
target += std_normal_lpdf(l);
target += beta_lpdf(R2 | diag_beta_shapes[1], diag_beta_shapes[2]);
}
generated quantities {
matrix[P, P] cor_LKJOnion = multiply_lower_tri_self_transpose(L);
}
