Constraints on LKJ prior

Hi, is it possible to put constraints on LKJ prior for the correlation matrix?

I have a model with multiple varying intercepts and slopes, and I want to assign them a multivariate prior. However, I would like to force some elements of the the correlation matrix to be positive, negative, or zero. Is there a way to do this while still using the LKJ prior for the correlation matrix and following the cholesky_factor_corr trick? Thank you.

Not really. Or more specifically, you could so something like that but then it wouldn’t be an LKJ distribution. See

Ben, thanks for this info! I’m hoping you could expand on what you wrote in the linked gist. For example, you write, “All this is easier if you can reorder the variables so that the fixed correlations are toward the left and the top of the correlation matrix.” Would you be able to put together a full working example using the 4 x 4 covariance matrix in the gist (and possibly a counter-example when you don’t reorder, what the difficulty is)?

Can you elaborate on when fixing values that no value of \Omega between [-1,1] will satisfy the equation? Do you mean the equation that you wrote in the gist?

I’ve encountered wanting to do something like this and it seems others are too. I could see this being a case study or going in the manual.

Wow. Thanks. The math there is a bit much for me to understand. But is it fair to say that it is better not to use the LKJ prior and the cholesky_factor_corr trick in this situation? I think I can construct the correlation matrix by block or by element and then sample it that way.

The first column (or row, since it is symmetric) of a correlation matrix is unrestricted under the LKJ transformation (as distinct from the LKJ probability distribution). Thus, so is its Cholesky factor. Here is an example with the Cholesky factor of a 3x3 correlation matrix:
L = \begin{bmatrix} 1 & 0 & 0 \\ a & \sqrt{1 - a^2} & 0\\ b & c & \sqrt{1 - b^2 - c^2} \end{bmatrix}

So, if \boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^\top, then \Sigma_{ij} is the dot product of the i-th and j-th rows of \mathbf{L}. If you want to restrict either \Sigma_{21} or \Sigma_{31} to be zero, that is easy: Just impose a = 0 or b = 0 respectively. However, technically that restriction means it is not LKJ prior any more, so you should not do L ~ lkj_cor_cholesky(eta);. If you wanted to impose the restriction that \Sigma_{32} = ba + c\sqrt{1 - a^2} + 0\sqrt{1 - b^2 - c^2} = 0, that is a bit more complicated. If you already have a and b, then c = -\frac{ba}{\sqrt{1 - a^2}}, but then L_{33} = \sqrt{1 - b^2 - \frac{b^2a^2}{1 - a^2}} might not be real, depending on what b and a are.

2 Likes

Do L_{22} and L_{33} have to be the positive square roots? Or can they be the negative ones? I think this might make a difference.

All of the diagonal elements are defined to be non-negative.

Ah yes. Of course they have to be positive because it is a Cholesky decomposition. Thanks.

I apologize for bringing up an old conversation (not sure if the etiquette is to start a new post). I am working on a similar problem.

I’m considering @bgoodri’s claim:

While I agree the underlying distribution for the non-fixed parameters would no longer be LKJ, suppose I did the following:

data {
  int<lower=0> K;
  int<lower=0> K1;
  real<lower=-1,upper=1> rho;
}
transformed data {
  int K2 = K - K1;
  matrix[K1,K1] Rho11 = diag_matrix(rep_vector(1 - rho, K1)) + rep_matrix(rho, K1, K1);
  matrix[K1,K1] L11 = cholesky_decompose(Rho11);
}
parameters {
  cholesky_factor_corr[K] Ltilde;
}
transformed parameters {
  matrix[K,K] L = Ltilde;
  L[1:K1,1:K1]  = L11;
}
model {
  Ltilde[1:K1,1:K1] ~ lkj_corr_cholesky(1.0);  // just to put a distribution on params I don't care about
  L ~ lkj_corr_cholesky(1.0);
}
generated quantities {
  matrix[K,K] Cor = multiply_lower_tri_self_transpose(L);
}

If I did this, wouldn’t I effectively be sampling from the conditional distribution induced by the LKJ distribution for the parameters that are not fixed? Presumably, the answer is yes since a conditional distribution is proportional to the joint distribution.

So while it is true that the random parameters above do not follow a LKJ distribution, it seems to me what would be desired in such a situation is to specify a prior for the random parameters conditional on the fixed / known parameters.

Please let me know if there is a flaw in my logic.

1 Like

This allows the bounds to follow an LKJ prior with restrictions

I’ll write more about it soon but here’s the x thread where I announce it https://twitter.com/thatpinkney/status/1790196646837436696?t=ATOEoF6s8RmS_Vz1BhTKFg&s=19

4 Likes

Really interesting work, @spinkney! I suggest you add a section about known correlations (I saw you mentioned it in the appendix, but it would nice to see its development as a special case).

So effectively, you’ve come up with an approach to elicit a prior from a truncated LKJ distribution, i.e., for an n \times n correlation matrix \mathbf{R} with R_{ij} denoting the (i,j) entry,

f_{\text{LKJ}} ( \mathbf{R} | \{ a_{ij} \le R_{ij} \le b_{ij}, i = 1, \ldots, n, j > i \}).

This is really useful!

1 Like

Yea, the issue with known values that aren’t 0 is that if the known values happen after the first column the stick length needs to be adjusted to account for it, but we can’t adjust the stick length without knowing the values that occur before. I have a solution I’m working on that adjusts the entire row but I’m still working through it. Right now putting a large ETA on the LKJ will help when known values are in as the stick length is less likely to be filled before the known value. Of course with 0s this isn’t a problem.

1 Like

One question, based on my first read through the document: is the Jacobian calculation intentionally left out of code block 1? The Jacobian appears in algorithm 1 but not code block 1.

It’s identical except I left out the division by Ljj in the lower/upper bound mapping and I apply it after. Since I’m applying y/x to a parameter y, the derivative is 1/x. Since x is always positive the log abs jac is -log(x).

I found that the lower/upper mapping is a bit finicky and doing this outside was more stable. I have some additional stuff that is more numerically stable I’ll release soon, because the parameters get really close to the bounds and cause divergences when using an LKJ eta that allows large correlations.

3 Likes