Constraints on LKJ prior

sonicking · July 13, 2020, 1:25am

Hi, is it possible to put constraints on LKJ prior for the correlation matrix?

I have a model with multiple varying intercepts and slopes, and I want to assign them a multivariate prior. However, I would like to force some elements of the the correlation matrix to be positive, negative, or zero. Is there a way to do this while still using the LKJ prior for the correlation matrix and following the cholesky_factor_corr trick? Thank you.

bgoodri · July 13, 2020, 3:54am

Not really. Or more specifically, you could so something like that but then it wouldn’t be an LKJ distribution. See

gist.github.com

https://gist.github.com/rmcelreath/68fb40bf7a3e73bf2623e10a5973216d#gistcomment-3260821

deconfounding_proxies.R

# proxy confounder problem
# cf https://twitter.com/y2silence/status/1251151157264674816

# note that I rename X1,X2 to A,B and A to X
# The DAG:
# X -> Y
# X <- U -> Y
# A <- U -> B

# Can we use A and B to deconfound X->Y?

This file has been truncated. show original

spinkney · July 13, 2020, 1:21pm

Ben, thanks for this info! I’m hoping you could expand on what you wrote in the linked gist. For example, you write, “All this is easier if you can reorder the variables so that the fixed correlations are toward the left and the top of the correlation matrix.” Would you be able to put together a full working example using the 4 x 4 covariance matrix in the gist (and possibly a counter-example when you don’t reorder, what the difficulty is)?

Can you elaborate on when fixing values that no value of \Omega between [-1,1] will satisfy the equation? Do you mean the equation that you wrote in the gist?

I’ve encountered wanting to do something like this and it seems others are too. I could see this being a case study or going in the manual.

sonicking · July 13, 2020, 1:46pm

Wow. Thanks. The math there is a bit much for me to understand. But is it fair to say that it is better not to use the LKJ prior and the cholesky_factor_corr trick in this situation? I think I can construct the correlation matrix by block or by element and then sample it that way.

bgoodri · July 15, 2020, 2:57am

The first column (or row, since it is symmetric) of a correlation matrix is unrestricted under the LKJ transformation (as distinct from the LKJ probability distribution). Thus, so is its Cholesky factor. Here is an example with the Cholesky factor of a 3x3 correlation matrix:
L = \begin{bmatrix} 1 & 0 & 0 \\ a & \sqrt{1 - a^2} & 0\\ b & c & \sqrt{1 - b^2 - c^2} \end{bmatrix}

So, if \boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^\top, then \Sigma_{ij} is the dot product of the i-th and j-th rows of \mathbf{L}. If you want to restrict either \Sigma_{21} or \Sigma_{31} to be zero, that is easy: Just impose a = 0 or b = 0 respectively. However, technically that restriction means it is not LKJ prior any more, so you should not do L ~ lkj_cor_cholesky(eta);. If you wanted to impose the restriction that \Sigma_{32} = ba + c\sqrt{1 - a^2} + 0\sqrt{1 - b^2 - c^2} = 0, that is a bit more complicated. If you already have a and b, then c = -\frac{ba}{\sqrt{1 - a^2}}, but then L_{33} = \sqrt{1 - b^2 - \frac{b^2a^2}{1 - a^2}} might not be real, depending on what b and a are.

nspencer · October 16, 2020, 4:11pm

Do L_{22} and L_{33} have to be the positive square roots? Or can they be the negative ones? I think this might make a difference.

bgoodri · October 16, 2020, 4:19pm

All of the diagonal elements are defined to be non-negative.

nspencer · October 16, 2020, 4:55pm

Ah yes. Of course they have to be positive because it is a Cholesky decomposition. Thanks.

tarheel · May 11, 2024, 12:37pm

I apologize for bringing up an old conversation (not sure if the etiquette is to start a new post). I am working on a similar problem.

I’m considering @bgoodri’s claim:

While I agree the underlying distribution for the non-fixed parameters would no longer be LKJ, suppose I did the following:

data {
  int<lower=0> K;
  int<lower=0> K1;
  real<lower=-1,upper=1> rho;
}
transformed data {
  int K2 = K - K1;
  matrix[K1,K1] Rho11 = diag_matrix(rep_vector(1 - rho, K1)) + rep_matrix(rho, K1, K1);
  matrix[K1,K1] L11 = cholesky_decompose(Rho11);
}
parameters {
  cholesky_factor_corr[K] Ltilde;
}
transformed parameters {
  matrix[K,K] L = Ltilde;
  L[1:K1,1:K1]  = L11;
}
model {
  Ltilde[1:K1,1:K1] ~ lkj_corr_cholesky(1.0);  // just to put a distribution on params I don't care about
  L ~ lkj_corr_cholesky(1.0);
}
generated quantities {
  matrix[K,K] Cor = multiply_lower_tri_self_transpose(L);
}

If I did this, wouldn’t I effectively be sampling from the conditional distribution induced by the LKJ distribution for the parameters that are not fixed? Presumably, the answer is yes since a conditional distribution is proportional to the joint distribution.

So while it is true that the random parameters above do not follow a LKJ distribution, it seems to me what would be desired in such a situation is to specify a prior for the random parameters conditional on the fixed / known parameters.

Please let me know if there is a flaw in my logic.

spinkney · May 14, 2024, 1:50am

This allows the bounds to follow an LKJ prior with restrictions

I’ll write more about it soon but here’s the x thread where I announce it https://twitter.com/thatpinkney/status/1790196646837436696?t=ATOEoF6s8RmS_Vz1BhTKFg&s=19

tarheel · May 14, 2024, 12:36pm

Really interesting work, @spinkney! I suggest you add a section about known correlations (I saw you mentioned it in the appendix, but it would nice to see its development as a special case).

So effectively, you’ve come up with an approach to elicit a prior from a truncated LKJ distribution, i.e., for an n \times n correlation matrix \mathbf{R} with R_{ij} denoting the (i,j) entry,

f_{\text{LKJ}} ( \mathbf{R} | \{ a_{ij} \le R_{ij} \le b_{ij}, i = 1, \ldots, n, j > i \}).

This is really useful!

spinkney · May 14, 2024, 1:00pm

Yea, the issue with known values that aren’t 0 is that if the known values happen after the first column the stick length needs to be adjusted to account for it, but we can’t adjust the stick length without knowing the values that occur before. I have a solution I’m working on that adjusts the entire row but I’m still working through it. Right now putting a large ETA on the LKJ will help when known values are in as the stick length is less likely to be filled before the known value. Of course with 0s this isn’t a problem.

edm · May 15, 2024, 6:48pm

One question, based on my first read through the document: is the Jacobian calculation intentionally left out of code block 1? The Jacobian appears in algorithm 1 but not code block 1.

spinkney · May 15, 2024, 7:11pm

It’s identical except I left out the division by Ljj in the lower/upper bound mapping and I apply it after. Since I’m applying y/x to a parameter y, the derivative is 1/x. Since x is always positive the log abs jac is -log(x).

I found that the lower/upper mapping is a bit finicky and doing this outside was more stable. I have some additional stuff that is more numerically stable I’ll release soon, because the parameters get really close to the bounds and cause divergences when using an LKJ eta that allows large correlations.

Topic		Replies	Views
Lkj(1) vs no prior on correlation matrix? Modeling	4	1004	January 22, 2019
Hierarchical LKJ priors General techniques	0	272	May 22, 2023
LKJ prior distribution Modeling fitting-issues	44	6172	June 25, 2019
Can I set the LKJ prior on correlation in brms to be nonnegative? Modeling brms	4	2403	June 27, 2021
Whether LKJ prior is proper General specification , math	1	452	November 11, 2020

Constraints on LKJ prior

Related topics