Underestimating correlation coefficients with LKJ prior

That is what is supposed to happen under an LKJ prior. It is easiest to think about the case where the shape parameter is 1. For a 2x2 matrix, that implies the lone correlation has a uniform prior between -1 and 1. Thus, if the data are consistent with a correlation of 0.96 or something similarly high, the posterior distribution will be strongly left-skewed: most of its mass sits close to 1, but the long tail toward zero tends to pull the posterior mean below the value the data favor.

In general, when there are K variables and the shape parameter is still 1, the marginal prior of any single correlation is a Beta distribution rescaled to the (-1, 1) interval with both shape parameters equal to K / 2. (For a general shape parameter eta, both Beta shape parameters are eta - 1 + K / 2.) Thus if K > 2, the prior puts zero density on the points -1 and 1 and has its mean, median, and mode at zero. This shrinkage toward the identity matrix gets stronger as K gets larger.
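To see how quickly that shrinkage sets in, here is a rough sketch in plain Python that evaluates the implied marginal prior of one correlation. The function names are made up for illustration, and it assumes the marginal is a Beta(eta - 1 + K/2, eta - 1 + K/2) distribution rescaled to (-1, 1), which reduces to the K / 2 case when eta is 1.

```python
from math import exp, lgamma, log, sqrt


def lkj_marginal_density(r, K, eta=1.0):
    """Marginal prior density of one correlation r under LKJ(eta) with K variables.

    Assumes (r + 1) / 2 ~ Beta(a, a) with a = eta - 1 + K / 2.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly inside (-1, 1)")
    a = eta - 1.0 + K / 2.0
    x = (r + 1.0) / 2.0
    # log of the Beta function B(a, a)
    log_beta_fn = 2.0 * lgamma(a) - lgamma(2.0 * a)
    # factor 0.5 is the Jacobian of the map from (0, 1) to (-1, 1)
    return 0.5 * exp((a - 1.0) * (log(x) + log(1.0 - x)) - log_beta_fn)


def lkj_marginal_sd(K, eta=1.0):
    """Prior standard deviation of one correlation: sqrt(1 / (2a + 1))."""
    a = eta - 1.0 + K / 2.0
    return sqrt(1.0 / (2.0 * a + 1.0))


def prior_prob_above(threshold, K, eta=1.0, n=2000):
    """Prior probability that r > threshold, by a simple midpoint rule."""
    width = (1.0 - threshold) / n
    return sum(
        lkj_marginal_density(threshold + (i + 0.5) * width, K, eta) * width
        for i in range(n)
    )


if __name__ == "__main__":
    for K in (2, 3, 5, 10):
        print(K, lkj_marginal_sd(K), prior_prob_above(0.9, K))
```

With K = 2 the density is flat at 0.5, so a correlation above 0.9 has prior probability 0.05. As K grows, the prior standard deviation shrinks and the prior mass above 0.9 falls off, which is the pull toward zero described above.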

So, if you think that there will be really large correlations among several variables, then the LKJ distribution is probably not the right choice. But we do not have anything better in general, although squared distance correlation functions and the like might work better in some cases.
