Hi,
I have a question about the LKJ normalizing constant.
As far as I understand it (please correct me if this is wrong), an LKJ(1.0) prior on a 2 x 2 correlation matrix implies a uniform prior from -1 to 1 on the single correlation. Naively, I would therefore expect the normalizing constant by which the density needs to be divided to be 2, which on the log scale means subtracting log(2). However, when I sample in Stan from an LKJ(1.0) prior for a 2 x 2 correlation matrix once without the constant and once including the constant, it looks like log(2) is added instead of subtracted.
library(rstan)
#-------------------------------------------------------------------------------
# LKJ(1.0) for 2 x 2 correlation matrix
#-------------------------------------------------------------------------------
# without constant
code_m1 <-
"data {
int<lower = 1> d;
}
parameters {
corr_matrix[d] R;
}
model {
R ~ lkj_corr(1.0);
}"
# with constant
code_m2 <-
"data {
int<lower = 1> d;
}
parameters {
corr_matrix[d] R;
}
model {
target += lkj_corr_lpdf(R | 1.0);
}"
# compile models
m1 <- rstan::stan_model(model_code = code_m1)
m2 <- rstan::stan_model(model_code = code_m2)
# obtain samples
d <- 2
s1 <- rstan::sampling(m1, data = list(d = d), control = list(adapt_delta = 0.99), seed = 1)
s2 <- rstan::sampling(m2, data = list(d = d), control = list(adapt_delta = 0.99), seed = 1)
# compare lp__
d1 <- as.data.frame(s1)
d2 <- as.data.frame(s2)
diff_lp21 <- d2$lp__ - d1$lp__
all.equal(diff_lp21, rep(log(2), length(diff_lp21)))
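For completeness, here is the check my expectation is based on (this snippet is my own addition and just restates the reasoning above; it uses diff_lp21 from the script):
# my expectation: under a uniform prior on (-1, 1) the density of the
# correlation is 1/2, so including the constant should shift lp__ by -log(2);
# the observed shift is +log(2) instead
mean(diff_lp21)  # observed shift: +log(2)
-log(2)          # expected shift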
In contrast, when I sample from a uniform prior from -1 to 1 once without the constant and once including the constant, log(2) is subtracted, as I would expect.
library(rstan)
#-------------------------------------------------------------------------------
# uniform(-1,1)
#-------------------------------------------------------------------------------
# without constant
code_m3 <-
"parameters {
real<lower = -1.0, upper = 1.0> r;
}
model {
r ~ uniform(-1.0, 1.0);
}"
# with constant
code_m4 <-
"parameters {
real<lower = -1.0, upper = 1.0> r;
}
model {
target += uniform_lpdf(r | -1.0, 1.0);
}"
# compile models
m3 <- rstan::stan_model(model_code = code_m3)
m4 <- rstan::stan_model(model_code = code_m4)
# obtain samples
s3 <- rstan::sampling(m3, control = list(adapt_delta = 0.99), seed = 1)
s4 <- rstan::sampling(m4, control = list(adapt_delta = 0.99), seed = 1)
# compare lp__
d3 <- as.data.frame(s3)
d4 <- as.data.frame(s4)
diff_lp43 <- d4$lp__ - d3$lp__
all.equal(diff_lp43, rep(-log(2), length(diff_lp43)))
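Just to spell out what I mean by "as I would expect" (again my own addition, using diff_lp43 from above): the uniform(-1, 1) density is 1/2 everywhere, so its log density is -log(2).
# the uniform(-1, 1) log density is -log(2) everywhere, which matches the
# observed shift when the constant is included
dunif(0, min = -1, max = 1, log = TRUE)
mean(diff_lp43)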
I also looked at the Lewandowski, Kurowicka, and Joe (2009) paper. If I understand it correctly, the constant c_d in Theorem 5 equals 2 for d = 2, and, judging from the rest of the notation (e.g., Equation 15), they do not divide the density by c_d but multiply it by c_d. So what Stan does appears to be in line with the paper. Nevertheless, it is hard for me to see why one would multiply rather than divide by 2 to normalize the density in this case (but maybe I am simply misunderstanding something). What adds to my confusion is that Equation 16 gives the same expression for c_d as Joe (2006, p. 2182), yet in the Joe (2006) paper the density appears to be divided by c_d, not multiplied.
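To make my reading of the d = 2 case concrete, here is a small numerical check (my own sketch, not taken from the paper): for a 2 x 2 correlation matrix, det(R) = 1 - r^2, so the LKJ(eta) density of the correlation r is proportional to (1 - r^2)^(eta - 1), and integrating this unnormalized density over (-1, 1) gives the constant by which I would expect to divide.
# numerically recover the d = 2 normalizing constant by integrating the
# unnormalized LKJ density of the single correlation r over (-1, 1)
c_2 <- function(eta) {
  integrate(function(r) (1 - r^2)^(eta - 1), lower = -1, upper = 1)$value
}
c_2(1.0)  # = 2, matching c_d = 2 from Theorem 5; I would expect to subtract log(2)
c_2(2.0)  # = 4/3, just for comparison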
So my question is: Am I misunderstanding the normalizing constant in the LKJ prior, or is there genuinely something off with the _lpdf version of the LKJ prior?
In case you are wondering why I am interested in this: we need the correct density (i.e., including the normalizing constant) for estimating marginal likelihoods (e.g., via bridgesampling).
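To illustrate why the constant matters for that use case (only a sketch of what I have in mind; it assumes the bridgesampling package and the fit s2 from above): a model that contains nothing but a correctly normalized prior should have a log marginal likelihood of approximately 0, so any error in the constant shows up directly in the estimate.
library(bridgesampling)
# log marginal likelihood of the prior-only model that includes the constant;
# with a correctly normalized prior this should be close to 0
bs2 <- bridge_sampler(s2, silent = TRUE)
bs2$logml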
Best,
Quentin
References:
Joe, H. (2006). Generating random correlation matrices based on partial correlations. Journal of Multivariate Analysis, 97(10), 2177-2189. https://doi.org/10.1016/j.jmva.2005.05.010
Lewandowski, D., Kurowicka, D., & Joe, H. (2009). Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis, 100(9), 1989-2001. https://doi.org/10.1016/j.jmva.2009.04.008