Hi,
I have a question about the LKJ normalizing constant.
As far as I understand it (please correct me if this is wrong), an LKJ(1.0) prior on a 2 x 2 correlation matrix implies a uniform prior from -1 to 1 on the single correlation. Naively, I would therefore expect the normalizing constant by which the density needs to be divided to be 2, which on the log scale means subtracting log(2). However, when I sample in Stan from an LKJ(1.0) prior for a 2 x 2 correlation matrix once without the constant and once including the constant, it looks like log(2) is added instead of subtracted.
library(rstan)
#-------------------------------------------------------------------------------
# LKJ(1.0) for 2 x 2 correlation matrix
#-------------------------------------------------------------------------------
# without constant
code_m1 <-
"data {
int<lower = 1> d;
}
parameters {
corr_matrix[d] R;
}
model {
R ~ lkj_corr(1.0);
}"
# with constant
code_m2 <-
"data {
int<lower = 1> d;
}
parameters {
corr_matrix[d] R;
}
model {
target += lkj_corr_lpdf(R | 1.0);
}"
# compile models
m1 <- rstan::stan_model(model_code = code_m1)
m2 <- rstan::stan_model(model_code = code_m2)
# obtain samples
d <- 2
s1 <- rstan::sampling(m1, data = list(d = d), control = list(adapt_delta = 0.99), seed = 1)
s2 <- rstan::sampling(m2, data = list(d = d), control = list(adapt_delta = 0.99), seed = 1)
# compare lp__
d1 <- as.data.frame(s1)
d2 <- as.data.frame(s2)
diff_lp21 <- d2$lp__ - d1$lp__
all.equal(diff_lp21, rep(log(2), length(diff_lp21)))
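For completeness, here is the check my expectation is based on (this snippet is my own addition and just restates the reasoning above; it uses diff_lp21 from the script):
# my expectation: under a uniform prior on (-1, 1) the density of the
# correlation is 1/2, so including the constant should shift lp__ by -log(2);
# the observed shift is +log(2) instead
mean(diff_lp21)  # observed shift: +log(2)
-log(2)          # expected shift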
In contrast, when I sample from a uniform prior from -1 to 1 once without the constant and once including the constant, log(2) is subtracted, as I would expect.
library(rstan)
#-------------------------------------------------------------------------------
# uniform(-1,1)
#-------------------------------------------------------------------------------
# without constant
code_m3 <-
"parameters {
real<lower = -1.0, upper = 1.0> r;
}
model {
r ~ uniform(-1.0, 1.0);
}"
# with constant
code_m4 <-
"parameters {
real<lower = -1.0, upper = 1.0> r;
}
model {
target += uniform_lpdf(r | -1.0, 1.0);
}"
# compile models
m3 <- rstan::stan_model(model_code = code_m3)
m4 <- rstan::stan_model(model_code = code_m4)
# obtain samples
s3 <- rstan::sampling(m3, control = list(adapt_delta = 0.99), seed = 1)
s4 <- rstan::sampling(m4, control = list(adapt_delta = 0.99), seed = 1)
# compare lp__
d3 <- as.data.frame(s3)
d4 <- as.data.frame(s4)
diff_lp43 <- d4$lp__ - d3$lp__
all.equal(diff_lp43, rep(-log(2), length(diff_lp43)))
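Just to spell out what I mean by "as I would expect" (again my own addition, using diff_lp43 from above): the uniform(-1, 1) density is 1/2 everywhere, so its log density is -log(2).
# the uniform(-1, 1) log density is -log(2) everywhere, which matches the
# observed shift when the constant is included
dunif(0, min = -1, max = 1, log = TRUE)
mean(diff_lp43)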
I also looked at the Lewandowski, Kurowicka, and Joe (2009) paper. If I understand it correctly, the constant c_d in Theorem 5 equals 2 for d = 2, and, judging from the rest of the notation (e.g., Equation 15), they do not divide the density by c_d but multiply it by c_d. So what Stan does appears to be in line with the paper. Nevertheless, it is hard for me to see why one would multiply rather than divide by 2 to normalize the density in this case (but maybe I am simply misunderstanding something). What adds to my confusion is that Equation 16 gives the same expression for c_d as Joe (2006, p. 2182), yet in the Joe (2006) paper the density appears to be divided by c_d, not multiplied.
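To make my reading of the d = 2 case concrete, here is a small numerical check (my own sketch, not taken from the paper): for a 2 x 2 correlation matrix, det(R) = 1 - r^2, so the LKJ(eta) density of the correlation r is proportional to (1 - r^2)^(eta - 1), and integrating this unnormalized density over (-1, 1) gives the constant by which I would expect to divide.
# numerically recover the d = 2 normalizing constant by integrating the
# unnormalized LKJ density of the single correlation r over (-1, 1)
c_2 <- function(eta) {
  integrate(function(r) (1 - r^2)^(eta - 1), lower = -1, upper = 1)$value
}
c_2(1.0)  # = 2, matching c_d = 2 from Theorem 5; I would expect to subtract log(2)
c_2(2.0)  # = 4/3, just for comparison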
So my question is: Am I misunderstanding the normalizing constant in the LKJ prior, or is there genuinely something off with the _lpdf version of the LKJ prior?
In case you are wondering why I am interested in this: we need the correct density (i.e., including the normalizing constant) for estimating marginal likelihoods (e.g., via bridgesampling).
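To illustrate why the constant matters for that use case (only a sketch of what I have in mind; it assumes the bridgesampling package and the fit s2 from above): a model that contains nothing but a correctly normalized prior should have a log marginal likelihood of approximately 0, so any error in the constant shows up directly in the estimate.
library(bridgesampling)
# log marginal likelihood of the prior-only model that includes the constant;
# with a correctly normalized prior this should be close to 0
bs2 <- bridge_sampler(s2, silent = TRUE)
bs2$logml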
Best,
Quentin
References:
Joe, H. (2006). Generating random correlation matrices based on partial correlations. Journal of Multivariate Analysis, 97(10), 2177-2189. https://doi.org/10.1016/j.jmva.2005.05.010
Lewandowski, D., Kurowicka, D., & Joe, H. (2009). Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis, 100(9), 1989-2001. https://doi.org/10.1016/j.jmva.2009.04.008