Incorrect formula for correlation matrix transform

I am mapping a set of unconstrained values to a correlation matrix and working off 10.9 Correlation matrices | Stan Reference Manual in the manual (for which I am very grateful, as the LKJ papers are not at all easy to understand). However:

  • The second formula for w_{i, j} doesn’t seem correct, as it doesn’t return a Cholesky factor whose crossproduct is a correlation matrix.
  • I wasn’t sure if I had made a programming error, so I checked it against the first formula which does result in a correlation matrix.
  • In my attempt to figure out why the first formula wasn’t working for me, I also wrote down the actual transform that the Stan math library implements: math/cholesky_corr_constrain.hpp at 1d0b4a08b4a800a3e8d2c1a3559fbe9390b7584e · stan-dev/math · GitHub, which also results in a correlation matrix, but seems to be using a slightly different method from the one in the docs. I can see that the Stan math one is written to avoid an extra loop. But because the LKJ transformation is a bijection I think the fact it returns a different correlation matrix implies it is using a different method?.

Here is some code to demonstrate;

library(magrittr)
library(matrixcalc)

make_upper_diag_matrix <- function(x_compact, k) {
  z_mat <- matrix(data = 0, nrow = k, ncol = k)
  vec_counter <- 1
  
  for (ii in 1 : (k - 1)) {
    for (jj in (ii + 1) : k) {
      z_mat[ii, jj] <- x_compact[vec_counter]
      vec_counter <- vec_counter + 1
    }
  }
  
  return(z_mat)
}

# cholesky __upper__ triangular
# using the first formula for w_{i, j} from the manual
make_chol_from_cpc_first_formula <- function(z_cpc) {
  k <- ncol(z_cpc)
  w <- matrix(data = 0, nrow = k, ncol = k)
  
  w[1, 1] <- 1
  w[1, 2 : k] <- z_cpc[1, 2 : k]
  
  for (ii in 2 : k) {
    w[ii, ii] <- prod(sqrt(1 - z_cpc[1 : (ii - 1), ii]^2))
    if (ii == k) { next }
    
    for (jj in (ii + 1) : k) {
      w[ii, jj] <- z_cpc[ii, jj] * prod(sqrt(1 - z_cpc[1 : (ii - 1), jj]^2))
    }
  }
  
  return(crossprod(w))
}

# cholesky still  __upper__ triangular
# using the second formula for w_{i, j} from the manual
make_chol_from_cpc_second_formula <- function(z_cpc) {
  k <- ncol(z_cpc)
  w <- matrix(data = 0, nrow = k, ncol = k)
  
  w[1, 1] <- 1
  w[1, 2 : k] <- z_cpc[1, 2 : k]
  
  for (ii in 2 : k) {
    for (jj in ii : k) {
      term_one <- z_cpc[ii, jj] * w[ii - 1, jj]
      term_two <- sqrt(1 - (z_cpc[ii - 1, jj])^2)
      w[ii, jj] <- term_one * term_two
    }
  }
  
  return(crossprod(w))
}

# this is the math/rev/fun/cholesky_corr_constrain.hpp way of doing it
# note that the cholesky factor is now __lower__ triangular !!
make_corr_from_compact_math_lib <- function(x_compact, k) {
  w <- matrix(data = 0, nrow = k, ncol = k)
  counter <- 1
  w[1, 1] <- 1
  
  for (ii in 2 : k) {
    w[ii, 1] <- x_compact[counter]
    counter <- counter + 1
    temp_sum_square <- w[ii, 1]^2
    
    for (jj in 2 : ii) {
      if (jj == ii) { next }
      
      w[ii, jj] = x_compact[counter] * sqrt(1 - temp_sum_square)
      counter <- counter + 1
      temp_sum_square <- temp_sum_square + w[ii, jj]^2
    }
    
    w[ii, ii] <- sqrt(1 - temp_sum_square)
  }
  
  return(tcrossprod(w))
}
  
k <- 5 # dim of correlation matrix (k \times k)
n_unconstrained_pars <- choose(k, 2)
x_unconstrained <- rnorm(n = n_unconstrained_pars, mean = 0, sd = 0.5)

# first formulae
test_corr_first_formula <- x_unconstrained %>% 
  tanh() %>% 
  make_upper_diag_matrix(k = k) %>% 
  make_chol_from_cpc_first_formula()

# second formuale
test_corr_second_formula <-  x_unconstrained %>% 
  tanh() %>% 
  make_upper_diag_matrix(k = k) %>% 
  make_chol_from_cpc_second_formula()

test_corr_math_lib <- x_unconstrained %>% 
  tanh() %>% 
  make_corr_from_compact_math_lib(k = k)

matrixcalc::is.positive.definite(test_corr_first_formula)
#> [1] TRUE
matrixcalc::is.positive.definite(test_corr_second_formula) 
#> [1] FALSE
matrixcalc::is.positive.definite(test_corr_math_lib)
#> [1] TRUE

# first formula in docs -- is a correlation matrix
test_corr_first_formula
#>            [,1]        [,2]        [,3]        [,4]       [,5]
#> [1,]  1.0000000 -0.20286146 -0.57156605  0.12911546 -0.3833705
#> [2,] -0.2028615  1.00000000  0.03740415  0.09473022 -0.3934850
#> [3,] -0.5715660  0.03740415  1.00000000 -0.01102097  0.4231851
#> [4,]  0.1291155  0.09473022 -0.01102097  1.00000000  0.1251608
#> [5,] -0.3833705 -0.39348503  0.42318513  0.12516076  1.0000000

# second formula in docs -- is NOT a correlation matrix
test_corr_second_formula
#>            [,1]        [,2]        [,3]        [,4]        [,5]
#> [1,]  1.0000000 -0.20286146 -0.57156605  0.12911546 -0.38337051
#> [2,] -0.2028615  0.04115277  0.11594872 -0.02619255  0.07777110
#> [3,] -0.5715660  0.11594872  0.32878966 -0.07306701  0.22758037
#> [4,]  0.1291155 -0.02619255 -0.07306701  0.01692708 -0.04649934
#> [5,] -0.3833705  0.07777110  0.22758037 -0.04649934  0.18278035

# math lib implementation -- is a correlation matrix, but not the same as first
# formula
test_corr_math_lib
#>            [,1]        [,2]       [,3]        [,4]       [,5]
#> [1,]  1.0000000 -0.20286146 -0.5715660 -0.38337051 -0.5210758
#> [2,] -0.2028615  1.00000000  0.2196923 -0.01063644  0.1813084
#> [3,] -0.5715660  0.21969230  1.0000000  0.30269660  0.4837809
#> [4,] -0.3833705 -0.01063644  0.3026966  1.00000000  0.4346408
#> [5,] -0.5210758  0.18130844  0.4837809  0.43464084  1.0000000

identical(as.numeric(test_corr_first_formula), as.numeric(test_corr_math_lib))
#> [1] FALSE

Created on 2021-10-01 by the reprex package (v2.0.1)

My questions are:

2 Likes

I don’t have time to dive into this today but I remember that there was a question like this…

Does this thread answer your question?

2 Likes

Indeed that thread is the same question as I have, and I even remember reading it now you’ve linked it. That will teach me to search better. Happy to have this merged in under that thread.

The second last post gets at the difference between the math library and the first formula. I’ll see what I can figure out.

1 Like

Looks like the docs never got fixed. Do you want to take a crack at fixing it?

1 Like

Hoping to get some more time on this topic this week – If I figure out what the doc should be / what is actually going on with the code & math then I’ll send a docs PR.

1 Like

Hey @hhau sorry to bump an old topic but i’m wondering do you know where the code for the Jacobian adjustment for the math/rev/fun/cholesky_corr_constrain.hpp is? (or know what it is?)

It is the final formula of https://mc-stan.org/docs/reference-manual/cholesky-factors-of-correlation-matrices-1.html.

In the code the jacobian terms directly modify the lp variable. This happens in two places for this transformation, the first being: https://github.com/stan-dev/math/blob/develop/stan/math/rev/fun/cholesky_corr_constrain.hpp#L116 and line 124, which implements the second half of the formula in the docs. The second is https://github.com/stan-dev/math/blob/develop/stan/math/prim/fun/corr_constrain.hpp#L48, which happens when corr_constrain(y, lp) is called (which is tanh in disguise), and implements the first term in the docs.

Note that the docs use \frac{\text{d}}{\text{d}y} \text{tanh}(y) = \frac{1}{(\text{cosh}(y))^2} but the code uses \frac{\text{d}}{\text{d}y} \text{tanh}(y) = 1- (\text{tanh}(y))^2, but fortunately wikipedia tells us that these are the same: https://en.wikipedia.org/wiki/Hyperbolic_functions#Derivatives

2 Likes