-nan results with lkj_corr_cholesky

Hey all,

I’m seeing a weird issue where my Stan model fails to initialize with a very simple model:

parameters {
  cholesky_factor_corr[100] ccm;
model {
  ccm ~ lkj_corr_cholesky(1);

because lkj_corr_cholesky(1); on any of the constrained cholesky_factor_corr matrices returns -nan, I think because some of the entries in the constrained matrix end up being 0? Anyone seen anything like this? cc @bgoodri as the resident Cholesky Factor, maybe I’m doing something really dumb here.

The error messages:

Rejecting initial value:
  Log probability evaluates to log(0), i.e. negative infinity.
  Stan can't start sampling from this initial value.

Initialization between (-2, 2) failed after 100 attempts. 


For Cholesky factors that big, it is possible that the diagonal elements result in the square root of a slightly negative number when symbolically they should be slightly positive. If you search for the “onion method” on the Stan-users Google group, there is some code for a more numerically stable way to construct a Cholesky factor of a large correlation matrix.

1 Like

onion.stan (1.1 KB) is the 2020 version.


Thanks, this is great and solved my issue. Not sure how to mark both as the “solution,” haha.

This seems like a good use case for “submodels” @mgorinova @Matthijs.

1 Like

Maybe this should go in the manual somewhere?

1 Like

I went looking for the onion method in the google forums and I came across this discussion on hierarchically modeling correlation matrices. Which has been brought up again on these forums (by me Weighted correlation matrices and cholesky_factor_corr and others https://discourse.mc-stan.org/t/sum-of-cholesky-factors-vs-sum-of-covariance/16293.

The onion method is more promising since it is derived from unit vectors to the left of the diagonal on the Cholesky factor of a KxK correlation matrix, but there is no easy way in the Stan language to get as many unit vectors as there are countries for each value of k between 2 and K. Link to post.

Is this issue still an issue in current stan?

Also, I got a bunch of warnings when I tested the stan code about not filling in the L matrix. Not sure if that matters. I instantiated L as a zero matrix and I removed that nested loop. I didn’t see any performance improvements but I also didn’t extensively test.

data {
  int<lower = 0> K;
  real<lower = 0> eta;
transformed data {
  real<lower = 0> alpha = eta + (K - 2) / 2.0;
  vector<lower = 0>[K-1] shape1;
  vector<lower = 0>[K-1] shape2;

  shape1[1] = alpha;
  shape2[1] = alpha;
  for(k in 2:(K-1)) {
    alpha -= 0.5;
    shape1[k] = k / 2.0;
    shape2[k] = alpha;
parameters {
  row_vector[choose(K, 2) - 1]  l;         // do NOT init with 0 for all elements
  vector<lower = 0,upper = 1>[K-1] R2; // first element is not really a R^2 but is on (0,1)  
model {
  // priors that imply Sigma ~ lkj_corr(eta)
  l ~ std_normal();
  R2 ~ beta(shape1, shape2);
generated quantities {
  matrix[K,K] Sigma;
  matrix[K,K] L = rep_matrix(0, K, K);
    int start = 1;
    int end = 2;

    L[1,1] = 1.0;
    L[2,1] = 2.0 * R2[1] - 1.0;
    L[2,2] = sqrt(1.0 - square(L[2,1]));
    for(k in 2:(K-1)) {
      int kp1 = k + 1;
      row_vector[k] l_row = segment(l, start, k);
      real scale = sqrt(R2[k] / dot_self(l_row));
      L[kp1, 1:k] = l_row[1:k] * scale;
      L[kp1, kp1] = sqrt(1.0 - R2[k]);
      start = end + 1;
      end = start + k - 1;

    Sigma = multiply_lower_tri_self_transpose(L);

1 Like

I think multiply_lower_tri_self_transpose does not reference the upper triangular part, or if it does, it is a bug.

Yes, that’s correct. The code runs but gives warnings. Would this be a candidate for pedantic mode to tell users that this warning can be ignored if data are in the cells that the code or functions reference? Such as, "matrix is partially empty. This is fine if the code or functions only reference non-empty cells such as when referencing the lower triangular part of a matrix in multiply_lower_tri_self_transpose". @rok_cesnovar


Yeah, good suggestion. @rybern what do you think?

Is there some situation when there is a danger of this happening? Shouldn’t Stan’s default initialization provide random non-zero values pretty much always?

I’m not sure, that was in the original @bgoodri stan file.

It should be fine by default, but init = "0" is a supported argument.