-nan results with lkj_corr_cholesky

seantalts · July 21, 2020, 8:41pm

Hey all,

I’m seeing a weird issue where my Stan model fails to initialize with a very simple model:

parameters {
  cholesky_factor_corr[100] ccm;
}
model {
  ccm ~ lkj_corr_cholesky(1);
}

because lkj_corr_cholesky(1); on any of the constrained cholesky_factor_corr matrices returns -nan, I think because some of the entries in the constrained matrix end up being 0? Anyone seen anything like this? cc @bgoodri as the resident Cholesky Factor, maybe I’m doing something really dumb here.

The error messages:

Rejecting initial value:
  Log probability evaluates to log(0), i.e. negative infinity.
  Stan can't start sampling from this initial value.

Initialization between (-2, 2) failed after 100 attempts.

Thanks,
Sean

bgoodri · July 22, 2020, 12:56am

For Cholesky factors that big, it is possible that the diagonal elements result in the square root of a slightly negative number when symbolically they should be slightly positive. If you search for the “onion method” on the Stan-users Google group, there is some code for a more numerically stable way to construct a Cholesky factor of a large correlation matrix.

bgoodri · July 22, 2020, 1:35am

onion.stan (1.1 KB) is the 2020 version.

seantalts · July 22, 2020, 7:16pm

Thanks, this is great and solved my issue. Not sure how to mark both as the “solution,” haha.

This seems like a good use case for “submodels” @mgorinova @Matthijs.

jonah · July 22, 2020, 8:47pm

Maybe this should go in the manual somewhere?

spinkney · July 22, 2020, 10:16pm

I went looking for the onion method in the google forums and I came across this discussion on hierarchically modeling correlation matrices. Which has been brought up again on these forums (by me Weighted correlation matrices and cholesky_factor_corr - #7 by Joran and others https://discourse.mc-stan.org/t/sum-of-cholesky-factors-vs-sum-of-covariance/16293.

The onion method is more promising since it is derived from unit vectors to the left of the diagonal on the Cholesky factor of a KxK correlation matrix, but there is no easy way in the Stan language to get as many unit vectors as there are countries for each value of k between 2 and K. Link to post.

Is this issue still an issue in current stan?

Also, I got a bunch of warnings when I tested the stan code about not filling in the L matrix. Not sure if that matters. I instantiated L as a zero matrix and I removed that nested loop. I didn’t see any performance improvements but I also didn’t extensively test.

data {
  int<lower = 0> K;
  real<lower = 0> eta;
}
transformed data {
  real<lower = 0> alpha = eta + (K - 2) / 2.0;
  vector<lower = 0>[K-1] shape1;
  vector<lower = 0>[K-1] shape2;

  shape1[1] = alpha;
  shape2[1] = alpha;
  for(k in 2:(K-1)) {
    alpha -= 0.5;
    shape1[k] = k / 2.0;
    shape2[k] = alpha;
  }
}
parameters {
  row_vector[choose(K, 2) - 1]  l;         // do NOT init with 0 for all elements
  vector<lower = 0,upper = 1>[K-1] R2; // first element is not really a R^2 but is on (0,1)  
}
model {
  // priors that imply Sigma ~ lkj_corr(eta)
  l ~ std_normal();
  R2 ~ beta(shape1, shape2);
}
generated quantities {
  matrix[K,K] Sigma;
  matrix[K,K] L = rep_matrix(0, K, K);
  {
    int start = 1;
    int end = 2;

    L[1,1] = 1.0;
    L[2,1] = 2.0 * R2[1] - 1.0;
    L[2,2] = sqrt(1.0 - square(L[2,1]));
    for(k in 2:(K-1)) {
      int kp1 = k + 1;
      row_vector[k] l_row = segment(l, start, k);
      real scale = sqrt(R2[k] / dot_self(l_row));
      L[kp1, 1:k] = l_row[1:k] * scale;
      L[kp1, kp1] = sqrt(1.0 - R2[k]);
      start = end + 1;
      end = start + k - 1;

    }
    Sigma = multiply_lower_tri_self_transpose(L);
  }
}

bgoodri · July 23, 2020, 2:16am

I think multiply_lower_tri_self_transpose does not reference the upper triangular part, or if it does, it is a bug.

spinkney · July 23, 2020, 10:34am

Yes, that’s correct. The code runs but gives warnings. Would this be a candidate for pedantic mode to tell users that this warning can be ignored if data are in the cells that the code or functions reference? Such as, "matrix is partially empty. This is fine if the code or functions only reference non-empty cells such as when referencing the lower triangular part of a matrix in multiply_lower_tri_self_transpose". @rok_cesnovar

jonah · July 23, 2020, 3:27pm

Yeah, good suggestion. @rybern what do you think?

seantalts · July 23, 2020, 4:30pm

Is there some situation when there is a danger of this happening? Shouldn’t Stan’s default initialization provide random non-zero values pretty much always?

spinkney · July 23, 2020, 4:50pm

I’m not sure, that was in the original @bgoodri stan file.

bgoodri · July 24, 2020, 3:46pm

It should be fine by default, but init = "0" is a supported argument.

Topic		Replies	Views
Cholesky_factor_corr initializing with invalid values Developers	8	2881	July 14, 2021
Question on cholesky_factor_corr and the resulting correlation matrix Modeling	4	1081	January 8, 2020
Covariance issues with Dynamic Factor Model with large N Modeling covariance , time-series	7	567	January 17, 2024
Initialization Issues with Cholesky Factorization of Covariance Matrix Modeling performance , covariance , prior-choice	1	1005	April 25, 2020
Cholesky_decompose error Modeling fitting-issues	5	1186	May 24, 2019

-nan results with lkj_corr_cholesky

Related topics