cholesky_decompose: "Matrix m is not positive definite" does not stop the sampler in the transformed parameters block

Hi all,

Is it expected that calling cholesky_decompose with a non-positive-definite matrix in the transformed parameters block does NOT raise an error (only divergent transitions), while doing the same in the generated quantities block causes the error to be reported?

data {
  int N;

}

parameters {
  corr_matrix[N] Sigma;
  real x;
}

/*
transformed parameters {
  matrix[N,N] L_Sigma;

  {
    matrix[N,N] tmp;

    tmp = Sigma;
    tmp[1,2] = x;
    tmp[2,1] = x;
    
    L_Sigma = cholesky_decompose(tmp);

  }
}
*/

model {
  x ~ normal (0,1);
}

generated quantities {
  matrix[N,N] L_Sigma;
  
  {
    matrix[N,N] tmp;
    
    tmp = Sigma;
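    // x is unconstrained, so the modified tmp is generally not positive definite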
    tmp[1,2] = x;
    tmp[2,1] = x;
    
    L_Sigma = cholesky_decompose(tmp);
  }
}

With the cholesky_decompose call in the transformed parameters block:

> sampling (t807, chains=1, data=list(N=2, Sigma = matrix(c(1,1,2,1), nrow=2)))

SAMPLING FOR MODEL 't807' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 1.7e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.08869 seconds (Warm-up)
Chain 1:                0.073575 seconds (Sampling)
Chain 1:                0.162265 seconds (Total)
Chain 1: 
Inference for Stan model: t807.
1 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=1000.

              mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
Sigma[1,1]    1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
Sigma[1,2]   -0.01    0.03 0.58 -0.94 -0.51 -0.04  0.48  0.96   369    1
Sigma[2,1]   -0.01    0.03 0.58 -0.94 -0.51 -0.04  0.48  0.96   369    1
Sigma[2,2]    1.00    0.00 0.00  1.00  1.00  1.00  1.00  1.00   839    1
x             0.07    0.03 0.56 -0.91 -0.40  0.09  0.55  0.95   264    1
L_Sigma[1,1]  1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
L_Sigma[1,2]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_Sigma[2,1]  0.07    0.03 0.56 -0.91 -0.40  0.09  0.55  0.95   264    1
L_Sigma[2,2]  0.80    0.01 0.22  0.19  0.69  0.88  0.97  1.00   334    1
lp__         -0.77    0.06 0.90 -2.86 -0.96 -0.49 -0.23 -0.03   195    1

Samples were drawn using NUTS(diag_e) at Wed Nov 28 17:27:36 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
Warning messages:
1: There were 435 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
2: Examine the pairs() plot to diagnose sampling problems

With the same call in the generated quantities block:

> t807 <- stan_model("t807.stan")
hash mismatch so recompiling; make sure Stan code ends with a blank line
> sampling (t807, chains=1, data=list(N=2, Sigma = matrix(c(1,1,2,1), nrow=2)))

SAMPLING FOR MODEL 't807' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 1.2e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1: Exception: cholesky_decompose: Matrix m is not positive definite  (in 'model18322d1bb4ec_t807' at line 42)

Chain 1: Exception: cholesky_decompose: Matrix m is not positive definite  (in 'model18322d1bb4ec_t807' at line 42)

Chain 1: Exception: cholesky_decompose: Matrix m is not positive definite  (in 'model18322d1bb4ec_t807' at line 42)

Chain 1: Exception: cholesky_decompose: Matrix m is not positive definite  (in 'model18322d1bb4ec_t807' at line 42)
...
...
Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.027963 seconds (Warm-up)
Chain 1:                0.024731 seconds (Sampling)
Chain 1:                0.052694 seconds (Total)
Chain 1: 
[1] "Error in sampler$call_sampler(args_list[[i]]) : "                                                            
[2] "  Exception: cholesky_decompose: Matrix m is not positive definite  (in 'model18322d1bb4ec_t807' at line 42)"
error occurred during calling the sampler; sampling not done

It looks like this is the intended behavior: https://github.com/stan-dev/stan/issues/2562

Thanks.

Two points/questions to follow up:

  1. As it stands out of the box, hitting this error in the transformed parameters block does not generate any warning/message/etc. Wouldn’t it be better to print at least some warning to the console, so that users can easily identify that (one of) the causes of the divergent transitions was in fact a failure in cholesky_decompose?

  2. More generally, shouldn’t generated quantities and transformed parameters behave largely the same?

yes! the current system doesn’t do a good job of handling error messages, and this is very much something that we want to fix as soon as we get the cycles to do so. as was noted in stan issue #2562, the lack of an error message is bad.

yes and no. the transformed parameters block is executed at every leapfrog step. in particular, during initialization, failures shouldn’t be fatal - they might just be due to the random choice of initial values for some parameter. the generated quantities block is executed once per draw - at that point all the parameters have been validated, so if an error occurs in generated quantities, it means some calculation is bad.
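in the meantime, one way to surface the failure yourself (a sketch against the toy model above - the eigenvalue check and print run at every leapfrog step, so this is chatty and expensive, only sensible for debugging) is to add a diagnostic right before the decomposition. the decomposition still throws and the proposal is still silently rejected, but now with a visible trace:

transformed parameters {
  matrix[N, N] L_Sigma;

  {
    matrix[N, N] tmp;

    tmp = Sigma;
    tmp[1, 2] = x;
    tmp[2, 1] = x;

    // print before cholesky_decompose fails silently, so the
    // divergent transitions can be traced back to this spot
    if (min(eigenvalues_sym(tmp)) <= 0)
      print("tmp is not positive definite, min eigenvalue = ",
            min(eigenvalues_sym(tmp)));

    L_Sigma = cholesky_decompose(tmp);
  }
}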

this is particularly annoying when trying to use the generated quantities block to generate replicate draws for posterior predictive checks - in particular, the poisson_log_rng function is pretty brittle. I’ve got code that looks like this:

generated quantities {
  vector[N] eta = log_E + beta0 + x * betas;
  vector[N] mu = exp(eta);
  int y_rep[N];
  if (max(eta) > 20) {
    // avoid overflow in poisson_log_rng
    print("max eta too big: ", max(eta));  
    for (n in 1:N)
      y_rep[n] = -1;
  } else {
    for (n in 1:N)
      y_rep[n] = poisson_log_rng(eta[n]);
  }
...

that’s some stinky code!
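a slightly less stinky variant of the same hack (a sketch - the cap of 20 is the same ad hoc bound as above, not a documented limit of poisson_log_rng) clamps eta elementwise instead of bailing out on the whole vector:

generated quantities {
  vector[N] eta = log_E + beta0 + x * betas;
  int y_rep[N];

  for (n in 1:N)
    // clamp eta so poisson_log_rng never sees an overflowing rate
    y_rep[n] = poisson_log_rng(fmin(eta[n], 20));
}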

hope this helps, cheers,
Mitzi

That last point is not necessarily true. I face this issue sometimes: my parameters are ‘fine’ with respect to my observed data, but somewhere in the tails of the generated data, a cholesky decomposition or two fails.

OK, so this is a problem with the Cholesky decomposition?
Is this the same problem as discussed here?

I don’t know. My seat-of-the-pants experience suggests the cholesky factor in Stan is overly sensitive: I remember copying out matrices that failed in Stan and then decomposed fine in R. But I think this is also just a general property of the cholesky decomposition - numerical issues are, well, an issue. From my perspective, improvement or more flexibility in this domain would be welcome, whether a more robust cholesky decomposition or the ability to skip the symmetry tests. I suspect this is the same issue as the one being discussed in the linked post.
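For the borderline-numerical case (as opposed to a matrix that is genuinely not positive definite, like the toy example above), a common stabilization is to nudge the diagonal before decomposing. A sketch - the jitter of 1e-9 is an arbitrary choice, not a recommended default:

transformed parameters {
  // a tiny diagonal jitter can push a borderline matrix past the
  // positive-definite check without materially changing the model
  matrix[N, N] L_Sigma
    = cholesky_decompose(Sigma + diag_matrix(rep_vector(1e-9, N)));
}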

With the poisson rng the decision was that you can just check for the problematic parameter magnitude and hack around it. This example makes a more convincing case that the generated quantities block should be changed. We could certainly make the decomposition more robust, but that would just make the issue less frequent.

At this point the best practice is to move the problematic calculation to external code.
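For example, with the toy model above you could drop the generated quantities block entirely and decompose the posterior draws in R instead. A sketch, mirroring the construction in the model:

fit <- sampling(t807, chains = 1, data = list(N = 2))
d   <- rstan::extract(fit)            # posterior draws as arrays
tmp <- d$Sigma[1, , ]                 # first draw of Sigma
tmp[1, 2] <- tmp[2, 1] <- d$x[1]      # same overwrite as in the model
# t() because R's chol() returns the upper factor, while Stan's
# cholesky_decompose returns the lower one; chol() can still fail
# when |x| is too large, but now it fails in R where tmp is easy
# to inspect
L1 <- t(chol(tmp))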

Thanks a lot!