cholesky_decompose: "Matrix m is not positive definite" does not stop the sampler in the transformed parameters block

Hi all,

Is it expected that calling cholesky_decompose with a non-positive-definite matrix in the transformed parameters block does NOT raise an error (only divergent transitions), while doing the same in the generated quantities block causes the error to be reported?

data {
  int N;

}

parameters {
  corr_matrix[N] Sigma;
  real x;
}

/*
transformed parameters {
  matrix[N,N] L_Sigma;

  {
    matrix[N,N] tmp;

    tmp = Sigma;
    tmp[1,2] = x;
    tmp[2,1] = x;
    
    L_Sigma = cholesky_decompose(tmp);

  }
}
*/

model {
  x ~ normal (0,1);
}

generated quantities {
  matrix[N,N] L_Sigma;
  
  {
    matrix[N,N] tmp;
    
    tmp = Sigma;
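    // x is unconstrained, so the modified tmp is generally not positive definite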
    tmp[1,2] = x;
    tmp[2,1] = x;
    
    L_Sigma = cholesky_decompose(tmp);
  }
}

With the cholesky_decompose call in the transformed parameters block:

> sampling (t807, chains=1, data=list(N=2, Sigma = matrix(c(1,1,2,1), nrow=2)))

SAMPLING FOR MODEL 't807' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 1.7e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.08869 seconds (Warm-up)
Chain 1:                0.073575 seconds (Sampling)
Chain 1:                0.162265 seconds (Total)
Chain 1: 
Inference for Stan model: t807.
1 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=1000.

              mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
Sigma[1,1]    1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
Sigma[1,2]   -0.01    0.03 0.58 -0.94 -0.51 -0.04  0.48  0.96   369    1
Sigma[2,1]   -0.01    0.03 0.58 -0.94 -0.51 -0.04  0.48  0.96   369    1
Sigma[2,2]    1.00    0.00 0.00  1.00  1.00  1.00  1.00  1.00   839    1
x             0.07    0.03 0.56 -0.91 -0.40  0.09  0.55  0.95   264    1
L_Sigma[1,1]  1.00     NaN 0.00  1.00  1.00  1.00  1.00  1.00   NaN  NaN
L_Sigma[1,2]  0.00     NaN 0.00  0.00  0.00  0.00  0.00  0.00   NaN  NaN
L_Sigma[2,1]  0.07    0.03 0.56 -0.91 -0.40  0.09  0.55  0.95   264    1
L_Sigma[2,2]  0.80    0.01 0.22  0.19  0.69  0.88  0.97  1.00   334    1
lp__         -0.77    0.06 0.90 -2.86 -0.96 -0.49 -0.23 -0.03   195    1

Samples were drawn using NUTS(diag_e) at Wed Nov 28 17:27:36 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
Warning messages:
1: There were 435 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
2: Examine the pairs() plot to diagnose sampling problems

With the same call in the generated quantities block:

> t807 <- stan_model("t807.stan")
hash mismatch so recompiling; make sure Stan code ends with a blank line
> sampling (t807, chains=1, data=list(N=2, Sigma = matrix(c(1,1,2,1), nrow=2)))

SAMPLING FOR MODEL 't807' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 1.2e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 1: Exception: cholesky_decompose: Matrix m is not positive definite  (in 'model18322d1bb4ec_t807' at line 42)

Chain 1: Exception: cholesky_decompose: Matrix m is not positive definite  (in 'model18322d1bb4ec_t807' at line 42)

Chain 1: Exception: cholesky_decompose: Matrix m is not positive definite  (in 'model18322d1bb4ec_t807' at line 42)

Chain 1: Exception: cholesky_decompose: Matrix m is not positive definite  (in 'model18322d1bb4ec_t807' at line 42)
...
...
Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.027963 seconds (Warm-up)
Chain 1:                0.024731 seconds (Sampling)
Chain 1:                0.052694 seconds (Total)
Chain 1: 
[1] "Error in sampler$call_sampler(args_list[[i]]) : "                                                            
[2] "  Exception: cholesky_decompose: Matrix m is not positive definite  (in 'model18322d1bb4ec_t807' at line 42)"
error occurred during calling the sampler; sampling not done

It looks like this is the intended behavior: https://github.com/stan-dev/stan/issues/2562

Thanks.

Two points/questions to follow up:

  1. As it stands out of the box, hitting this error in the transformed parameters block does not generate any warning/message/etc. Wouldn’t it be better to print at least some warning to the console, so that users can easily identify that (one of) the causes of the divergent transitions was in fact a failure in cholesky_decompose?

  2. More generally, shouldn’t generated quantities and transformed parameters behave largely the same?

yes! the current system doesn’t do a good job of handling error messages, and this is very much something that we want to fix as soon as we get the cycles to do so. as was noted in stan issue #2562, the lack of an error message is bad.

yes and no. the transformed parameters block is executed at every leapfrog step. in particular, during initialization, failures shouldn’t be fatal - they might just be due to the random choice of initial values for some parameter. the generated quantities block is executed once per draw - at that point all the parameters have been validated, so if an error occurs in generated quantities, it means some calculation is bad.
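in the meantime, one way to surface the failure yourself (a sketch against the toy model above - the eigenvalue check and print run at every leapfrog step, so this is chatty and expensive, only sensible for debugging) is to add a diagnostic right before the decomposition. the decomposition still throws and the proposal is still silently rejected, but now with a visible trace:

transformed parameters {
  matrix[N, N] L_Sigma;

  {
    matrix[N, N] tmp;

    tmp = Sigma;
    tmp[1, 2] = x;
    tmp[2, 1] = x;

    // print before cholesky_decompose fails silently, so the
    // divergent transitions can be traced back to this spot
    if (min(eigenvalues_sym(tmp)) <= 0)
      print("tmp is not positive definite, min eigenvalue = ",
            min(eigenvalues_sym(tmp)));

    L_Sigma = cholesky_decompose(tmp);
  }
}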

this is particularly annoying when trying to use the generated quantities block to generate replicate draws for posterior predictive checks - in particular, the poisson_log_rng function is pretty brittle. I’ve got code that looks like this:

generated quantities {
  vector[N] eta = log_E + beta0 + x * betas;
  vector[N] mu = exp(eta);
  int y_rep[N];
  if (max(eta) > 20) {
    // avoid overflow in poisson_log_rng
    print("max eta too big: ", max(eta));  
    for (n in 1:N)
      y_rep[n] = -1;
  } else {
    for (n in 1:N)
      y_rep[n] = poisson_log_rng(eta[n]);
  }
...

that’s some stinky code!
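a slightly less stinky variant of the same hack (a sketch - the cap of 20 is the same ad hoc bound as above, not a documented limit of poisson_log_rng) clamps eta elementwise instead of bailing out on the whole vector:

generated quantities {
  vector[N] eta = log_E + beta0 + x * betas;
  int y_rep[N];

  for (n in 1:N)
    // clamp eta so poisson_log_rng never sees an overflowing rate
    y_rep[n] = poisson_log_rng(fmin(eta[n], 20));
}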

hope this helps, cheers,
Mitzi

That last point is not necessarily true. I face this issue sometimes: my parameters are ‘fine’ with respect to my observed data, but somewhere in the tails of the generated data, a cholesky decomposition or two fails.

OK, so this is a problem with the Cholesky decomposition?
Is this the same problem as discussed here?

I don’t know. My seat-of-the-pants experience suggests the cholesky factor in Stan is overly sensitive: I remember copying out matrices that failed in Stan and then decomposed fine in R. But I think this is also just a general property of the cholesky decomposition - numerical issues are, well, an issue. From my perspective, improvement or more flexibility in this domain would be welcome, whether a more robust cholesky decomposition or the ability to skip the symmetry tests. I suspect this is the same issue as the one being discussed in the linked post.
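For the borderline-numerical case (as opposed to a matrix that is genuinely not positive definite, like the toy example above), a common stabilization is to nudge the diagonal before decomposing. A sketch - the jitter of 1e-9 is an arbitrary choice, not a recommended default:

transformed parameters {
  // a tiny diagonal jitter can push a borderline matrix past the
  // positive-definite check without materially changing the model
  matrix[N, N] L_Sigma
    = cholesky_decompose(Sigma + diag_matrix(rep_vector(1e-9, N)));
}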

With the poisson rng the decision was that you can just check for the problematic parameter magnitude and hack around it. This example makes a more convincing case that the generated quantities block should be changed. We could certainly make the decomposition more robust, but that would just make the issue less frequent.

At this point the best practice is to move the problematic calculation to external code.
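For example, with the toy model above you could drop the generated quantities block entirely and decompose the posterior draws in R instead. A sketch, mirroring the construction in the model:

fit <- sampling(t807, chains = 1, data = list(N = 2))
d   <- rstan::extract(fit)            # posterior draws as arrays
tmp <- d$Sigma[1, , ]                 # first draw of Sigma
tmp[1, 2] <- tmp[2, 1] <- d$x[1]      # same overwrite as in the model
# t() because R's chol() returns the upper factor, while Stan's
# cholesky_decompose returns the lower one; chol() can still fail
# when |x| is too large, but now it fails in R where tmp is easy
# to inspect
L1 <- t(chol(tmp))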

Thanks a lot!