Cholesky_decompose runtime error when computing transformed parameters

I’m getting cholesky_decompose errors when sampling with a large covariance matrix (roughly 100 to 1000 rows) that is assembled with a Kronecker-product-like for loop.

The loop is similar to the Gaussian process covariance matrix example in the Stan reference guide (without using cov_exp_quad), but it loops over the independent variable x[N] multiple times (M times, where M can vary from 2 to dozens), generating a covariance matrix of size MN \times MN. For M = 2, for instance, I have:

K = \begin{bmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{bmatrix}
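In numpy terms the structure is roughly this (illustrative only: it assumes a single shared base kernel across tasks, in which case the block matrix is exactly a Kronecker product):

import numpy as np

# illustrative sketch: with one shared N x N base kernel K_base for all tasks,
# the M*N x M*N block matrix is exactly np.kron(signal_cov, K_base)
N, M = 15, 2
x = np.arange(1, N + 1, dtype=float)
K_base = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)   # N x N kernel over x
signal_cov = np.array([[1.0, 0.5], [0.5, 1.0]])          # M x M task covariance
K = np.kron(signal_cov, K_base)                          # (M*N) x (M*N)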

The Stan model compiles fine, but I am getting RuntimeError: Exception: cholesky_decompose: Matrix m is not positive definite. That is somewhat expected: a large matrix of this kind is very constrained, and even though the K_{ij} submatrices have positive diagonals, K may end up with an odd structure, I guess. I am, however, able to choose parameters such that K is positive definite, and from there it gets a little weirder. Here are some observations:

[1]: I am testing positive definiteness for the parameter values in Python before sampling (with a Python version of the loop and numpy.linalg.cholesky) and I get no errors when computing L = cholesky(K) (see the sketch after this list).

[2]: if I pass this fixed K_{test} matrix to Stan in the data dictionary and have Stan compute cholesky_decompose, I get no errors (nor during sampling, of course). If, in parallel, I also compute the real K matrix in Stan and print out the first terms, they are the same as the Python-computed matrix in the data dict; printing out the differences in all terms shows nothing greater than 1e-16, so I don’t think there are any errors in building the matrix in Stan.

[3]: I can choose the initial parameters such that the assembled matrix is the identity matrix, which should pose no problem, but I still get the same error.

[4]: depending on the size of the matrix (i.e., on M), if I run an almost-infinite loop catching the runtime error and trying again, I may eventually be able to sample, but it’s unpredictable, and for large M I didn’t manage to sample after a few hours of looping (a few hundred to a couple of thousand tries).
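Here is essentially the check from [1] (build_K is just a placeholder name for my Python port of the Stan loop):

import numpy as np

# K = build_K(x, hyperpars, signal_cov, N, M, jitter=1e-8)  # placeholder for the Python port of the loop
K = np.eye(450)  # stand-in so the snippet runs on its own
try:
    L = np.linalg.cholesky(K)  # raises LinAlgError if K is not positive definite
    print("cholesky succeeded")
except np.linalg.LinAlgError as err:
    print("cholesky failed:", err)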

Finally, thinking I was going crazy, I went back and rewrote all the Stan code from scratch based on the last similar example I had working, and I found that the problems started when I moved these calculations from the model block to the transformed parameters block. Using algorithm='Fixed_param', I can run a single iteration with PyStan’s stan_model.sampling() a hundred times with no errors when the calculations are in the model block, but with them in transformed parameters I get the error several times before I manage to sample even once; a sketch of that test is below.
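Schematically, that test looks like this (PyStan 2 interface; model_code and data_dict stand for my actual model string and data):

import pystan

sm = pystan.StanModel(model_code=model_code)  # model_code: the Stan program as a string
failures = 0
for _ in range(100):
    try:
        sm.sampling(data=data_dict, iter=1, chains=1, algorithm="Fixed_param")
    except RuntimeError:
        failures += 1
print("failures out of 100:", failures)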

Is there anything I’m missing here that may be causing this behavior?

Covariance matrices that large are often numerically indefinite for some values of the parameters. You could try qr_R().

Yes, I may need to try something different at some point to deal with large covariance matrices, but the main issue is that it happens even for something like the identity matrix. For fixed initial parameters it may or may not sample if the computations are in the transformed parameters block, but it seems to be fine when they are in the model block.

Are you declaring the thing as a cholesky_factor_cov in the transformed parameters block? If so, just declare it as a matrix to avoid a redundant positive definiteness check.

No, in the transformed parameters block I put together a symmetric matrix K based on the model parameters and compute L = cholesky_decompose(K); there’s not much more to it. For the first iteration with given initial values it is easy to check that the matrix is positive definite (e.g., the identity matrix with dimension anywhere from 50 to 500), but I get an error 980 times out of 1000 if the computations are in the transformed parameters block, versus zero times if they are in the model block.

Are you declaring the symmetric matrix as a cov_matrix in the transformed parameters block?

No, just matrix[M*N, M*N] K. Is that a problem?

No, that is the right way to do it if your matrix is positive definite by construction. Maybe Stan is handling exceptions in the transformed parameters block differently than in the model block. Anyway, try qr_R() and somehow deal with the diagonal elements that are non-positive.

By construction all terms are always positive, and within a submatrix K_{ij} the entries decrease with distance from the diagonal. The easiest solution for me right now is to just use the model block and compute the intermediate quantities afterwards from the sampled parameters, but I may try qr_R() in the transformed parameters block and see if it works. Thanks.

In practice one typically needs to add constants on the order of the square root of the machine precision, about 10^{-8} for double-precision floating point, to the diagonal of the covariance matrix before taking its Cholesky decomposition.
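In numpy terms, for example (a sketch; K stands for whatever covariance matrix you have assembled):

import numpy as np

jitter = np.sqrt(np.finfo(float).eps)  # machine epsilon is ~2.2e-16, so this is ~1.5e-8
K = np.eye(4)                          # stand-in for the assembled covariance matrix
K_stable = K + jitter * np.eye(K.shape[0])
L = np.linalg.cholesky(K_stable)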

Yeah, I’m doing that. The odd part is that the exact same matrix (up to differences on the order of 1e-20) raises an error when passed as a parameter but not when passed as a constant. Also, the former does so only in the transformed parameters block, but runs fine in the model block.
Maybe there is a reason for handling exceptions differently in each block, but the behavior is inconsistent.

Exceptions in transformed data kill the whole process as there’s no way to recover.

Exceptions in generated quantities used to kill the whole process, but we’re going to move to having them generate NaN for all real quantities instead, as it was sometimes crashing during warmup due to numerics.

Exceptions in transformed parameters and the model block throw from exactly the same scope in the generated code, so the behavior should be identical for a variable declared as matrix.

If a variable is declared with a constrained type (e.g., cov_matrix), then the constraint is validated at the end of the block in which the variable is declared.

If you have an example that’s inconsistent with that behavior, it’s a bug, not intended behavior, and we’d greatly appreciate an example.


I just realized we may be automatically checking variables at the end of the transformed parameters block to make sure they contain no not-a-number values. We probably shouldn’t be doing that, if we are, as there are cases where people might want to pass NaN around.

Hi. Sorry for the delay, I was away from Bayesworld for a while. It shouldn’t be the latter, since I’m setting up the initial covariance matrix in Python and passing it to Stan, so there are no NaNs there. The function setting up the covariance matrix is one I posted as a reply in another thread; in any case, it’s below:

functions {
    // squared-exponential-style kernel; the body here is a best guess,
    // reconstructed to be consistent with the surrounding code
    real cov_func(real xi, real xj, real elll, real ellk) {
        return exp(-pow(xi - xj, 2) / (pow(elll, 2) + pow(ellk, 2)));
    }

    matrix matrix_K(vector x, real[] hyperpars, matrix signalCovarianceMatrix,
                    int N, int M, real jitter) {
        matrix[M*N, M*N] K;
        int sub_l;
        int sub_k;

        for (l in 1:M) {
            // possibly use specific hyperparameters for task l
            sub_l = (l - 1) * N;
            for (k in 1:M) {
                // possibly use specific hyperparameters for task k
                sub_k = (k - 1) * N;
                for (i in 1:N) {
                    for (j in i:N) {
                        K[sub_l + i, sub_k + j] = signalCovarianceMatrix[l, k]
                                                  * cov_func(x[i], x[j], 1, 1);
                        K[sub_k + j, sub_l + i] = K[sub_l + i, sub_k + j];
                    }
                }
            }
        }
        // add jitter on the true diagonal only, after the symmetric copy,
        // so the result stays symmetric
        for (d in 1:(M * N))
            K[d, d] = K[d, d] + jitter;
        return K;
    }
}
transformed data {
    real jitter = 1e-8;
    int M = 30;
    int N = 15;
    vector[N] x;
    for (n in 1:N)
        x[n] = n;
}
parameters {
    // alternatively just 'matrix', assembled by a function that ensures symmetry
    cov_matrix[M] signalCovarianceMatrix;
    real hyperpars[M];
}
model {
    matrix[M*N, M*N] K = matrix_K(x, hyperpars, signalCovarianceMatrix, N, M, jitter);
    matrix[M*N, M*N] L = cholesky_decompose(K);
}

From Python, setting the initial value of K with K = np.ones([M*N, M*N]) * 1e-3; for i in range(M*N): K[i,i] = 1 + 1e-8 should reproduce the error when the cholesky_decompose is in the transformed parameters block instead of the model block (a sketch is below). I didn’t test this code and I can’t rerun the original model, but unless there’s something really weird there should be no difference except for matrix versus cov_matrix, and because it fails before the first iteration I don’t see why that should make a difference either. I could also put together a full example, but that may take longer to get done and tested.
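That is, something like the following (untested; sm stands for the compiled model and the init name is a placeholder):

import numpy as np

# untested sketch of the initializer described above
M, N = 30, 15
K0 = np.ones([M * N, M * N]) * 1e-3
for i in range(M * N):
    K0[i, i] = 1 + 1e-8
# passed via PyStan's init argument, e.g.:
# fit = sm.sampling(data=data_dict, init=[{"K": K0}], chains=1)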

I’ve lost the thread after this long. What exactly are you expecting from this? It’s not well-formed Stan code, so I’m not going to be able to run it without fixing the syntax.

Sorry, like I said, I can try to make it runnable as is, but testing it completely also requires some Python input to pass as an initial condition versus a fixed parameter, and then moving the computation from model to transformed parameters to reproduce the error.
If that’s the best way for you to find out why the behaviour differs between the two blocks, I can put it together sometime soon.

Here’s a minimal example of the problem you might be having.

data {
  int<lower = 0, upper = 1> in_tp;
}
parameters {
  real y;
}
transformed parameters {
  cov_matrix[1] a = [[ in_tp ? not_a_number() : 1 ]];
}
model {
  matrix[1, 1] b = [[ in_tp ? 1 : not_a_number() ]];

  y ~ normal(0, 1);
}

If you set in_tp to 1, it fails to initialize; if you set it to 0, it’s fine in the model.

If you replace cov_matrix[1] with matrix[1, 1] in the transformed parameters, then it’s OK, but it gives you an error trying to load the draws into R.


I’ll try that, but I don’t think it’s a NaN problem. I may be wrong, so I’ll check. If not, I’ll try to get fully working Stan code and PyStan inits so you can reproduce the error exactly as I’m seeing it. Thanks a lot again; I don’t know how you guys do all of this here.