Cholesky_decompose runtime error when computing transformed parameters


#1

I’m getting cholesky_decompose errors when sampling with a large covariance matrix (~100 < rows < 1000) which is assembled with a Kronecker-product-like for loop.

The loop is similar to the Gaussian process covariance matrix example in the Stan reference guide (without using cov_exp_quad), but it loops over the independent variable x[N] multiple times (M times, where M can vary from 2 to dozens), generating a covariance matrix of size MN \times MN. For M = 2, for instance, I have:

K = \begin{bmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{bmatrix}

The Stan model compiles fine, but I am getting RuntimeError: Exception: cholesky_decompose: Matrix m is not positive definite. That is somewhat expected, because a large matrix of this kind is very constrained: the K_{ij} submatrices have positive diagonals and K may end up with an odd structure, I guess. But I am able to choose some parameters such that K is positive semidefinite, and from there it gets a little weirder. Here are some observations:

[1]: I am testing positive definiteness for the parameter values in Python before sampling (with a Python version of the loop and numpy.linalg.cholesky) and I get no errors when computing L = cholesky(K).

[2]: If I pass this fixed K_{test} matrix as a Stan data dictionary and have Stan compute cholesky_decompose, I get no errors (for sampling either, of course). If, in parallel, I also compute the real K matrix in Stan and print out the first terms, they are the same as the Python-computed matrix in the data dict; printing the differences across all terms shows nothing greater than 1e-16, so I don’t think there are any errors in building the matrix in Stan.

[3]: I can choose the initial parameters so that the assembled matrix is the identity matrix, which should cause no problem, but I still get the same error.

[4]: Depending on the size of the matrix (i.e., on M), if I run an almost infinite loop catching the runtime error and trying again, I may be able to sample, but it’s kind of unpredictable, and for large M I didn’t manage to sample after a few hours of looping (a few hundred to a couple of thousand tries).
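The pre-sampling check from [1] can be sketched as follows; the squared-exponential kernel and the rho**|i-j| cross-block scaling are hypothetical stand-ins for the actual model, used only to show the block structure and the numpy.linalg.cholesky test:

```python
import numpy as np

def build_block_cov(x, M, length_scale=0.3, rho=0.5, jitter=1e-8):
    """Assemble an (M*N) x (M*N) covariance from N x N kernel blocks.
    Kernel and cross-block scaling are hypothetical stand-ins."""
    N = len(x)
    d2 = (x[:, None] - x[None, :]) ** 2           # pairwise squared distances
    base = np.exp(-0.5 * d2 / length_scale**2)    # N x N kernel block
    C = rho ** np.abs(np.subtract.outer(np.arange(M), np.arange(M)))
    K = np.kron(C, base)                          # block [i, j] is C[i, j] * base
    return K + jitter * np.eye(M * N)             # small diagonal jitter

x = np.linspace(0.0, 1.0, 50)
K = build_block_cov(x, M=4)
L = np.linalg.cholesky(K)   # raises LinAlgError if K is not numerically PD
assert np.allclose(L @ L.T, K)
```

Since C and the base kernel block are both positive definite here, their Kronecker product is too, so this particular construction always passes the check.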

Finally, I thought I was going crazy, so I went back and rewrote all the Stan code from scratch based on the last similar example I had working, and I found that the problems started when I moved these calculations from the model block to the transformed parameters block. Using algorithm="Fixed_param", I can run a single iteration with PyStan’s stan_model.sampling() a hundred times and get no errors for the former, but the latter gives me an error several times before I can sample a single time.

Is there anything I’m missing here that may be causing this behavior?


#2

Covariance matrices that large are often numerically indefinite for some values of the parameters. You could try qr_R().


#3

Yes, I may need to try something different at some point to deal with large covariance matrices, but the main issue is that this happens even for things like the identity matrix, and for fixed initial parameters it may or may not sample when the computations are in the transformed parameters block, while it seems to be fine when they are in the model block.


#4

Are you declaring it as a cholesky_factor_cov in the transformed parameters block? If so, just declare it as a matrix to avoid a redundant positive-definiteness check.


#5

No, in the transformed parameters block I put together a symmetric matrix K based on the model parameters and compute L = cholesky_decompose(K); there’s not a lot more to it. For the first iteration with given initial values it is easy to check that the matrix is positive definite (e.g., the identity with dimension anywhere from 50 to 500), but I get an error 980 times out of 1000 when these computations are in the transformed parameters block, versus zero when they are in the model block.


#6

Are you declaring the symmetric matrix as a cov_matrix in the transformed parameters block?


#7

No, just matrix[M*N, M*N] K. Is that a problem?


#8

No, that is the right way to do it if your matrix is positive definite by construction. Maybe Stan is handling exceptions in the transformed parameters block differently than in the model block. Anyway, try qr_R() and somehow deal with the diagonal elements that are non-positive.
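For what it’s worth, the connection behind the qr_R() suggestion: if K can be written as X'X for some matrix X (this X is a hypothetical stand-in, and how to obtain it depends on the model), the R factor of a QR decomposition of X satisfies K = R'R, so R plays the role of an upper-triangular Cholesky factor without ever factoring K directly. A numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 50))   # hypothetical matrix with K = X'X
K = X.T @ X                          # positive definite by construction

R = np.linalg.qr(X, mode="r")        # upper-triangular R with X = Q R
# "Deal with the non-positive diagonal elements": flip row signs so the
# diagonal of R is positive; K = R'R is unchanged by the sign flips.
R = np.sign(np.diag(R))[:, None] * R

L = np.linalg.cholesky(K)            # direct Cholesky, for comparison
print(np.allclose(R.T, L))           # R' matches the lower Cholesky factor
```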


#9

By construction all terms are always positive, and within a submatrix K_{ij} they decrease with distance from the diagonal. The easiest solution for me right now is to just use the model block and compute the intermediate quantities afterwards from the sampled parameters, but I may try qr_R() in the transformed parameters block and see if it works. Thanks.


#10

In practice one typically needs to add constants on the order of the square root of the machine precision, about 10^{-8} for double precision floating point, to the diagonals of the covariance matrix before taking its Cholesky decomposition.
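A quick numpy illustration of that rule of thumb, using a squared-exponential Gram matrix (a stand-in for the actual covariance) that is positive definite in exact arithmetic but numerically borderline:

```python
import numpy as np

eps = np.finfo(float).eps          # ~2.22e-16 for double precision
jitter = np.sqrt(eps)              # ~1.49e-8, the rule-of-thumb constant

# Squared-exponential Gram matrix over closely spaced points: its smallest
# eigenvalues underflow toward (or below) zero in floating point.
x = np.linspace(0.0, 1.0, 200)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)

try:
    np.linalg.cholesky(K)          # may fail: K is numerically (semi)definite
    print("raw K factored without jitter")
except np.linalg.LinAlgError:
    print("raw K is not numerically positive definite")

L = np.linalg.cholesky(K + jitter * np.eye(len(x)))  # succeeds with jitter
```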


#11

Yeah, I’m doing that. The odd part is that the same identical matrix (up to precision differences on the order of 1e-20) raises an error when computed from parameters, but not when passed as a constant. Also, the former does so only in the transformed parameters block, and runs fine in the model block.
Maybe there is a reason for handling exceptions differently in each block, but the behavior is inconsistent.