Cholesky decomposition failed for precision/covariance matrix

I am modeling multi normal distribution with precision matrix (which currently I’m using the inbuilt function multi_normal_prec which works fine but extremely slow with large number of variables). Here I am trying to use multi_normal_cholesky and use transfered parameter MM’ as the covariance matrix, from precision matrix LL’.

Part of my model looks like this (whole stan code attached in the end):

parameters {
  vector[D] logit_theta; //
 ....
 ....
 
  matrix[D,D] L;
}

transformed parameters {
  vector<lower=0,upper=1>[D] theta; //
  matrix[D,D] M;
  theta = inv_logit(logit_theta);

// Below: let MM' = (LL')^(-1) 
  M = cholesky_decompose(inverse(multiply_lower_tri_self_transpose(L))); 

 }

model {
  int count;
  count = 1;
  logit_theta ~ multi_normal_cholesky(rep_vector(0,D), M);
  ... 
  ... 
 //each elements in L is assigned separate priors below:
 for(j in 1:(D-1)){
    for(i in (j+1):D){
      L[i,j] ~ normal(0,square(sigma_lower_tri*tau_lower_tri[count]));
      count = count+1;
    }
  L[j,j] ~ normal(0,square(sigma_diag*tau_diag));
  }
 ...
 ...
}

When sampling, it gives error (only part of it is shown):

Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: cholesky_decompose: m is not symmetric. m[1,2] = 3.66101e+11, but m[2,1] = 3.661e+11 (in ‘model34f863bd86ee_sparse_network’ at line 25)

Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: cholesky_decompose: m is not symmetric. m[1,2] = -5.30476e+15, but m[2,1] = -5.30476e+15 (in ‘model34f863bd86ee_sparse_network’ at line 25)

Initialization between (-2, 2) failed after 100 attempts.

Seems that the issue is this line:
M = cholesky_decompose(inverse(multiply_lower_tri_self_transpose(L)));

But the error says the matrix is non-symmetric but to me it seems to be symmetric…? Anyone could suggest where the problem is…? Or is there an alternative way to make multi_normal_prec faster? Any advises would be appreciated! Thank you.

sparse_network.stan (1.1 KB)

Not really because multi_normal_prec is doing a Cholesky decomposition internally.

Good to know. Thank you Ben!

I changed my model accordingly but still have the same (?) issue with the cholesky decomposition…

transformed parameters {
      matrix[D,D] L_prec;
      matrix[D,D] L_cov;
      matrix[D,D] Sigma;
      ... 
// define L_prec elements...
      ... 
        Sigma = inverse_spd(multiply_lower_tri_self_transpose(L_prec));
        for (i in 1:D)
        Sigma[i, i] = Sigma[i, i] + 1e-12;
        L_cov = cholesky_decompose(Sigma);   //this is line 48 

    }

No warning in compiling, but when trying to sample, it gives this error:

Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: inverse_spd: m is not symmetric. m[1,2] = nan, but m[2,1] = nan (in ‘model109d04f6c390f_0821’ at line 48)

I am not sure why Sigma is not symmetric positive definite here… Any suggestion?

Without knowing details of your model it’s hard to say more, but sometime models that are SPD in exact arithmetic fail to have a Cholesky decomposition in floating point arithmetic.

It’s possible that adding 1e-12 is not enough. You can look at the detailed forwards error analysis of a cholesky decomposition (which will tell you you need to consider the condition number of Sigma), or you can do what the rest of us do and replace 1e-12 (which is far to small with either 1e-7 or 1e-5.

1 Like

also if you have the cholesky of the precision matrix, you could skip the multi_normal_prec and hand code it as

target += 2 * log(sum(diagonal(L))) - 0.5 * dot_self(L’*[whatever])

where [whatever] is the variable name for the multivariate normal with precision matrix Q= L*L’

Thank you Daniel!

It is a sparse precision matrix and that’s why I am using precision elements.
Here’s my original model ( I am adapting a lot from that horseshoe paper):

data {
      int<lower=0> N; 
      int<lower=0> D;
      int<lower=0,upper=1> Y[N,D];
      real<lower=0> scale_global;   // scale for the half-t prior for #tau#
      real<lower=1> nu_global;      // degrees of freedom for the half-t priors for #tau#
      real<lower=1> nu_local;       // degrees of freedom for the half-t priors for #lambdas# 
      real<lower=0> slab_scale;     // slab scale for the regularized horseshoe
      real<lower=0> slab_df;        // slab degrees of freedom for the regularized horseshoe
    }
    
    transformed data {
      int d;
      d = 0;
      for(i in 1:(D-1))
      d = d + i;
    }
    
    parameters {
      real<lower=0> tau; 
      vector<lower=0>[d] lambda; 
      real<lower=0> caux;
      vector[d] z;
      vector[D] eta;
      vector[D] epsilon;
    }
    
    transformed parameters {
      matrix[D,D] L_prec;
      matrix[D,D] L_cov;
      matrix[D,D] Sigma;
      vector<lower=0>[d] lambda_tilde; 
      real<lower=0> c;
      vector[D] logit_theta;
      {
        int count;
        count = 1;
        for(j in 1:(D-1)){
          for(i in (j+1):D){
            L_prec[i,j] = z[count] * lambda_tilde[count] * tau;
            count = count+1;
            }
         L_prec[j,j] = epsilon[j];
        }
        L_prec[D,D] = epsilon[D];
        lambda_tilde = sqrt( c^2 * square(lambda) ./ (c^2 + tau^2*square(lambda)) );
        c = slab_scale * sqrt(caux);
        Sigma = inverse_spd(multiply_lower_tri_self_transpose(L_prec));
        for (i in 1:D)
        Sigma[i, i] = Sigma[i, i] + 1e-5;
        L_cov = cholesky_decompose(Sigma);
        logit_theta = L_cov * eta;
      }
    }
    
    model {
      z ~ normal(0, 1);
      lambda ~ student_t(nu_local, 0, 1);
      tau ~ student_t(nu_global, 0, scale_global * 2);
      caux ~ inv_gamma(0.5 * slab_df, 0.5 * slab_df);
      eta ~ normal(0,1);
      for(i in 1:N){
        Y[i,] ~ bernoulli(exp(logit_theta));
      }
    }

Does that mean Sigma is not identifiable from the data I have?


So I followed this and this time getting different error on initialization…
Here is the stan code using target +=:

   data {
      int<lower=0> N; 
      int<lower=0> D;
      int<lower=0,upper=1> Y[N,D];
      real<lower=0> scale_global;   // scale for the half-t prior for #tau#
      real<lower=1> nu_global;      // degrees of freedom for the half-t priors for #tau#
      real<lower=1> nu_local;       // degrees of freedom for the half-t priors for #lambdas# 
      real<lower=0> slab_scale;     // slab scale for the regularized horseshoe
      real<lower=0> slab_df;        // slab degrees of freedom for the regularized horseshoe
    }
    
    transformed data {
      int d;
      d = 0;
      for(i in 1:(D-1))
      d = d + i;
    }
    
    parameters {
      real<lower=0> tau; 
      vector<lower=0>[d] lambda; 
      real<lower=0> caux;
      vector[d] z;
      vector[D] eta;
      vector[D] epsilon;
    }
    
    transformed parameters {
      matrix[D,D] L_prec;
      matrix[D,D] L_cov;
      matrix[D,D] Sigma;
      vector<lower=0>[d] lambda_tilde; 
      real<lower=0> c;
      vector[D] logit_theta;
      {
        int count;
        count = 1;
        for(j in 1:(D-1)){
          for(i in 1:j)
            L_prec[i,j] = 9999;
          for(i in (j+1):D){
            L_prec[i,j] = z[count]*lambda_tilde[count] * tau;
            count = count+1;
            }
         L_prec[j,j] = epsilon[j] + 1e-7;
        }
        L_prec[1,1] = epsilon[D] + 1e-7;
        L_prec[D,D] = epsilon[D] + 1e-7;
        lambda_tilde = sqrt( c^2 * square(lambda) ./ (c^2 + tau^2*square(lambda)) );
        c = slab_scale * sqrt(caux);
      }
    }
    
    model {
      z ~ normal(0, 1);
      lambda ~ student_t(nu_local, 0, 1);
      tau ~ student_t(nu_global, 0, scale_global*2);
      caux ~ inv_gamma(0.5 * slab_df, 0.5 * slab_df);
      eta ~ normal(0,1);
      target += 2*log(sum(diagonal(L_prec))) - 0.5*dot_self(L_prec'*logit_theta);
      for(i in 1:D)
      target += bernoulli_lpmf(Y[i,] | exp(logit_theta));
    }

And this time it says:

Rejecting initial value:
Error evaluating the log probability at the initial value.
validate transformed params: lambda_tilde[1] is nan, but must be greater than or equal to 0

is this because of the initialization of matrix? ( I changed all upper triangular elements to 9999, not sure if this is relevant though)


Thank you for the help!

You probably need to define c = slab_scale * sqrt(caux) first, then lamba_tilde =, then L_prec =.
According to the stan manual 2.16 p85, the order of assignments matters. So lambda_tilde[1] is nan at the point where you want to use it to calculate L_prec[1, 1].

logit_theta, Sigma, and L_cov are not longer defined in your last version.

I think the 9999 for the upper triangle are going to give you trouble in

target += 2*log(sum(diagonal(L_prec))) - 0.5*dot_self(L_prec'*logit_theta)

I believe they should be 0.

You haven’t set the values in the matrix to which you are applying inverse_spd—they are not-a-number.

This is the same problem. You need to set variables before using them.

I should also add that you might find it helpful to try to build up a little bit of your model at a time. Just build the covariance prior and draw from that, then go on to building the rest of the model, for instance. It’s much easier to isolate where a problem is that way rather than trying to deal with issues in the context of lots of other moving pieces.

1 Like

that was the problem, thank you for your help!

Y ~ Bernoulli (theta)
logit(theta) ~ multinormal(0, (L_prec*L_prec ^T)^(-1))
( – multinormal with L_prec as the cholesky decompose of precision matrix)

With this, I have my model block like this:

model {
    vector[D] logit_theta;
    z ~ normal(0, 1);
    lambda ~ cauchy(0,1);
    tau ~ normal(0, 0.5);
    for(i in 1:N){
       target += bernoulli_logit_lpmf(Y[i,] | logit_theta );
    }
    target += 2 * log(sum(diagonal(L_prec))) - 0.5 * dot_self(L_prec' * logit_theta);
  }

And this error:

Exception: bernoulli_logit_lpmf: Logit transformed probability parameter[1] is nan, but must not be nan!

So I now realized this:

But how can I define logit_theta in this case? It has the distribution, but is encoded in the log_posterier (target).

Anyone has suggestion? Thank you!!

Can you include the code for the data block and the parameters block (and any transformed blocks) as well?

One of the following two things should probably be true:

  • The matrix L_prec is data or transformed data
  • There is a prior on L_prec or a transformation of it.

The latter isn’t true in your current code.

Thank you Daniel. No, L_prec is in my transformed parameter block ( the precision matrix is actually the parameter of interest here … ) , and stan code :

data {
      int<lower=0> N; 
      int<lower=0> D;
      int<lower=0,upper=1> Y[N,D];
    }
    
transformed data {
      int d;
      d = 0;
      for(i in 1:(D-1))
      d = d + i;
    }
    
parameters {
      real<lower=0> tau; 
      vector<lower=0>[d] lambda; 
      vector[d] z;
    }
    
transformed parameters {
      matrix[D,D] L_prec;
      {
        int count;
        count = 1;
        for(j in 1:(D-1)){
          for(i in 1:j)
            L_prec[i,j] = 1e-5;
          for(i in (j+1):D){
            L_prec[i,j] = z[count] * lambda[count] * tau;
            count = count+1;
            }
        }
        for(i in 1:D)
        L_prec[i,D] = 1e-5;
      }
    }
    
model {
        vector[D] logit_theta;
        z ~ normal(0, 1);
        lambda ~ cauchy(0,1);
        tau ~ normal(0, 0.5);
        for(i in 1:N){
           target += bernoulli_logit_lpmf(Y[i,] | logit_theta );
        }
        target += 2 * log(sum(diagonal(L_prec))) - 0.5 * dot_self(L_prec' * logit_theta);
    }

What are you thinking that program will do?

What it will actually do is fail because you don’t set logit_theta before using it, so the values will all be NaN.

P.S. Y[i, ] is equivalent to just Y[i] in Stan.

Thank you for pointing out. Yes I understand this is not working ( I am still new to Stan and not very familiar with the syntax yet …)
I am asking if there is a way to sample from this multinormal distribution with known precision matrix, without converting precision matrix to covariance matrix and sample from that (I did not success in transferring and get symmetric matrix… )

Deleted. This was wrong

Just to confirm, you want the model
y | theta, L, … ~ bernoulli(theta)
logit(theta) | L, … ~ N(0, Q^{-1})
where Q = L’*L?

Because that’s the model you get if you put logit_theta in the parameter block

Yes…

Totally had no idea why it was originally put on the model block ( and I have been working with this model for a week not realizing this). The model is working now (except for possible divergences in the future).

Thank you all again for taking time to take look at my problem and answering questions, it was really helpful !!

1 Like

Not sure exactly what you want to do. There’s a multi_normal_precision distribution that takes the precision matrix parameterization. A precision matrix can be declared as cov_matrix—we should’ve just called that type spd_matrix, becuase it’s a symmetric positive definite type.

You’re much better off computationally

  • using a Cholesky-factor parameterization to cut down on the work in factoring the matrix for each log-density evaluation

  • using the _rng functions in the generated quantities block if you really only want to generate from the multivariate normal and not have it interact with the rest of your model