Slow Cholesky factor in Stan?

I was working on my sde / state space code today, and at some point became frustrated with Stan occasionally complaining about non spd matrices during Cholesky decomposition, even though the printed matrices could be decomposed in R. So I wrote a user defined Cholesky function, and was surprised at the fairly substantial speed up this gave to my models – given that the Cholesky decomp is just a modest piece of the puzzle. This is with smallish (2-6 dimensional) matrices, I didn’t test any further. I also don’t know why this is occurring – whether comp time or precision / gradients, or if something peculiar to the cases I tried – but thought I should mention it. This R script compares output of my function to the stan function.

sm <- '
functions{
  matrix chol(matrix a){
    matrix[rows(a),rows(a)] l;
    for(j in 1:cols(a)){
      for(i in 1:rows(a)){
        if(j==i) {
          if(j == 1){ 
            if(a[j,j] <=0) { 
              l[j,j] = 1e-8;
              print("Negative variance ", a[j,j], " set to 1e-8 for Cholesky decomp");
            }
            if(a[j,j] > 0) l[j,j] = sqrt(a[j,j]);
          }
          if(j > 1) l[j,j] = sqrt(a[j,j] - dot_self(l[i, 1 : j-1]));
        }
        if(i > j) l[i,j] = ( a[i,j] - dot_product( l[ i, 1:(j-1) ], l[j, 1:(j-1)]) ) / l[j,j];
        if(j > i) l[i,j] = 0;
      }
    }
    return l;
  }
}
data{
  int dim;
}
parameters{
  cov_matrix[dim] cmat;
}

model{
  print("cov = ", cmat);
  print("Chol diff = ",chol(cmat)-cholesky_decompose(cmat));
}'

stan(model_code = sm,data = list(dim=4),iter=1)
4 Likes

Hm, on my system and with larger matrices (100x100) in a gaussian process model, your custom function is much slower.

Note that getting rid of rejections might signal some better numerical properties of the model and can be the main contribution to the actual speedups (i.e. your code might be slower per evaluation than the Stan version but still model can run faster due to performing fewer evaluations).

I noticed most examples of cholesky in Stan include something like

    for (I in 1:N)
    {
      covariance[I, I] += 1e-12;
    }

prior to calling Cholesky. In my experience, this got rid of all the “non spd” rejections. How does your model behave if you do this tweak and use the default Stan Cholesky? I would expect a similar or even bigger speedup than from your custom function.

Might make sense to have this “diagonal tweak” a default part of Cholesky in Stan though…

Mike - happy to hear!
Martin - you may be right, I don’t exactly have a well developed test for this, but adding the diagonal did improve things with the standard function too.
Is there an efficient way to make a function for this? The below seems like it would induce quite some extra copying around…

 matrix cholspd(matrix a){
    matrix[rows(a),rows(a)] l;
    for(j in 1:cols(a)){
      for(i in 1:rows(a)){
        if(i != j) l[i,j] = a[i,j];
        if(j == i) l[j,j] = a[j,j] + 1e-12;
      }
    }
    return cholesky_decompose(l);
  }

AFAIK Stan does not have passing arguments by reference (and probably should not have as it is easy to get wrong). So I guess if you really want a function and you really need every tiny speed bump than you should go for a custom C++ function that gets its argument as a reference.

However, Stan performance is usually dominated by evaluating the gradient which should not be really affected by this copying around and so it will likely not matter much.

Also note that you can greatly simplify (and probably speedup) the function to something like:

 matrix cholspd(matrix a){
    matrix[rows(a),rows(a)] l = a;
    for(i in 1:rows(a)){
        l[i,i] += 1e-12;
    }
    return cholesky_decompose(l);
  }
1 Like

All arguments in Stan functions, both built-in and user-defined, are passed by constant reference in the underlyiung C++.

Having said that, there are plenty of places where we make internal copies of things, primarily in matrices with autodiff.

Sorry, I should have been more precise, I guess a better way to say it would be to just stick to “Stan does not allow you to modify function arguments”. I think you could still argue that the Stan language does not (and should not) have pass-by-reference semantics (although it is implemented with references), but that’s probably not very helpful to users :-)

It’s call by constant reference.

Right. Stan only lets you modify local variables and variables in blocks where they’re defined. That’s the “constant” part of the call.

The key thing is that derivatives, etc., are passed through function calls. That’s the pass-by-reference sense.

I don’t make the implications for derivatives clear anywhere in the language reference. I should make this much more clear in the next overhaul of the language reference. We’ll be working on a clearer Stan formal spec in a couple weeks.