Slow Cholesky factor in Stan?

Charles_Driver · April 15, 2018, 12:28am

I was working on my sde / state space code today, and at some point became frustrated with Stan occasionally complaining about non spd matrices during Cholesky decomposition, even though the printed matrices could be decomposed in R. So I wrote a user defined Cholesky function, and was surprised at the fairly substantial speed up this gave to my models – given that the Cholesky decomp is just a modest piece of the puzzle. This is with smallish (2-6 dimensional) matrices, I didn’t test any further. I also don’t know why this is occurring – whether comp time or precision / gradients, or if something peculiar to the cases I tried – but thought I should mention it. This R script compares output of my function to the stan function.

sm <- '
functions{
  matrix chol(matrix a){
    matrix[rows(a),rows(a)] l;
    for(j in 1:cols(a)){
      for(i in 1:rows(a)){
        if(j==i) {
          if(j == 1){ 
            if(a[j,j] <=0) { 
              l[j,j] = 1e-8;
              print("Negative variance ", a[j,j], " set to 1e-8 for Cholesky decomp");
            }
            if(a[j,j] > 0) l[j,j] = sqrt(a[j,j]);
          }
          if(j > 1) l[j,j] = sqrt(a[j,j] - dot_self(l[i, 1 : j-1]));
        }
        if(i > j) l[i,j] = ( a[i,j] - dot_product( l[ i, 1:(j-1) ], l[j, 1:(j-1)]) ) / l[j,j];
        if(j > i) l[i,j] = 0;
      }
    }
    return l;
  }
}
data{
  int dim;
}
parameters{
  cov_matrix[dim] cmat;
}

model{
  print("cov = ", cmat);
  print("Chol diff = ",chol(cmat)-cholesky_decompose(cmat));
}'

stan(model_code = sm,data = list(dim=4),iter=1)

mike-lawrence · April 15, 2018, 10:56pm

Hm, on my system and with larger matrices (100x100) in a gaussian process model, your custom function is much slower.

martinmodrak · April 17, 2018, 8:03am

Note that getting rid of rejections might signal some better numerical properties of the model and can be the main contribution to the actual speedups (i.e. your code might be slower per evaluation than the Stan version but still model can run faster due to performing fewer evaluations).

I noticed most examples of cholesky in Stan include something like

    for (I in 1:N)
    {
      covariance[I, I] += 1e-12;
    }

prior to calling Cholesky. In my experience, this got rid of all the “non spd” rejections. How does your model behave if you do this tweak and use the default Stan Cholesky? I would expect a similar or even bigger speedup than from your custom function.

Might make sense to have this “diagonal tweak” a default part of Cholesky in Stan though…

Charles_Driver · April 17, 2018, 10:59am

Mike - happy to hear!
Martin - you may be right, I don’t exactly have a well developed test for this, but adding the diagonal did improve things with the standard function too.
Is there an efficient way to make a function for this? The below seems like it would induce quite some extra copying around…

 matrix cholspd(matrix a){
    matrix[rows(a),rows(a)] l;
    for(j in 1:cols(a)){
      for(i in 1:rows(a)){
        if(i != j) l[i,j] = a[i,j];
        if(j == i) l[j,j] = a[j,j] + 1e-12;
      }
    }
    return cholesky_decompose(l);
  }

martinmodrak · April 17, 2018, 11:14am

AFAIK Stan does not have passing arguments by reference (and probably should not have as it is easy to get wrong). So I guess if you really want a function and you really need every tiny speed bump than you should go for a custom C++ function that gets its argument as a reference.

However, Stan performance is usually dominated by evaluating the gradient which should not be really affected by this copying around and so it will likely not matter much.

Also note that you can greatly simplify (and probably speedup) the function to something like:

 matrix cholspd(matrix a){
    matrix[rows(a),rows(a)] l = a;
    for(i in 1:rows(a)){
        l[i,i] += 1e-12;
    }
    return cholesky_decompose(l);
  }

Bob_Carpenter · April 18, 2018, 10:42pm

All arguments in Stan functions, both built-in and user-defined, are passed by constant reference in the underlyiung C++.

Having said that, there are plenty of places where we make internal copies of things, primarily in matrices with autodiff.

martinmodrak · April 19, 2018, 7:48am

Sorry, I should have been more precise, I guess a better way to say it would be to just stick to “Stan does not allow you to modify function arguments”. I think you could still argue that the Stan language does not (and should not) have pass-by-reference semantics (although it is implemented with references), but that’s probably not very helpful to users :-)

Bob_Carpenter · April 20, 2018, 4:28pm

It’s call by constant reference.

Right. Stan only lets you modify local variables and variables in blocks where they’re defined. That’s the “constant” part of the call.

The key thing is that derivatives, etc., are passed through function calls. That’s the pass-by-reference sense.

I don’t make the implications for derivatives clear anywhere in the language reference. I should make this much more clear in the next overhaul of the language reference. We’ll be working on a clearer Stan formal spec in a couple weeks.

Topic		Replies	Views
Multi_normal_cholesky is slower than hand coded lpdf Developers	6	737	April 11, 2022
Seeking expert stan modeler for help speeding up a complex stan model Jobs fitting-issues , specification , performance	4	875	July 29, 2020
Slow Stan Code Modeling fitting-issues , performance	33	2871	June 27, 2018
Cholesky decomposition/speed/divergencies Modeling	11	2651	August 19, 2017
Slow run time when fitting multi_normal_cholesky likelihood in bivariate model Modeling rstan , fitting-issues , specification , performance	2	162	March 11, 2024

Slow Cholesky factor in Stan?

Related Topics