LU decomposition / determinant of sparse asymmetric matrix

jeffeaton · June 4, 2019, 8:59am

Hello all,

I have a large but very sparse Jacobian matrix for a change of variables. I’m wondering if there is any experience with or advice for implementing an LU decomposition for a sparse matrix in Stan?

I’ve seen in discussion threads that LU decomposition is part of the discussion about specification for enhanced sparse matrix support in future, but I couldn’t find threads where it has been implemented.

Sparse LU decomposition is implemented in Eigen and there is also C code from the CSparse in the Matrix package in R, which I’ve contemplated trying to port to Stan code. But I figured I should ask for advice before I start trying to do something that I probably shouldn’t be doing…

Thanks,
Jeff

bgoodri · June 4, 2019, 3:32pm

No, you would have to write your own Stan function that calls a C++ function that calls Eigen.

Bob_Carpenter · June 4, 2019, 6:16pm

It’s almost certainly something we’d want to build into Stan, so integration would not be wasted effort. Though it’d be easier to just integrate a simple version for your problem than a full version for Stan.

The obstacle for us now is that we don’t have a sparse data structure. What we’ve done for sparse matrix-vector multiply is take in the components of CSR form as arguments. We’d probably do that first for QR decomposition, too, and we’d also very much like a sparse Cholesky and log determinant.

yizhang · June 6, 2019, 4:59pm

I remember there’s an issue in Eigen with null entries’ derivatives, has it been resolved?

Bob_Carpenter · June 7, 2019, 7:40pm

You mean the implicit zero entries in a sparse matrix? What’s the problem? We should be able to treat them like constants so we don’t need derivatives.

yizhang · June 7, 2019, 8:11pm

That’s assuming sparsity pattern remains the same, but in A=L^TL, L's the sparsity may change even if A‘s sparsity remains the same and only change its entries’ values.

jeffeaton · June 9, 2019, 7:55pm

Thanks Ben and Bob for the guidance.

Following Ben’s advice to link to Eigen, I searched a bit more and found this very helpful tutorial by @bbbales2 and colleagues. In case useful to anyone else in future, below is the C++ function I came up with and an small example Stan model. I need to do more benchmarking of how the sparse vs. dense LU decomposition scales for large matrices – I’ll update the post with that when I have a chance.

One question I had was best practice for error handling. On a couple of testing examples my R session was occasionally crashing during calling sampling() (after successfully compiling model). It was hard to trace why, but I suspected it could be a uncaught failure of the LU decomposition. I added the following to the function, but not sure if this is the best practice.

  if (lu.info() != Eigen::Success)
    throw std::runtime_error("Sparse LU decomposition error");

Thanks,
Jeff

C++ function

The C++ function below accepts a triplet list encoding a square sparse matrix as arguments i, j, and x, with argument dim specifying the number of rows and columns of the matrix. The function constructs an Eigen::SparseMatrix object, then constructs the LU decomposition and return the log absolute determinant.

cppfun <- '
template <typename T3__>
typename boost::math::tools::promote_args<T3__>::type
sparse_log_determinant(const int& dim,
                       const std::vector<int>& i,
                       const std::vector<int>& j,
                       const Eigen::Matrix<T3__, Eigen::Dynamic,1>& x,
                       std::ostream* pstream__) {

  typedef Eigen::Triplet<T3__> T;
  std::vector<T> tripletList;
  tripletList.reserve(i.size());
  for(int idx = 0; idx < i.size(); idx++)
    tripletList.push_back(T(i[idx]-1, j[idx]-1, x[idx]));

  Eigen::SparseMatrix<T3__> Js(dim, dim);
  Js.setFromTriplets(tripletList.begin(), tripletList.end());

  const Eigen::SparseLU<Eigen::SparseMatrix<T3__>> lu(Js);
  if (lu.info() != Eigen::Success)
    throw std::runtime_error("Sparse LU decomposition error");
  return lu.logAbsDeterminant();
}
'

Calling from Stan

Save the C++ code to a file.

cppfile <- tempfile()
writeLines(cppfun, cppfile)

In the Stan model code, declare a user-defined Stan function with the function declaration only but not the function body.

stancode <- "
functions {
  real sparse_log_determinant(int dim, int[] i, int[] j, vector x);
}
"

Compile the Stan model with the option allow_undefined = TRUE to allow declaration of the function without the body and providing the C++ file code using the includes = argument.

sm <- stan_model(model_code = stancode,
                 allow_undefined = TRUE,
                 includes = paste0('\n#include "', cppfile, '"\n'))

A couple of helpful notes:

Bales et al. and the rstan vignette describe how to ascertain the templated C++ signature expected by Stan.
The #include statement in stan_model() requires the full path to the C++ file.

Testing the code: spherical coordinates transformation

To test that the code is doing the right thing, the following example transforms spherical to Cartesian coordinates and samples each coordinate from an independent normal distribution:

x \sim \mathrm{Normal}(0, 1) \\ y \sim \mathrm{Normal}(-1, 1.5) \\ z \sim \mathrm{Normal}(3, 0.5)

Transformed paramters x, y, and z are a functions of spherical parameters r > 0, \theta \in [0, \pi], and \phi \in [0, 2\pi):

x = r\sin\theta\cos\phi \\ y = r\sin\theta\sin\phi \\ z = r\cos\theta

The Jacobian of the transformation is:

J = \left[ \begin{matrix} \sin\theta\cos\phi & r\cos\theta\cos\phi & -r\sin\theta\sin\phi \\ \sin\theta\sin\phi & r\cos\theta\sin\phi & r\sin\theta\cos\phi \\ \cos\theta & -r\sin\theta & 0 \\ \end{matrix} \right]

For comparison, analytically the log absolute determinant of J is \log ||J|| = 2 \cdot \log(r) + \log(\sin\theta).

Stan model

The Stan code below implements this model.

stancode <- "
functions {
  real sparse_log_determinant(int dim, int[] i, int[] j, vector x);
}
transformed data {
  // indices for non-zero elements of Jacobian in triplet form
  int i[8] = {1, 2, 3, 1, 2, 3, 1, 2};
  int j[8] = {1, 1, 1, 2, 2, 2, 3, 3};
}
parameters {
  real<lower=0> r;
  real<lower=0, upper=pi()> theta;
  real<lower=0, upper=2*pi()> phi;
}
transformed parameters {
  // spherical -> cartesian coordinates
  real x = r * sin(theta) * cos(phi);
  real y = r * sin(theta) * sin(phi);
  real z = r * cos(theta);
  //
  // non-zero Jacobian values in triplet form
  vector[8] Js;
  Js[1] = sin(theta) * cos(phi);
  Js[2] = sin(theta) * sin(phi);
  Js[3] = cos(theta);
  Js[4] = r * cos(theta) * cos(phi);
  Js[5] = r * cos(theta) * sin(phi);
  Js[6] = -r * sin(theta);
  Js[7] = -r * sin(theta) * sin(phi);
  Js[8] = r * sin(theta) * cos(phi);
}
model {
  target += sparse_log_determinant(3, i, j, Js);
  x ~ normal(0, 1);
  y ~ normal(-1, 1.5);
  z ~ normal(3, 0.5);
}
generated quantities {
  real logdet_sparse = sparse_log_determinant(3, i, j, Js);
  real logdet_analytical = 2 * log(r) + log(sin(theta));
}
"

Compile and sample from the model:

sm <- stan_model(model_code = stancode,
                 allow_undefined = TRUE,
                 includes = paste0('\n#include "', cppfile, '"\n'))

fit <- sampling(sm, seed = 17)

Confirm that x, y, and z have the desired mean and standard deviation.

print(fit, c("x", "y", "z"));

## Inference for Stan model: 420e358b1dd4b1c9a78473067fd0b1da.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##    mean se_mean   sd  2.5%   25%   50%  75% 97.5% n_eff Rhat
## x -0.01    0.02 0.99 -1.94 -0.71  0.00 0.67  1.89  2539    1
## y -1.02    0.06 1.54 -3.96 -2.09 -1.02 0.03  2.07   601    1
## z  3.00    0.01 0.51  2.00  2.66  3.01 3.34  4.00  2844    1
## 
## Samples were drawn using NUTS(diag_e) at Sun Jun  9 19:37:01 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

Confirm that the log determinant calculated by Eigen::SparseLU matches the analytical log determinant.

print(fit, c("logdet_sparse", "logdet_analytical"));

## Inference for Stan model: 420e358b1dd4b1c9a78473067fd0b1da.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##                   mean se_mean  sd  2.5%  25%  50%  75% 97.5% n_eff Rhat
## logdet_sparse     1.71    0.03 0.8 -0.02 1.22 1.77 2.26  3.05   676    1
## logdet_analytical 1.71    0.03 0.8 -0.02 1.22 1.77 2.26  3.05   676    1
## 
## Samples were drawn using NUTS(diag_e) at Sun Jun  9 19:37:01 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

`sessionInfo()`

sessionInfo()

## R version 3.5.2 (2018-12-20)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.4
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] rstan_2.18.2       StanHeaders_2.18.1 ggplot2_3.1.0.9000
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.1         pillar_1.4.0       compiler_3.5.2    
##  [4] plyr_1.8.4         highr_0.7          prettyunits_1.0.2 
##  [7] tools_3.5.2        digest_0.6.19      pkgbuild_1.0.3    
## [10] evaluate_0.13      tibble_2.1.1       gtable_0.3.0      
## [13] pkgconfig_2.0.2    rlang_0.3.4        cli_1.1.0         
## [16] parallel_3.5.2     yaml_2.2.0         xfun_0.5          
## [19] loo_2.1.0          gridExtra_2.3      withr_2.1.2       
## [22] dplyr_0.8.1        stringr_1.4.0      knitr_1.22        
## [25] stats4_3.5.2       grid_3.5.2         tidyselect_0.2.5  
## [28] glue_1.3.1         inline_0.3.15      R6_2.4.0          
## [31] processx_3.3.1     rmarkdown_1.11     purrr_0.3.2       
## [34] callr_3.2.0        magrittr_1.5       codetools_0.2-16  
## [37] matrixStats_0.54.0 ps_1.3.0           scales_1.0.0      
## [40] htmltools_0.3.6    assertthat_0.2.1   colorspace_1.4-1  
## [43] stringi_1.4.3      lazyeval_0.2.2     munsell_0.5.0     
## [46] crayon_1.3.4

bgoodri · June 9, 2019, 8:49pm

This is fine, although it is probably easier to do things like this with a map to a CSR matrix as in
https://github.com/stan-dev/rstanarm/blob/master/inst/include/csr_matrix_times_vector2.hpp#L18
Then you can use the Matrix package and the rstan::extract_sparse_parts function.

Bob_Carpenter · June 12, 2019, 6:03pm

Yes, that’s what we want to do. It’ll cause the current value of parameters to be rejected, which has different implications depending on the algorithm.

I’m no longer sure what type of error we throw and catch other than std::domain_error. Some errors are fatal, like indexing errors—they stop whatever algorithm called them.

@syclik will know.

It will technically only need a relative path from where it’s compiled. Relative paths are more portable, but the full path would let you run from anywhere.

bgoodri · June 12, 2019, 6:43pm

You need a full path with RStan because stanc puts the generated C++ code in the temporary directory and the extra header file is never relative to that directory.

syclik · June 14, 2019, 5:01am

Fortunately, we’ve been documenting this stuff more thoroughly. See: https://mc-stan.org/math/d4/d84/namespacestan_1_1math.html#a65efbf7a8db1a8a9086f4d621032d9aa

Most of our validation functions will throw std::domain_error. (check_*()). Some will throw std::invalid_argument errors.

The way the MCMC samplers are configured, if there is an exception thrown and it is std::domain_error, it will treat that iteration as a rejection but as a recoverable error: set that iteration’s log probability value to 0 and move to the next iteration. If there’s an std::invalid_argument thrown, MCMC terminates. We treat these as non-recoverable, user programming errors. For example:

parameters {
  real theta[1];
}
model {
  theta[1] ~ normal(0, 1);
  theta[2] ~ normal(0, 1);
}

When we evaluate this log probability function, we’re indexing past the end of an array. We try to stop early when we know it’s an error that’s generated by the user and we really shouldn’t ignore it.

jeffeaton · June 14, 2019, 12:17pm

Great, thanks Daniel, Bob, Ben for the guidance.

It might be helpful to add an explanation on the different type of errors in the ‘Interfacing with External C++ Code’ tutorial. There was an example in there of throw std::domain_error, but the additional comment above about the types of errors and what they mean is helpful.

Bob_Carpenter · June 18, 2019, 5:13pm

Good idea. I created an RStan issue.

Topic		Replies	Views
Implicit, full-rank Q (Householder QR) General performance , algorithms , matrix	1	664	May 4, 2020
Can stan compute Jacobian as a Eigen::SparseMatrix? Modeling	3	804	January 7, 2019
Sparse matrices in Stan Developers	12	2548	February 8, 2018
Stan-math: Eigen support General stan-math	13	1819	July 4, 2017
A proposal for sparse matrices and GPs in Stan Developers	35	4820	January 18, 2018