Sparse linear system

Dear Stan community,

I thought about using Stan to solve an inverse problem. My forward model consists in solving a system of PDE’s, and I use the finite elements method to approximate the spatial derivatives. By doing so, at some point I have to solve a linear system of kind Ax=b, where A is a sparse matrix. Thinking about setting A as a full matrix in my Stan code is unfeasible, since I’d require around 200 gigs of RAM.

According to this section of the manual, the only supported operation is the multiplication of a sparse matrix and a dense vector. Are there any plans to include a solver of sparse linear systems?

Thanks in advance!

Best,
Rodrigo

Hi Rodrigo

According to my understanding, the only operation for sparse matrices using Stan is what you are mentioning.

If you want to work with sparse matrices for your inverse problem and then do Bayesian inference, you could try using \texttt{TMB}. There you can solve the linear system Ax = b , where A is sparse and then, putting priors on the parameters, you can use the HMC algorithm of Stan through the \texttt{tmbstan} library.

Another idea could be create your own functions to solve the linear system using RcppArmadillo (if you are working R).

Cheers,

Hi Cavieres,

Thank you very much for your reply. I wasn’t aware of this “tmbstan” package. I’m gonna read the documentation and, in case it is useful, I’ll let you know.

Regarding the RcppArmadillo, I don’t think it’s going to be helpful, since, as far as I understand, it wouldn’t be possible to use any function written in R in my Stan model. But please correct me if I’m wrong.

Best,
Rodrigo

Is A known and you just need to solve for x? Or does A get sampled as well?

If you know A and want to construct a random x which satisfies the constraint you can sample x from a constrained mvn. All the stuff in the transformed data block can be done ahead of time using sparse solvers. The input mu is optional. I think you could probably have this as a parameter vector.

This is constructing X \sim N(\mu, \Gamma) given AX = b

data {
  int<lower=0> n;
  int<lower=0> N;
  
  matrix[n, N] A;
  matrix[N, N] G;
  
  vector[n] b;
  
  vector[N] mu;
}
transformed data {
  // P = diag(N) - t(A) %*% solve(A %*% t(A)) %*% A;
  matrix[N, N] P = add_diag(-A' * (tcrossprod(A) \ A), rep_vector(1, N));
  tuple(matrix[N, N], vector[N]) eigen = eigendecompose_sym(P * G \ P);
  matrix[N, N] Sigma = P * eigen.1;
  vector[N] sigma = 1 ./ sqrt(eigen.2 + 1e-8);
  
  //mu_new <- mu + G %*% t(A) %*% solve(A %*% G %*% t(A)) %*% (b - A %*% mu)
  vector[N] mu_new = mu + G * A' * quad_form(G, A') \ (b - A * mu);
  
}
parameters {
  vector[N] x_raw;
}
transformed parameters {
  vector[N] x = mu_new + diag_post_multiply(Sigma, sigma) * x_raw; 
}
model {
  x_raw ~ std_normal();
}
generated quantities {
  vector[n] b_out = A * x;
}

Let’s say I generate the data as

N <- 8
n <- 3
mu <- c(1:N)
A <- matrix(rnorm(n * N), n, N)
b <- rnorm(n)
G <- rethinking::rlkjcorr(1, N)

ibrary(cmdstanr)
mod <- cmdstan_model("mvn_linear_constraint.stan")

mod_fit <- mod$sample(
  data = list(
    n = n,
    N = N,
    A = A,
    G = G,
    b = b,
    mu = mu
  ),
  parallel_chains = 4
)

mod_fit$summary("b_out")
b

I get that the constraint is satisfied with a random x vector

> mod_fit$summary("b_out")
# A tibble: 3 × 10
  variable   mean median    sd   mad     q5    q95  rhat ess_bulk
  <chr>     <num>  <num> <num> <num>  <num>  <num> <num>    <num>
1 b_out[1] -0.966 -0.966     0     0 -0.966 -0.966    NA       NA
2 b_out[2] -1.45  -1.45      0     0 -1.45  -1.45     NA       NA
3 b_out[3] -0.552 -0.552     0     0 -0.552 -0.552    NA       NA
# ℹ 1 more variable: ess_tail <num>
> b
[1] -0.9655925 -1.4478979 -0.5517504

where x is

> mod_fit$summary("x")
# A tibble: 8 × 10
  variable   mean median    sd   mad    q5    q95  rhat ess_bulk ess_tail
  <chr>     <num>  <num> <num> <num> <num>  <num> <num>    <num>    <num>
1 x[1]     -2.99  -2.99  0.567 0.572 -3.92 -2.05   1.00    6711.    2754.
2 x[2]     -6.08  -6.08  0.629 0.620 -7.11 -5.00   1.00    6416.    3129.
3 x[3]      3.07   3.07  0.897 0.899  1.59  4.56   1.00    6591.    3207.
4 x[4]     -0.619 -0.613 0.716 0.722 -1.78  0.579  1.00    6444.    3107.
5 x[5]      4.84   4.84  0.680 0.671  3.73  5.95   1.00    6681.    3112.
6 x[6]     10.4   10.4   0.926 0.909  8.84 11.9    1.00    6708.    2975.
7 x[7]     -4.44  -4.44  0.537 0.537 -5.31 -3.55   1.00    6267.    2979.
8 x[8]     -1.70  -1.71  0.726 0.712 -2.89 -0.503  1.00    6710.    2718.

If you want to estimate mu then this works. The only thing you couldn’t do before hand with A is b - A * mu_raw which you could do with the inbuilt support of a sparse matrix times a dense vector.

data {
  int<lower=0> n;
  int<lower=0> N;
  
  matrix[n, N] A;
  matrix[N, N] G;
  
  vector[n] b;
}
transformed data {
  // P = diag(N) - t(A) %*% solve(A %*% t(A)) %*% A;
  matrix[N, N] P = add_diag(-A' * (tcrossprod(A) \ A), rep_vector(1, N));
  tuple(matrix[N, N], vector[N]) eigen = eigendecompose_sym(P * G \ P);
  matrix[N, N] Sigma = P * eigen.1;
  vector[N] sigma = 1 ./ sqrt(eigen.2 + 1e-8);
  
  //mu_new <- mu + G %*% t(A) %*% solve(A %*% G %*% t(A)) %*% (b - A %*% mu)
 // vector[N] mu_new = mu + G * A' * quad_form(G, A') \ (b - A * mu);
  
}
parameters {
  vector[N] x_raw;
  vector[N] mu_raw;
}
transformed parameters {
  vector[N] mu_new =  mu_raw + G * A' * quad_form(G, A') \ (b - A * mu_raw);
  vector[N] x = mu_new + diag_post_multiply(Sigma, sigma) * x_raw; 
}
model {
  x_raw ~ std_normal();
  mu_raw ~ normal(0, 5);
}
generated quantities {
  vector[n] b_out = A * x;
}

Hi spinkney,

I really appreciate all the effort you put in explaining me this. In my case, A is sampled as well. If I consider A as a full matrix, I can summarize my code as follows:

functions {
  vector my_function(int n, real P) {
    matrix[n, n] A;
    // Do some math to evaluate A, which depends on P
    vector[n] b;
    // Do some math to evaluate b
    vector[n] x = A \ b;
    return(x);
}

data {
  int n;
  vector[n] my_data;
  real mu;
  real sigma;
}

parameters {
  real P;
}

transformed parameters {
  vector[n] function_of_P = my_function(n, P);
}

model {
  my_data ~ normal(function_of_P, sigma); // Likelihood.
  P ~ normal(mu, sigma); // Prior
}

As you can see, A is used to solve the forward problem, which is used to model the likelihood. The example above works fine when I can treat A as a dense matrix. But if n is very large, the only feasible way is to declare A as a sparse matrix using compressed row storage. I have already done this on MATLAB, where I can easily implement a sparse linear solver to find x. I was now wondering if I could do something similar in Stan.

Best,
Rodrigo

Hi!

Well, in my previous comment only I was try to give you an idea about how to solve the linear (sparse) system.

So, if you want to solve a sparse linear system, RcppArmadillo/RcppEigen could be good options to use. Or, you could use some iterative method to achieve a fast computation and avoid invert your matrix. However, here you should create your own MCMC algorithm (GS, MH, HMC, etc) for the Bayesian inference.

On the other hand, we have to be clear if these types of operations are allowed in Stan. For example, if a sparse matrix A is introduced as an input in a Stan model, can we do operations directly with that matrix? For example, a vector*matrix operation? I don’t think so, and if I’m wrong I would be happy to see some examples!.

Cheers,

Hi cavieresJJ,

Thanks again for your reply.

I understand your point. In my case, I think the best option is to write my own C++ function that solves the sparse linear system. I’ve been working on it, but so far I didn’t manage to make it work. But this of course it is beyond the topic of this post. Anyways, I really appreciate all the help!

Best,
Rodrigo