Mathematical computation behind the scenes


Hi all,

I am curious about the algorithms and formulas behind the scenes. For example, the manual says that the operator / (matrix division) is more stable than computing the inverse. So my question is: what happens behind the scenes? How does Stan compute the result of /, and what method does Stan use to compute a matrix inverse? I have many questions like this whenever I see a new operator, such as matrix_exp.

So my main question is: could anyone please point me to a document where Stan lists the mathematical formulas or algorithms behind the scenes that the manual does not show?

Thank you so much for any help!
Trung Dung.


Everything’s in the code, of course, which is all open source. Most of the code documents the algorithms being used if they’re complicated; otherwise, the code is self-documenting as to the algorithm. For example, matrix division is standard and we do the standard thing with an LU solver. Here’s the code for mdivide_left (matrix \), which is a bit simpler than mdivide_right (matrix /) because it involves fewer transpositions.

template <typename T1, typename T2, int R1, int C1, int R2, int C2>
inline Eigen::Matrix<typename boost::math::tools::promote_args<T1, T2>::type,
                     R1, C2>
mdivide_left(const Eigen::Matrix<T1, R1, C1> &A,
             const Eigen::Matrix<T2, R2, C2> &b) {
  check_square("mdivide_left", "A", A);
  check_multiplicable("mdivide_left", "A", A, "b", b);
  return promote_common<Eigen::Matrix<T1, R1, C1>, Eigen::Matrix<T2, R1, C1> >(
             A)
      .lu()
      .solve(
          promote_common<Eigen::Matrix<T1, R2, C2>, Eigen::Matrix<T2, R2, C2> >(
              b));
}

I know the linear algebra is pretty standard, but it would be nice to have readable descriptions (e.g., pseudocode as in a paper) for algorithms like the NUTS variant that Stan implements. I’ve gone looking in the C++ and gotten a bit lost.


@Bob_Carpenter: Thank you Bob!

But what I mean is: how can I find that out for any function I want to understand? I don’t want to take your time by asking about each function separately.


The adaptation is described in the manual. The variant of NUTS currently in Stan is like the original NUTS, except that rather than using slice sampling, it draws the next state from a multinomial with the same last-doubling bias as the original NUTS.

I’d love to see nice pseudocode for NUTS, but I’ve never been able to write it down. I have tried. The changes in direction and the doubling just inevitably lead to a lot of index fiddling.



There was a thread a while back that collected accessible intro material for NUTS: Accessible explanation to the No-U-Turn Sampler. It’s worth looking through for sure.

I found the pseudocode in the original paper (Hoffman and Gelman 2014) sufficiently clear to recreate the algorithm from scratch. There are lots of places to make mistakes, but the way Matt wrote it is good. In particular, see Algorithm 2, which is the “naive” version (meaning inefficient memory-wise) but gives great intuition.

I have an R version that implements Algorithm 3 if you want to see it; I also get lost in the complex C++ used by Stan. Obviously you wouldn’t use it for inference (use Stan!), but it may help you understand the algorithm better.


Hi @Bob_Carpenter,

I think maybe my question was not clearly stated.

Suppose today I want to know about function A, tomorrow about function B, and the day after about function C (the C++ algorithm or mathematical formula). Of course it is not a good idea to create a thread every time I want to learn about a function that is new to me.

I would like to be able to find the algorithm behind a function myself each time, instead of asking so many questions here.


I think there’s confusion about what gets computed.

The language is used to define a statistical model. You can think of a model as a function that takes in data and parameters and returns a number. It’s a deterministic function: if you provide the same values for the data and parameters, you will compute the same return value. This return value is the log probability on the unconstrained scale: the target plus the Jacobian adjustment for the change of variables. To compute it, you have all the functions available within the Stan language.

The algorithms use this function to estimate the parameters. (There’s a bit of nuance here where the Jacobian term might not be included.)

See the previous posts for information about some of the algorithms. See the Stan manual for documentation of the functions in the library. See the source code in the Math library to see how the functions are implemented. And please feel free to create new threads on Discourse with specific questions!


Thank you, @syclik,

I am interested in mathematical functions such as matrix_exp. In the manual, version 2.17.0, page 482, there is a formal definition of that function. I have googled and seen that there are many ways to compute the exponential of a matrix; this article shows 19 ways. So for this function, for example, how can I see which way Stan uses to compute the matrix exponential?

This is what I would like to know: how can I see the source code (C++, for example) and the documentation that explains it?


The Math library is on GitHub:

Here’s what I get when I search for “matrix_exp” within the math repo on GitHub:

If you follow some of the code, you’ll end up at the Padé approximation: matrix_exp_pade.hpp



Your support is greatly appreciated!

This is exactly what I meant in my question. From now on, I can read the math library that Stan uses from that link and try to understand it before asking a question.

Again, thank you so much!

Trung Dung.