Order of the pade approximation with matrix exponential

I thought I’d share this on discourse before submitting an issue on Github.

The matrix exponential function in Stan is based on Eigen’s implementation and uses a Pade approximation (see Molver and Van Loan, 2003 for a good description of the algorithm). A while ago, I tried monitoring the order of the approximation and found that, unless we’re working on simple matrices, the order of the approximation goes to 13 – which leads to rather intensive matrix operations. Another package, expokit, uses an approximation of order 6 (Sidje, 1998). This might be something we want to investigate. Do we need such a high order, what are the pros and cons, should we give the user control, etc.

If there is interest, I’ll post an issue on github.

Would it be possible to use a lower order approximation for the leapfrog steps and a higher order for the final accept/reject choice? My understanding is that this sort of thing is possible in HMC, though others here certainly know more about this than I do.

My experience with the matrix exponential is it can be difficult to find one implementation that works for all problem types. The Moler Van Loan paper you cited has a good discussion about this. The R expm package gives users control over which method and order which seems to work pretty well.

low accuracy could mess with the hamiltonian trajectories, so it’s probably not a great idea. This will be a challenge for very non-normal matrices. I’d say a tunable parameter in the call would be a good idea with high-order by default.

(If the matrix is SPD, the problem is much easier and you can get a very cheap, accurate approximation using Trefethen’s Caratheodery-Fejer method. But if it’s a general matrix, a combination of Pade and Scaling and squaring is probably unavoidable. Nick Higham has some more recent papers on this than the van loan one. He also did some stuff on the “condition number” which is really computing the Frechet derivative, which should be useful for autodiff)

That would be very hard with Stan’s current architecture, which doesn’t separate these two calculations.

There is no final accept-reject step in NUTS, only in static HMC. The original NUTS used slice samplig with a bias toward the last doubling, whereas the current version in Stan uses a multinomial version introduced by @betanalpha.

We can have more than one.

1 Like