Neat! Gonna make some notes while I skim this over. I think this is written by the same folks who did the complex number in Stan talk at the last StanCon.
Therefore, replacing the return type of the functors with the auto keyword allows the passing of expression types from the AD-O tool to the Eigen internals.
I think Eigen doesn’t require C++11, so I doubt this would get backported without some macro magic. This section has some nice gotchas with Eigen and auto that we have run into in the past.
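For anyone who hasn't been bitten by this yet, here is a rough sketch of the kind of gotcha I mean. It uses a hand-rolled toy expression template (`Vec`, `Sum`, `eval`, and `lazy_minus_eager` are all made-up names, not Eigen types), but the mechanism should be the same in spirit: `auto` deduces the lazy expression type, which only holds references to its operands, so it sees later changes to them.

```cpp
#include <cstddef>
#include <vector>

// Toy vector type standing in for a dense matrix class (hypothetical, not Eigen).
struct Vec {
  std::vector<double> data;
  double operator[](std::size_t i) const { return data[i]; }
  std::size_t size() const { return data.size(); }
};

// Lazy sum: stores references to its operands and computes entries on demand.
template <typename L, typename R>
struct Sum {
  const L& lhs;
  const R& rhs;
  double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
  std::size_t size() const { return lhs.size(); }
};

inline Sum<Vec, Vec> operator+(const Vec& a, const Vec& b) { return {a, b}; }

// Force evaluation of any expression into an owning Vec.
template <typename Expr>
Vec eval(const Expr& e) {
  Vec out;
  out.data.resize(e.size());
  for (std::size_t i = 0; i < e.size(); ++i) out.data[i] = e[i];
  return out;
}

// Returns how far the lazy result drifts from the eager one after `a` changes.
inline double lazy_minus_eager() {
  Vec a{{1.0, 2.0}};
  Vec b{{10.0, 20.0}};
  auto lazy = a + b;          // deduces Sum<Vec, Vec>; nothing is computed yet
  Vec eager = eval(a + b);    // evaluated now: {11, 22}
  a.data[0] = 100.0;          // mutate an operand afterwards
  return lazy[0] - eager[0];  // 110 - 11
}
```

The lazy value silently picks up the mutation while the eager one does not. If the operands had been temporaries, the stored references would dangle instead, which is the nastier version of the same problem.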
template<typename _MatrixType, typename enable_if = void>
nooooo, not a good name!
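My guess is that second parameter is the usual defaulted-`void` SFINAE slot, in which case calling it `enable_if` just invites confusion with `std::enable_if`. A minimal sketch of the pattern with a less confusing name (`SolverKind` and `Enable` are made-up names here, and the floating-point condition is only a placeholder for whatever trait they actually test):

```cpp
#include <type_traits>

// Primary template: the defaulted second parameter exists only so that
// partial specializations can be switched on and off with std::enable_if.
template <typename T, typename Enable = void>
struct SolverKind {
  static constexpr int value = 0;  // generic path
};

// Selected only when the condition holds; otherwise substitution fails
// quietly and the primary template above is used.
template <typename T>
struct SolverKind<
    T, typename std::enable_if<std::is_floating_point<T>::value>::type> {
  static constexpr int value = 1;  // specialized path
};
```

With this, `SolverKind<double>::value` picks the specialization and `SolverKind<int>::value` falls back to the primary template.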
Some of their solver stuff will be added to Eigen 3.4!
If they open source their actual implementation we could look at how they use the new eigen_adsolver_enable type in Eigen.
Oddly, they talk a lot in the beginning about the Eigen solver stuff, but then the benchmarks don’t include comparisons for Cholesky AD, which I feel like is sort of the baseline solver.
Was confused by this sentence:
dco/c++ provides a constant copy-overhead which is independent of the differentiation order; i.e. no matter what order, only a single copy operation to the passive data type is required.
Like the initial data is only passed over once? Or they somehow avoid making more than one copy inter-operation? For (1), yes that’s cool and good. For (2), idk how you go about that.
Regarding the third point, the solution vector can simply be saved in the callback object. In order to provide the decomposition in the adjoint run, it would be possible to save the input matrix in the augmented primal run and then decompose it again in the adjoint run. However, this would introduce a run time overhead of O(n^3) in the adjoint run. Instead, it is beneficial to save the whole decomposition.
If I’m reading this correctly we do the same thing in cholesky_decompose’s vari specializations, unless they went a step further and made the eigen matrices stateful.
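To spell out why keeping the factors around pays off, here is the standard reverse-mode rule for a linear solve, written for a column-pivoted QR. The notation (bars for adjoints, $AP = QR$ for the factorization) is my own and not taken from the paper.

```latex
x = A^{-1} b
\quad\Longrightarrow\quad
\bar{b} = A^{-T}\,\bar{x},
\qquad
\bar{A} = -\,\bar{b}\,x^{T}.

\text{With } AP = QR:\qquad
A^{-T} = Q\,R^{-T}P^{T},
\qquad
\bar{b} = Q\,\bigl(R^{-T}(P^{T}\bar{x})\bigr).
```

So with the factors saved, the adjoint run needs a permutation, one triangular solve, and one application of $Q$, which is $O(n^2)$, rather than a fresh $O(n^3)$ factorization.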
Since the AD implementations quickly exceed the amount of available RAM, they are not suited for higher input dimensions and no data is available for n > 1000.
Maybe I’m just dumb but I also don’t understand this sentence. I don’t think we do what would be considered symbolic diff, and we do pretty large matrices, so I am confused why they can’t do n > 1000. Heck, our OpenCL stuff doesn’t even kick in until you get to N, M > 1000.
Ah, okay on page 11 they start talking about us
However, it must be noted that the Stan Math Library does not offer an API for advanced AD users, e.g. lacking a callback system or the possibility to compute arbitrary adjoint directional derivatives in the adjoint mode, only offering a grad() functionality.
If their thing can do directional derivatives this is the first place they mention it in the paper. Can’t we do directional derivatives in forward mode?
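To be concrete about what I mean by forward mode here: a dual number carries a value plus a tangent, and seeding the tangents with a direction gives the directional derivative in one pass. This is a toy sketch (`Dual`, `f`, and `directional_derivative` are made-up names, not Stan's `fvar`), and note it gives the tangent (Jacobian-vector) flavor, whereas the quote seems to be about arbitrary seeds in the adjoint direction.

```cpp
// Toy dual number: val is the function value, tan is the tangent carried along.
struct Dual {
  double val;
  double tan;
};

inline Dual operator+(Dual a, Dual b) {
  return {a.val + b.val, a.tan + b.tan};
}

inline Dual operator*(Dual a, Dual b) {
  // Product rule applied to the tangent part.
  return {a.val * b.val, a.tan * b.val + a.val * b.tan};
}

// Example function f(x, y) = x^2 * y + y, written generically over the scalar.
template <typename T>
T f(T x, T y) {
  return x * x * y + y;
}

// Derivative of f at (x, y) along the direction (dx, dy), in one forward pass.
inline double directional_derivative(double x, double y, double dx, double dy) {
  Dual out = f(Dual{x, dx}, Dual{y, dy});
  return out.tan;
}
```

At (3, 2) the gradient of this f is (12, 10), so the direction (1, 0.5) should give 12 + 5 = 17, which is what `directional_derivative(3.0, 2.0, 1.0, 0.5)` returns.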
Although the Stan Math Library can be applied to arbitrary code, we see its advantages rather on the applications it was especially designed for, e.g. linear algebra using its own API and storage types from Eigen. dco/c++ on the other hand is aimed to be used for more generically written code and therefore requires less modifications or restrictions to be applied, while still providing a good performance. This can be inferred from Figure 10, where std::vector was used as the storage type for computing
Would be curious to see why we suck so much on that simple std::vector example. If the AD type they use builds up its own expressions, that would probably be enough to explain the difference.
Stan math is made for a very specialized purpose, but I don’t really see any of their reasons for it not being generic enough as terribly valid. That’s not to say there are not reasons, there are some very good reasons it’s not generic. Like idk having a stack allocator with a bushel of pointer chasing on the chain methods and having to initialize the AD tape which is kind of annoying and never deleting stack allocated memory. But those are pretty known tradeoffs between performance and generic-ness.
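For readers who haven't looked inside a tape-based reverse-mode library, this is roughly the shape of what I'm describing. It is a deliberately tiny sketch and not Stan's actual code: `Node`, `MulNode`, `AddNode`, `Tape`, and `grad_xy_plus_x` are made-up names, and I use heap-allocated nodes in a vector where a real implementation would use an arena.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Tape entry: a value, an adjoint, and a virtual chain() that pushes the
// adjoint back to the operands. Leaves have nothing to propagate.
struct Node {
  double val;
  double adj = 0.0;
  explicit Node(double v) : val(v) {}
  virtual ~Node() = default;
  virtual void chain() {}
};

struct MulNode : Node {
  Node* a;
  Node* b;
  MulNode(Node* a_, Node* b_) : Node(a_->val * b_->val), a(a_), b(b_) {}
  void chain() override {
    a->adj += adj * b->val;
    b->adj += adj * a->val;
  }
};

struct AddNode : Node {
  Node* a;
  Node* b;
  AddNode(Node* a_, Node* b_) : Node(a_->val + b_->val), a(a_), b(b_) {}
  void chain() override {
    a->adj += adj;
    b->adj += adj;
  }
};

// The tape owns every node in creation order; nothing is freed until the
// tape itself goes away.
struct Tape {
  std::vector<std::unique_ptr<Node>> nodes;

  template <typename T, typename... Args>
  T* make(Args&&... args) {
    std::unique_ptr<T> p(new T(std::forward<Args>(args)...));
    T* raw = p.get();
    nodes.push_back(std::move(p));
    return raw;
  }

  // Seed the output and walk the tape backwards, one virtual call per node.
  void grad(Node* out) {
    out->adj = 1.0;
    for (auto it = nodes.rbegin(); it != nodes.rend(); ++it) (*it)->chain();
  }
};

// Gradient of f(x, y) = x * y + x, returned as (df/dx, df/dy).
inline std::pair<double, double> grad_xy_plus_x(double x0, double y0) {
  Tape tape;
  Node* x = tape.make<Node>(x0);
  Node* y = tape.make<Node>(y0);
  Node* xy = tape.make<MulNode>(x, y);
  Node* out = tape.make<AddNode>(xy, x);
  tape.grad(out);
  return {x->adj, y->adj};
}
```

Every step of the reverse sweep is a virtual call through a pointer to wherever that node happens to live, which is the pointer chasing I mean, and the user has to set up and tear down the tape around their code.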
Stan provides a mdivide_left(A,b) function to solve a system of linear equations which will always use the ColPivHouseholderQR decomposition internally to solve the system. Therefore, the algorithmic and the dco/c++/eigen measurements displayed in Figure 13 utilize this solver class. Although Stan also evaluates the adjoints symbolically, the implementation of dco/c++/eigen presented in Section 3.2.1 is faster. The reason for that is that Stan performs another decomposition in the adjoint run, while the implementation of dco/c++/eigen keeps the decomposition from the augmented primal run in memory and reuses it as described in Section 3.3.
Huh, could we just add the precomputed decomposition to the vari specialization of mdivide_left so we don’t compute it twice?
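Something like the following is what I have in mind, as a self-contained sketch rather than actual Stan code. I swapped in a hand-written LU with partial pivoting instead of ColPivHouseholderQR to keep it dependency-free, and `LuFactors` and `SolveCallback` are made-up names. The point is only that the object factors once in the forward pass and the reverse pass reuses those factors for the transposed solve.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// LU with partial pivoting, P * A = L * U, stored compactly in `lu`.
struct LuFactors {
  Matrix lu;                      // L below the diagonal (unit diag), U on/above
  std::vector<std::size_t> perm;  // row i of P * A is row perm[i] of A

  explicit LuFactors(Matrix a) : lu(std::move(a)), perm(lu.size()) {
    const std::size_t n = lu.size();
    for (std::size_t i = 0; i < n; ++i) perm[i] = i;
    for (std::size_t k = 0; k < n; ++k) {
      std::size_t p = k;  // pick the largest pivot in column k
      for (std::size_t i = k + 1; i < n; ++i)
        if (std::fabs(lu[i][k]) > std::fabs(lu[p][k])) p = i;
      assert(lu[p][k] != 0.0 && "matrix is singular");
      std::swap(lu[k], lu[p]);
      std::swap(perm[k], perm[p]);
      for (std::size_t i = k + 1; i < n; ++i) {
        lu[i][k] /= lu[k][k];
        for (std::size_t j = k + 1; j < n; ++j) lu[i][j] -= lu[i][k] * lu[k][j];
      }
    }
  }

  // Solve A x = b: forward substitution with L, then back substitution with U.
  Vector solve(const Vector& b) const {
    const std::size_t n = lu.size();
    Vector x(n);
    for (std::size_t i = 0; i < n; ++i) {
      x[i] = b[perm[i]];
      for (std::size_t j = 0; j < i; ++j) x[i] -= lu[i][j] * x[j];
    }
    for (std::size_t i = n; i-- > 0;) {
      for (std::size_t j = i + 1; j < n; ++j) x[i] -= lu[i][j] * x[j];
      x[i] /= lu[i][i];
    }
    return x;
  }

  // Solve A^T y = c with the same factors: U^T, then L^T, then undo P.
  Vector solve_transposed(const Vector& c) const {
    const std::size_t n = lu.size();
    Vector w(c);
    for (std::size_t i = 0; i < n; ++i) {
      for (std::size_t j = 0; j < i; ++j) w[i] -= lu[j][i] * w[j];
      w[i] /= lu[i][i];
    }
    for (std::size_t i = n; i-- > 0;)
      for (std::size_t j = i + 1; j < n; ++j) w[i] -= lu[j][i] * w[j];
    Vector y(n);
    for (std::size_t i = 0; i < n; ++i) y[perm[i]] = w[i];
    return y;
  }
};

// Stand-in for a vari/callback: keeps the factors and solution from the
// forward pass so the reverse pass never refactors A.
struct SolveCallback {
  LuFactors factors;
  Vector x;

  SolveCallback(const Matrix& a, const Vector& b)
      : factors(a), x(factors.solve(b)) {}

  // Reverse pass: b_bar = A^{-T} x_bar and A_bar = -b_bar * x^T.
  void reverse(const Vector& x_bar, Matrix& a_bar, Vector& b_bar) const {
    const std::size_t n = x.size();
    b_bar = factors.solve_transposed(x_bar);
    a_bar.assign(n, Vector(n));
    for (std::size_t i = 0; i < n; ++i)
      for (std::size_t j = 0; j < n; ++j) a_bar[i][j] = -b_bar[i] * x[j];
  }
};
```

The cost is holding an extra n-by-n factor alive until the reverse sweep, so it trades memory for skipping the second O(n^3) decomposition.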
The current API implementation of Stan limits the user to prescribed underlying decompositions like ColPivHouseholderQR and FullPivHouseholderQR as stated above. dco/c++/eigen on the other hand allows the utilization of any of the supporting Eigen decompositions.
Very valid, I’d be interested in seeing their integration! We could probs utilize a form of it with our var type.
tbh it kinda looks like they cherry picked a few things, but like I get it ya know that’s whatever. I like papers where we get called out, it’s like we are the big bad guy / final boss. Gives me another reason to wear sunglasses indoors.
It would also be nice to see precision comparisons between the calculations of the derivatives. They should be v v similar for simple things, though cholesky and other solvers could be wily.