How are the gradients/derivatives for the multivariate distributions (prim/mat/prob) calculated in stan math?
I can see that the univariate distributions (prim/scal/prob) use operands_and_partials, and that the functions have gradients defined in fwd and rev, but I can’t see how it’s done for the mat distributions.
Is there any doc/wiki that I should be looking at?
If the corresponding functions aren’t in rev or fwd, that probably means there are no hand-coded gradients and the functions are simply being autodiffed themselves.
If they take advantage of big matrix operations and such (so that the bulk of the internal work has custom autodiffs), they should be pretty efficient.
Ah that makes sense, thanks!
multi_normal could be made much more efficient with fully analytic derivatives. Specifically, the quadratic form (and the inverse it involves) in (y - mu)' / Sigma * (y - mu) could be handled much more efficiently. It is at least vectorized, so Sigma is Cholesky factored only once.
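
For reference, here is a sketch of what "fully analytic derivatives" would look like for the multivariate normal log density. These are the standard matrix-calculus results, not something taken from the Stan Math source. The symbol d = y - mu is my own shorthand, and the gradient with respect to Sigma assumes its entries are treated as independent (the symmetry constraint is ignored).

```latex
\log p(y \mid \mu, \Sigma)
  = -\tfrac{k}{2}\log(2\pi)
    - \tfrac{1}{2}\log\lvert\Sigma\rvert
    - \tfrac{1}{2}\, d^{\top} \Sigma^{-1} d,
  \qquad d = y - \mu

\frac{\partial \log p}{\partial \mu} = \Sigma^{-1} d,
\qquad
\frac{\partial \log p}{\partial y} = -\Sigma^{-1} d

\frac{\partial \log p}{\partial \Sigma}
  = -\tfrac{1}{2}\,\Sigma^{-1}
    + \tfrac{1}{2}\,\Sigma^{-1} d\, d^{\top} \Sigma^{-1}

% With the Cholesky factor \Sigma = L L^{\top}:
\Sigma^{-1} d = L^{-\top}\bigl(L^{-1} d\bigr),
\qquad
\log\lvert\Sigma\rvert = 2 \sum_{i=1}^{k} \log L_{ii}
```

Every term reuses the same solve Sigma^{-1} d, so once the Cholesky factor is available, the value and all three gradients come from a couple of triangular solves rather than from autodiffing through the quadratic form.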