Autodiff in Boost Math

#1

Someone has made a PR to merge their autodiff library into Boost Math

It appears to be forward-mode only and to rely on C++17 constructs. But it is perhaps worth commenting on the implementation and/or considering whether Stan Math could, at some point, use it for forward mode and drop the partial implementation that we have had for a long time.


#2

Upon further review, the C++17 usage is an optional optimization and a C++11 compiler is adequate.


#3

Bumping in case anyone wants to comment on the PR, which is ostensibly ready to be merged into Boost Math.


#4

Is there documentation somewhere for what they do? Is it a templated forward mode like ours? Vector-based?

The problem for us isn’t the basics of forward mode, it’s adding all of our functions, which they won’t have implemented in Boost.


#5

Yes, http://www.unitytechgroup.com/doc/autodiff/

Yes, mostly

std::vector representing the coefficients of a polynomial

If it works with the rest of Boost Math and as a scalar type in Eigen, that is the vast majority of things we have currently.


#6

I should have asked what you think the benefit of moving to Boost would be.

There are two things partial about our current implementation:

  1. It doesn’t handle ODE solvers, etc.

  2. It hasn’t been fully tested with all the Eigen types.

Otherwise, everything that’s left to do to complete our autodiff would also have to be done to complete the Boost implementation for Stan.

Here’s everything in stan/math/fwd:

arr
arr.hpp
core
core.hpp
mat
mat.hpp
scal
scal.hpp

./arr:
fun

./arr/fun:
log_sum_exp.hpp
sum.hpp
to_fvar.hpp

./core:
fvar.hpp
operator_addition.hpp
operator_division.hpp
operator_equal.hpp
operator_greater_than.hpp
operator_greater_than_or_equal.hpp
operator_less_than.hpp
operator_less_than_or_equal.hpp
operator_logical_and.hpp
operator_logical_or.hpp
operator_multiplication.hpp
operator_not_equal.hpp
operator_subtraction.hpp
operator_unary_minus.hpp
operator_unary_not.hpp
operator_unary_plus.hpp
std_numeric_limits.hpp

./mat:
fun
functor
meta
vectorize

./mat/fun:
Eigen_NumTraits.hpp
columns_dot_product.hpp
columns_dot_self.hpp
crossprod.hpp
determinant.hpp
divide.hpp
dot_product.hpp
dot_self.hpp
inverse.hpp
log_determinant.hpp
log_softmax.hpp
log_sum_exp.hpp
mdivide_left.hpp
mdivide_left_ldlt.hpp
mdivide_left_tri_low.hpp
mdivide_right.hpp
mdivide_right_tri_low.hpp
multiply.hpp
multiply_lower_tri_self_transpose.hpp
qr_Q.hpp
qr_R.hpp
quad_form_sym.hpp
rows_dot_product.hpp
rows_dot_self.hpp
softmax.hpp
squared_distance.hpp
sum.hpp
tcrossprod.hpp
to_fvar.hpp
trace_gen_quad_form.hpp
trace_quad_form.hpp
typedefs.hpp
unit_vector_constrain.hpp

./mat/functor:
gradient.hpp
hessian.hpp
jacobian.hpp

./mat/meta:
operands_and_partials.hpp

./mat/vectorize:
apply_scalar_unary.hpp

./scal:
fun
meta

./scal/fun:
Phi.hpp
Phi_approx.hpp
abs.hpp
acos.hpp
acosh.hpp
asin.hpp
asinh.hpp
atan.hpp
atan2.hpp
atanh.hpp
bessel_first_kind.hpp
bessel_second_kind.hpp
binary_log_loss.hpp
binomial_coefficient_log.hpp
cbrt.hpp
ceil.hpp
cos.hpp
cosh.hpp
digamma.hpp
erf.hpp
erfc.hpp
exp.hpp
exp2.hpp
expm1.hpp
fabs.hpp
falling_factorial.hpp
fdim.hpp
floor.hpp
fma.hpp
fmax.hpp
fmin.hpp
fmod.hpp
gamma_p.hpp
gamma_q.hpp
grad_inc_beta.hpp
hypot.hpp
inc_beta.hpp
inv.hpp
inv_Phi.hpp
inv_cloglog.hpp
inv_logit.hpp
inv_sqrt.hpp
inv_square.hpp
is_inf.hpp
is_nan.hpp
lbeta.hpp
lgamma.hpp
lmgamma.hpp
log.hpp
log10.hpp
log1m.hpp
log1m_exp.hpp
log1m_inv_logit.hpp
log1p.hpp
log1p_exp.hpp
log2.hpp
log_diff_exp.hpp
log_falling_factorial.hpp
log_inv_logit.hpp
log_inv_logit_diff.hpp
log_mix.hpp
log_rising_factorial.hpp
log_sum_exp.hpp
logit.hpp
modified_bessel_first_kind.hpp
modified_bessel_second_kind.hpp
multiply_log.hpp
owens_t.hpp
pow.hpp
primitive_value.hpp
rising_factorial.hpp
round.hpp
sin.hpp
sinh.hpp
sqrt.hpp
square.hpp
tan.hpp
tanh.hpp
tgamma.hpp
to_fvar.hpp
trigamma.hpp
trunc.hpp
value_of.hpp
value_of_rec.hpp

./scal/meta:
ad_promotable.hpp
is_fvar.hpp
operands_and_partials.hpp
partials_type.hpp

#7

My impression was that we could rely on Boost’s autodiff to get a basic forward-mode implementation working, take advantage of the fact that they are unit testing it, and then incrementally add functions and analytic derivatives as needed. For example, Boost Math doesn’t have log_sum_exp, but we can call a templated log_sum_exp(autodiff_fvar& x, autodiff_fvar& y) and it should produce a valid autodiff_fvar as the output. We could specialize it to do the derivative analytically, like we do with vars, but that mostly just saves RAM.


#8

Our forward-mode autodiff is very well tested through the C++ math library level, and mostly through our non-matrix functions. So I’m still not sure what this gains us other than having it be someone else’s library rather than ours. We’re still going to want to test all of our functions.

It depends on the function and the order. With something like fvar<var>, you want to make sure to call the var version of log_sum_exp internally. Then you get the usual efficiency (time and space both) from implementing analytic gradients. For example,

\frac{\partial}{\partial x} \log(\exp(x) + \exp(y)) = \frac{\exp(x)}{\exp(x) + \exp(y)}.

Writing a custom gradient function for \frac{\exp(x)}{\exp(x) + \exp(y)} would save a lot of evals, especially if we share evals for \frac{\partial}{\partial x} and \frac{\partial}{\partial y}.

Every time we save memory, we also save time in propagating gradients.


#9

That is what I am saying could reduce our development burden. We shouldn’t need to test things like log(autodiff_fvar& x) unless we specialize its gradient. And this would allow us to have a complete-ish forward mode autodiff that algorithms use while we remove bottlenecks incrementally by specializing the gradients.