Hi all,
This is my first post here, and I really like Stan :-)
Just a few remarks on the autodiff min/max implementations. It is a topic I ruminated on some time ago in my blog post:
Autodiff min/max implementations
IMHO the fvar code for min/max:
template <typename T>
inline fvar<T> fmax(const fvar<T>& x1, const fvar<T>& x2) {
  using ::fmax;
  if (unlikely(is_nan(x1.val_))) {
    if (is_nan(x2.val_))
      return fvar<T>(fmax(x1.val_, x2.val_), NOT_A_NUMBER);
    else
      return fvar<T>(x2.val_, x2.d_);
  } else if (unlikely(is_nan(x2.val_))) {
    return fvar<T>(x1.val_, x1.d_);
  } else if (x1.val_ > x2.val_) {
    return fvar<T>(x1.val_, x1.d_);
  } else if (x1.val_ == x2.val_) {
    return fvar<T>(x1.val_, NOT_A_NUMBER);
  } else {
    return fvar<T>(x2.val_, x2.d_);
  }
}
can be greatly simplified:
template <typename T>
inline fvar<T> fmax(const fvar<T>& x1, const fvar<T>& x2) {
  using ::fmax;
  if (x2.val_ == fmax(x1.val_, x2.val_)) {
    return fvar<T>(x2.val_, x2.d_);
  }
  return fvar<T>(x1.val_, x1.d_);
}
I have added some tests to explicitly check the behavior with NaN values:
TEST(AgradFwdFmax, fmax_NaN_Number) {
using stan::math::fvar;
fvar<double> x(stan::math::NOT_A_NUMBER, stan::math::NOT_A_NUMBER);
fvar<double> y(3.0, 4.0);
fvar<double> a = fmax(x, y);
EXPECT_FLOAT_EQ(3.0, a.val_);
EXPECT_FLOAT_EQ(4.0, a.d_);
}
TEST(AgradFwdFmax, fmax_Number_NaN) {
using stan::math::fvar;
fvar<double> x(1.0, 2.0);
fvar<double> y(stan::math::NOT_A_NUMBER, stan::math::NOT_A_NUMBER);
fvar<double> a = fmax(x, y);
EXPECT_FLOAT_EQ(1.0, a.val_);
EXPECT_FLOAT_EQ(2.0, a.d_);
}
TEST(AgradFwdFmax, fmax_NaN_NaN) {
using stan::math::fvar;
fvar<double> x(stan::math::NOT_A_NUMBER, stan::math::NOT_A_NUMBER);
fvar<double> y(stan::math::NOT_A_NUMBER, stan::math::NOT_A_NUMBER);
fvar<double> a = fmax(x, y);
EXPECT_TRUE(stan::math::is_nan(a.val_));
EXPECT_TRUE(stan::math::is_nan(a.d_));
}
Also note that the implementation:
if(x2.val_ == fmax(x1.val_, x2.val_))
ensures the arbitrary, but documented, behavior that
fmax(a, b) returns b if a.val_ == b.val_ (and analogously for fmin).
Another thing concerning fvar constructors:
If one wants to use fvar in a nested way (for forward-mode nth-order differentiation), IMHO the constructor has to be fixed:
template <typename TV, typename TD>
fvar(const TV& val, const TD& deriv) : val_(val), d_(deriv) {
  if (unlikely(is_nan(val)))
    d_ = val;
}
must be modified as follows:
template <typename TV, typename TD>
fvar(const TV& val, const TD& deriv)
    : val_(val), d_(unlikely(is_nan(val)) ? val : deriv) {}
The two are not equivalent: in the first case operator= is used, while in the second the constructors are truly called recursively (propagating NaN values, for instance).
Reverse mode
I have not touched this part as I am not familiar enough with the code, but I think the same simplifications hold.
Availability
I have created a branch fminfmax_fwd_code_cleaning in my GitHub repo; runTests.py passes.
Bibliography
One last thing: I have seen this article:
A new framework for the computation of Hessians
I have not had time to examine the details yet, but the method looks very interesting for second-order reverse-mode computations. I have seen a toy implementation here (not tested): https://github.com/BachiLi/had
Regards,
Vincent P.