Hi!

Since I use the `log_sum_exp` function a lot, I thought it would not hurt to give it analytic gradients using the `operands_and_partials` facility. This cuts down quite a bit of code compared to the bunch of vari definitions that are currently used. I think I did the obvious definition of the function with `operands_and_partials`, see below. However, the code with the analytic gradients is ~15% slower than the original one. All tests defined for `log_sum_exp` pass with the definition below.

This is counter-intuitive to me, so if someone has an idea why things get slower even though analytic gradients are given, that would be helpful. With some luck this means that our `operands_and_partials` could be tweaked, which would give a noticeable speedup. The only thing I see is that there is additional copying going on when using `operands_and_partials`… maybe C++11 move semantics could remove this? Not sure.

Best,

Sebastian

```
template <typename T1, typename T2>
inline typename return_type<T1, T2>::type log_sum_exp(const T1& a,
                                                      const T2& b) {
  typedef typename stan::partials_return_type<T1, T2>::type T_partials_return;
  using std::exp;
  const T_partials_return a_dbl = value_of(a);
  const T_partials_return b_dbl = value_of(b);
  // stable evaluation: factor out the larger argument
  const T_partials_return fab = a_dbl > b_dbl
                                    ? a_dbl + log1p_exp(b_dbl - a_dbl)
                                    : b_dbl + log1p_exp(a_dbl - b_dbl);
  operands_and_partials<T1, T2> ops_partials(a, b);
  // d/da log(exp(a) + exp(b)) = exp(a - fab), likewise for b
  if (!is_constant_struct<T1>::value)
    ops_partials.edge1_.partials_[0] = exp(a_dbl - fab);
  if (!is_constant_struct<T2>::value)
    ops_partials.edge2_.partials_[0] = exp(b_dbl - fab);
  return ops_partials.build(fab);
}
```
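As a quick sanity check of the partials set above: with f = log(e^a + e^b), the derivative is ∂f/∂a = e^a / (e^a + e^b) = exp(a − f), and likewise ∂f/∂b = exp(b − f), so the two partials always sum to 1. The sketch below checks this on plain `double`s only (no Stan types involved) against central finite differences. The helper names `log_sum_exp_dbl`, `log_sum_exp_grad` and `check_partials` are hypothetical and exist only for this illustration; it says nothing about the speed question.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: stable log(exp(a) + exp(b)) on doubles, using the
// same branch as the templated version (factor out the larger argument).
double log_sum_exp_dbl(double a, double b) {
  return a > b ? a + std::log1p(std::exp(b - a))
               : b + std::log1p(std::exp(a - b));
}

// Hypothetical helper: analytic partials, as assigned to partials_[0] above.
void log_sum_exp_grad(double a, double b, double& da, double& db) {
  const double f = log_sum_exp_dbl(a, b);
  da = std::exp(a - f);
  db = std::exp(b - f);
}

// Compare analytic partials with central finite differences and check
// that they sum to one (they form a softmax over {a, b}).
void check_partials(double a, double b) {
  const double h = 1e-6, tol = 1e-5;
  double da = 0.0, db = 0.0;
  log_sum_exp_grad(a, b, da, db);
  const double fd_a =
      (log_sum_exp_dbl(a + h, b) - log_sum_exp_dbl(a - h, b)) / (2.0 * h);
  const double fd_b =
      (log_sum_exp_dbl(a, b + h) - log_sum_exp_dbl(a, b - h)) / (2.0 * h);
  assert(std::fabs(da - fd_a) < tol);
  assert(std::fabs(db - fd_b) < tol);
  assert(std::fabs(da + db - 1.0) < 1e-10);
}
```

For example, `check_partials(800.0, 799.0)` passes even though a naive `std::log(std::exp(800.0) + std::exp(799.0))` would overflow.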