Adj_jac_apply

@anon75146577 here’s the addition operator you ordered! @Bob_Carpenter – you’ll be interested in the benchmarks at the end.

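struct AddFunctor {
  // Forward pass: computes x1 + x2 on plain doubles.
  // needs_adj[i] says whether input i needs an adjoint (unused here).
  template <std::size_t size>
  Eigen::VectorXd operator()(const std::array<bool, size>& needs_adj,
                             const Eigen::VectorXd& x1,
                             const Eigen::VectorXd& x2) {
    check_size_match("AddFunctor::operator()", "x1", x1.size(), "x2", x2.size());
    return x1 + x2;
  }

  // Reverse pass: returns adj^T * J for each input. Both Jacobians are the
  // identity, so each input's adjoint contribution is just adj.
  template <std::size_t size>
  auto multiply_adjoint_jacobian(const std::array<bool, size>& needs_adj,
                                 const Eigen::VectorXd& adj) {
    return std::make_tuple(adj, adj);
  }
};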

This is compilable from the current stan-dev/math develop branch.

This is equivalent to the prim implementation:

auto AddFunctorAutodiffed = [](auto& x1, auto& x2) {
  check_size_match("AddFunctorAutodiffed::operator()", "x1", x1.size(), "x2", x2.size());
  return (x1 + x2).eval();
};

And in the spirit of Checking That Things We Write Actually Work, I ran some benchmarks. I compared the prim implementation above to the adj_jac_apply implementation. I also coded up an “inefficient” adj_jac_apply that computes a full Jacobian as a comparison.

I expected the autodiff and efficient adj_jac_apply versions would both be fast, with adj_jac_apply faster for large vectors (because there are far fewer chain calls). Turns out adj_jac_apply is about 20% slower than the purely prim implementation. I guess that means my processor is better at virtual function calls than I gave it credit for, or shuffling the doubles around in adj_jac_apply is more expensive than I thought. These are the numbers:

[Plot: autodiff_vs_efficient]
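For anyone wondering where the "fewer chain calls" expectation comes from, here is a toy sketch. It assumes elementwise autodiff puts one node per output element on the tape, while the adj_jac_apply-style version puts a single node there for the whole vector. `Node`, `ScalarAddNode`, `VectorAddNode` and `count_chain_calls` are made-up names for illustration, not Stan Math classes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy reverse-mode node. Hypothetical, only shaped like an autodiff tape entry.
struct Node {
  virtual ~Node() = default;
  virtual void chain() = 0;
};

// Global counter of virtual chain() calls, reset before each reverse sweep.
static int chain_calls = 0;

// Elementwise style: one node per output element y_i = a_i + b_i.
struct ScalarAddNode : Node {
  double& a_adj;
  double& b_adj;
  const double& y_adj;
  ScalarAddNode(double& a, double& b, const double& y)
      : a_adj(a), b_adj(b), y_adj(y) {}
  void chain() override {
    ++chain_calls;
    a_adj += y_adj;
    b_adj += y_adj;
  }
};

// Vectorized style: one node covers the whole vector and loops inside chain().
struct VectorAddNode : Node {
  std::vector<double>& a_adj;
  std::vector<double>& b_adj;
  const std::vector<double>& y_adj;
  VectorAddNode(std::vector<double>& a, std::vector<double>& b,
                const std::vector<double>& y)
      : a_adj(a), b_adj(b), y_adj(y) {}
  void chain() override {
    ++chain_calls;
    for (std::size_t i = 0; i < y_adj.size(); ++i) {
      a_adj[i] += y_adj[i];
      b_adj[i] += y_adj[i];
    }
  }
};

// Builds a tape for an n-element add and returns how many virtual chain()
// calls the reverse sweep makes.
int count_chain_calls(bool vectorized, std::size_t n) {
  std::vector<double> a_adj(n, 0.0), b_adj(n, 0.0), y_adj(n, 1.0);
  std::vector<std::unique_ptr<Node>> tape;
  if (vectorized) {
    tape.push_back(std::make_unique<VectorAddNode>(a_adj, b_adj, y_adj));
  } else {
    for (std::size_t i = 0; i < n; ++i)
      tape.push_back(
          std::make_unique<ScalarAddNode>(a_adj[i], b_adj[i], y_adj[i]));
  }
  chain_calls = 0;
  for (auto it = tape.rbegin(); it != tape.rend(); ++it) (*it)->chain();
  return chain_calls;
}
```

In this toy model, `count_chain_calls(false, n)` makes n virtual calls and `count_chain_calls(true, n)` makes one, yet both do the same 2n additions. Going by the numbers above, saving those virtual calls evidently isn't enough on its own to win.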

The inefficient implementation is, of course, bad. And I guess this should just be a Warning that unless this stuff is used craftily, you can still end up slowing your code down:

[Plot: comparison]
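To make the efficient/inefficient difference concrete, here is a minimal sketch in plain C++ (std::vector rather than Eigen, so it isn't the benchmark code from the gist). `add_adjoint_efficient` and `add_adjoint_full_jacobian` are hypothetical names. The first returns the adjoint directly. The second builds the dense identity Jacobian and multiplies it out, which is roughly what I mean by "computes a full Jacobian".

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<double>;

// Efficient version: for y = x1 + x2 both Jacobians are the identity,
// so adj^T * J is just adj. O(N) work, and no matrix is ever built.
std::pair<Vec, Vec> add_adjoint_efficient(const Vec& adj) {
  return {adj, adj};
}

// "Inefficient" version: build the dense N x N Jacobian (the identity here)
// and multiply it out. O(N^2) memory and work for the same answer.
std::pair<Vec, Vec> add_adjoint_full_jacobian(const Vec& adj) {
  const std::size_t n = adj.size();
  std::vector<Vec> jac(n, Vec(n, 0.0));
  for (std::size_t i = 0; i < n; ++i) jac[i][i] = 1.0;  // dy_i/dx_i = 1

  Vec out(n, 0.0);
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t i = 0; i < n; ++i)
      out[j] += adj[i] * jac[i][j];  // (adj^T J)_j
  return {out, out};
}
```

Both functions return the same pair of vectors. The only difference is the quadratic cost of the second one, which fits the kind of blow-up with vector size you'd expect for the inefficient curve.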

I’m going to compare a prim vs. adj_jac_apply implementation of a more complicated function to get a handle on this (simplex_constrain, but I’ll get to that later). Looks like we’re gonna need to be careful when using this to make sure the cost of our adj_jac_apply implementation doesn’t exceed that of regular autodiff! Regular autodiff is sneakily efficient, it seems.

Full test benchmark code is here: https://gist.github.com/bbbales2/a1689764f0fda6df561e858026f4e8d9
