Gradient/etc... with parameter packs


#1

Somewhat fictional given my time constraints at the moment, but I was looking at wrapping single-function performance testing the way Bob wrapped single-function AD testing, and there’s a bunch of code like this:

  Eigen::VectorXd grad_ad;
  double fx_ad;
  gradient<F>(f, x, fx_ad, grad_ad);
  expect_near("test_gradient fx = fx_ad", fx, fx_ad);

Using parameter packs and tuples in an implementation of gradient, this could instead look like:

auto g = gradient(f, ...);

where ... is whatever arguments f normally takes, and the return value g is a std::tuple<Eigen::VectorXd, double> or similar. f could also be a functor.

I think the main advantage would be avoiding the need to specify functors just to adapt stuff to the gradient/hessian functions.


#2

What’s the type of g, and how is this new gradient function implemented? With general ... arguments, you get a component of the gradient for each scalar argument and a component of the Hessian for each pair of scalar arguments, which might be spread over multiple arguments, one of which is a matrix and one a vector. Then what? What would it look like for a simple-ish function like this:

vector multiply(const matrix& A, const vector& b);

We need to test three cases: A var, b var; A data, b var; A var, b data. Then the gradients come out in the form of a matrix, a vector, or both, depending on which arguments are data.

With the functors, you have a single vector of arguments and thus a single vector of gradients, so all the testing all the way up can be generic.


#3

Doing conference stuff for a few days, but then I’ll work out a simple example to move this conversation forward.


#4

@Bob_Carpenter: For the return value of the gradient function: it can be a std::tuple<Eigen::VectorXd, double> where the double is the return value of f and the Eigen::VectorXd holds the gradient. Right?

The detail I’m missing: does the size of the gradient vector need to be known at compile time, or only at run time?


#5

I don’t think it’s that easy. The return value can be anything. Consider matrix * vector again—it returns a vector. We have to test the whole matrix Jacobian at first order and the whole tensor of Hessians at second order. Then if both the matrix and the vector have autodiff types, where does the Jacobian of gradients come from? We’d naturally get a matrix and a vector of derivatives for each output dimension.


#6

Gah, duh, yes of course. That I still see how to deal with, let’s put that off.

Let’s set aside the Hessian for the moment; I can do N+1 dimensions if I can do N. Question: is it true that for a gradient calculation, we always flatten the resulting output to a vector?

Here’s what I see in the tests that makes me think this is the case:


TEST(AgradRevMatrix, multiply_matrix_matrix_grad_fd) {
  using Eigen::MatrixXd;
  using Eigen::VectorXd;

  int N = 3;
  int M = 4;
  int K = 5;
  MatrixXd A;
  MatrixXd B;
  MatrixXd AB(N, K);
  VectorXd test = generate_inp(N, M, K);
  pull_vals(N, M, K, test, A, B);
  AB = A * B;
  for (int n = 0; n < N; ++n) {
    for (int k = 0; k < K; ++k) {
      mult_vv<-1, -1, -1> func(n, k, N, M, K);
      VectorXd grad_ad(N * M + M * K);
      VectorXd grad_fd(N * M + M * K);
      double val_ad;
      double val_fd;
      stan::math::gradient(func, test, val_ad, grad_ad);
      stan::math::finite_diff_gradient(func, test, val_fd, grad_fd);
      EXPECT_FLOAT_EQ(AB(n, k), val_ad);
      EXPECT_FLOAT_EQ(AB(n, k), val_fd);
      for (int i = 0; i < grad_ad.size(); ++i)
        EXPECT_NEAR(grad_ad(i), grad_fd(i), 1e-10);
    }
  }
}

So although the return type has some complication, the gradient itself is a straightforward vector (?). I suppose if we made the return type arbitrary we could change the gradient to be arbitrary with N+1 dimensions, but for the moment I want to stick to the current output format.


#7

No. If we have a matrix of var and vector of var, the gradients come out as members of the var in the matrix and vector types, not in an array.

We flatten at the very top in a model, where all the gradients are w.r.t. the unconstrained parameters. We do that by reading the structured parameters off the unconstrained parameter sequence and transforming. That’s all done with code generation.

In the functionals, we assume the functors being operated on are functions of a vector. So that func has to be a functor with a templated method like

template <typename T>
T operator()(const Eigen::Matrix<T, -1, 1>& theta) const;

That’s why I was trying to convert to that case so that the back end of the testing could all be uniform.