How can I access the vector of precomputed gradients of a `var`?

Sorry, this is a question I know I should be able to answer from the headers, but I can’t. I have one function, foo1, that returns a scalar var with precomputed vector gradients. I have a second function, foo2, that calls foo1, but I can’t interpret the header files well enough to work out how to recover those precomputed gradients.

More specifically, I have one function foo2 with signature

var foo2(const Eigen::Matrix<var, Eigen::Dynamic,1>& alpha_, 
           const Eigen::Matrix<var, Eigen::Dynamic,1>& beta_, 
           const double& x_, 
           std::ostream* pstream)

and inside foo2 I call foo1, e.g.

var k;
k = foo1(alpha_, beta_, t, q, pstream);
// foo1 has precomputed gradients
// how do I recover gradients that are in k?

I want to precompute the gradients dfoo2/dalpha and dfoo2/dbeta, but to do so I need the precomputed gradients dk/dalpha and dk/dbeta.

I have already precomputed the gradients in foo1 like so:

var foo1(const Eigen::Matrix<var, Eigen::Dynamic, 1>& alpha_,
         const Eigen::Matrix<var, Eigen::Dynamic, 1>& beta_,
         const double& t_,
         const int q,
         std::ostream* pstream) {
  // Calculate double k
  // Calculate vector dk_dalpha with size N
  // Calculate vector dk_dbeta with size N
  size_t NN = 2 * N;
  vari** varis = ChainableStack::memalloc_.alloc_array<vari*>(NN);
  double* partials = ChainableStack::memalloc_.alloc_array<double>(NN);
  for (int i = 0; i < N; i++) {
    varis[i] = alpha_(i).vi_;
    partials[i] = dk_dalpha(i);
  }
  for (int i = N; i < NN; i++) {
    varis[i] = beta_(i - N).vi_;
    partials[i] = dk_dbeta(i - N);
  }
  return var(new precomputed_gradients_vari(k, NN, varis, partials));
}

I have verified that foo1 works and can be sampled from correctly using NUTS.

What I can’t figure out is how to access the gradients that foo1 precomputed and that should be stored inside of k. Specifically, I can’t tell from the headers how to access the vector-valued partials of a var. Any help would be appreciated.
Thanks.

The math library should do the chain rule piece of this for you.

So for foo1(alpha, beta) you provided the custom gradients:

dfoo1/dalpha and dfoo1/dbeta

For foo2(val), you should provide:

dfoo2/dval

If the input to foo2 is the output of foo1, then so be it. If you ask the autodiff library to give you dfoo2/dalpha, it’ll do the chain rule bit for you:

dfoo2/dalpha = dfoo2/dval * dfoo1/dalpha

So you shouldn’t need dfoo1/dalpha etc. in foo2.
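For concreteness, here’s a minimal sketch of what that could look like inside foo2, treating foo2 as a function of k alone and reusing the same precomputed_gradients_vari pattern as your foo1 (foo2_val and dfoo2_dk are placeholder names for whatever you compute analytically):

// Inside foo2, after k = foo1(...):
// compute foo2_val (double) and dfoo2_dk (double) analytically, then
// register k as the only operand. When the reverse pass runs, the library
// chains dfoo2/dk with foo1's precomputed dk/dalpha and dk/dbeta for you.
vari** varis = ChainableStack::memalloc_.alloc_array<vari*>(1);
double* partials = ChainableStack::memalloc_.alloc_array<double>(1);
varis[0] = k.vi_;
partials[0] = dfoo2_dk;
return var(new precomputed_gradients_vari(foo2_val, 1, varis, partials));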

Thanks. Are you suggesting that I just ignore precomputed gradients and let autodiff do everything? Perhaps I should have said more. foo2 is actually an implicit function of foo1, solved using a root-finding iteration. foo1 is called repeatedly, but I only need the derivatives at the last step. I’ve been told never to autodiff an implicit function. Should I let autodiff go through foo2 and just not worry about it? I know the gradients of everything here, so shouldn’t I just precompute them rather than autodiff?

ignore precomputed gradients and let autodiff do everything

precompute ’em if you got ’em

The precomputed gradients will be faster, but as with any manual gradients there’s a good chance of accidentally introducing bugs, etc, so you gotta be careful.

I’ve been told never to autodiff an implicit function

I assume this is to avoid autodiffing whatever iterative solver is used to find the solution to the implicit function. It’s probably reasonable advice, but as a practical note, there are quite a few complex things getting pushed through autodiff around here and they mostly work fine (the eigenvalue solver is one).

But back to the question at hand!

You can include nested autodiffs inside your big autodiff. By doing this you could recover the gradients of foo1 that you seek. It’s done in the Jacobian calculations in the ODE solver, for instance (https://github.com/stan-dev/math/blob/develop/stan/math/rev/mat/functor/jacobian.hpp). Grep around for the nested stuff. It might look a little complicated. You gotta be careful with this, though.

How many outputs does foo1 have? You might be better off using the forward mode autodiff there (if there’s way more outputs than inputs).
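For reference, the forward-mode mechanics look roughly like this: you seed d_ on one input per sweep and read the derivative of every output off its tangent. This sketch assumes foo1 is (or can be) templated on its scalar type so it accepts fvar<double>, and alpha_val/beta_val are plain double vectors of the input values:

// One forward sweep per seeded input: set d_ = 1 on the input you care
// about and read the directional derivative off the output's tangent.
Eigen::Matrix<fvar<double>, Eigen::Dynamic, 1> alpha_f(N), beta_f(N);
for (int i = 0; i < N; ++i) {
  alpha_f(i) = fvar<double>(alpha_val(i), 0.0);
  beta_f(i) = fvar<double>(beta_val(i), 0.0);
}
alpha_f(j).d_ = 1.0;  // seed the j-th component of alpha (loop over j for all partials)
fvar<double> k_f = foo1(alpha_f, beta_f, t, q, pstream);
double dk_dalpha_j = k_f.tangent();  // dk/dalpha(j)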

foo2 returns a double/var. I need to calculate dfoo2 w.r.t. alpha, beta and x. I know how to do these, and I was planning on using precomputed_gradients_vari in foo2 just as I had done in foo1.
But I was hoping to have access to those precomputed gradients in the return from foo1. I don’t understand the data structure of the var returned by my foo1. I was hoping that if k = foo1(alpha, beta, ...) then I might be able to extract the i-th partial derivative with something like k(i).adj() or (k.adj())[i].

Barring access to the gradients within the var, I could very easily write another instance of foo1 (say foo1_with_grad) that returns a single VectorXd containing both the value and the partials, or I could look into the ODE solver as you suggest.

Ah, yeah, I see what you’re saying. Anything other than just having the gradients seems kind of computationally inefficient.

You could probably do some pointer chasing and manual casting to extract the precomputed gradients from the vari where they’d live, but it’d be super awkward and hacky and prone to breaking. What you said sounds better. Just make a foo1_with_gradients that returns both things.

Then your foo1 function could just be a little placeholder that wraps everything up nicely with the precomputed gradients, and foo2 can do what it needs with the rest.
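Something along these lines, sketched out (the packing of [value, dk/dalpha, dk/dbeta] into one VectorXd is just one way to do it, and value_of below is stan::math::value_of):

// Core algorithm on doubles: returns [k, dk/dalpha(0..N-1), dk/dbeta(0..N-1)].
Eigen::VectorXd foo1_with_grad(const Eigen::VectorXd& alpha,
                               const Eigen::VectorXd& beta,
                               double t, int q, std::ostream* pstream) {
  // ... compute k, dk_dalpha, dk_dbeta exactly as you do now and pack them ...
}

// Thin chainable wrapper: calls foo1_with_grad and registers the partials.
var foo1(const Eigen::Matrix<var, Eigen::Dynamic, 1>& alpha_,
         const Eigen::Matrix<var, Eigen::Dynamic, 1>& beta_,
         const double& t_, const int q, std::ostream* pstream) {
  int N = alpha_.size();
  Eigen::VectorXd vg = foo1_with_grad(value_of(alpha_), value_of(beta_),
                                      t_, q, pstream);
  size_t NN = 2 * N;
  vari** varis = ChainableStack::memalloc_.alloc_array<vari*>(NN);
  double* partials = ChainableStack::memalloc_.alloc_array<double>(NN);
  for (int i = 0; i < N; ++i) {
    varis[i] = alpha_(i).vi_;
    partials[i] = vg(1 + i);          // dk/dalpha(i)
    varis[N + i] = beta_(i).vi_;
    partials[N + i] = vg(1 + N + i);  // dk/dbeta(i)
  }
  return var(new precomputed_gradients_vari(vg(0), NN, varis, partials));
}

foo2 can then call foo1_with_grad directly when it needs the raw partials, and everything else calls the var version.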

Thanks. This is helpful. So, to wrap this up (pun intended): it seems that if you have an algorithm foo that precomputes gradients, and there are use cases where you need the gradients and use cases where you need it chainable, then the best design is to put the algorithm in a VectorXd foo_with_grad(), and then provide a var foo() wrapper (and probably a double foo() as well).
Alternatively, one could send a pull request to add a .gradients() member function of class precomputed_gradients_vari that provides access to the private data member gradients_. But so far the number of uses of this is n=1.

Cool beans! Good luck with your implementin’

Stan Math is headed to C++11. You might find something there that you can take advantage of (tuples and such). I dunno. Personal preference.

Alternatively, one could send a pull request to add a .gradients() member function of class

Yeah, it’s tempting, but I think what you’re doing now is the right thing.

One issue would be that only precomputed varis could do this. Precomputed gradients is like putting little bits of forward-mode autodiff in the reverse-mode graph, if that makes sense. Things that do their autodiff in reverse mode wouldn’t play well with this.

Secondly, the autodiff is all built around scalars. So if the output of foo1 is a vector, the autodiff doesn’t track the autodiff information of that vector as a whole; it tracks the autodiff info for each individual value in the vector (and the vector itself is just an ordinary container).
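In other words (a toy illustration):

Eigen::Matrix<var, Eigen::Dynamic, 1> v(2);
v << 1.0, 2.0;
var y = v(0) * v(1);
y.grad();
// v(0).adj() == 2.0 and v(1).adj() == 1.0: the adjoints live on the
// individual scalar vars, not on the Eigen matrix, which is just a container.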

For what it’s worth, I think this is how you could embed a small reverse-mode autodiff in foo2 to get the gradients of out w.r.t. alpha/beta:

start_nested();
var alpha = 1.0, beta = 1.0;
auto out = foo1(alpha, beta);
out.grad();
std::cout << alpha.adj() << " " << beta.adj() << std::endl;
recover_memory_nested();

The issue with this is that if out is a vector you’d have to repeat it for each output, which just seems awkward.
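Mirroring the jacobian.hpp code linked above, the vector-output version would look roughly like this (foo1_vec is a hypothetical vector-returning foo1, alpha_val is a plain VectorXd of values, and I believe set_zero_all_adjoints_nested() is the call that resets adjoints inside the nesting, but check the header):

start_nested();
try {
  // Build the nested operands from plain doubles so they live inside the nesting.
  Eigen::Matrix<var, Eigen::Dynamic, 1> alpha_v(alpha_val);
  Eigen::Matrix<var, Eigen::Dynamic, 1> out = foo1_vec(alpha_v);
  Eigen::MatrixXd J(out.size(), alpha_v.size());
  for (int i = 0; i < out.size(); ++i) {
    if (i > 0)
      set_zero_all_adjoints_nested();  // clear adjoints from the previous sweep
    out(i).grad();
    for (int j = 0; j < alpha_v.size(); ++j)
      J(i, j) = alpha_v(j).adj();
  }
} catch (const std::exception& e) {
  recover_memory_nested();
  throw;
}
recover_memory_nested();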

I’d verify that out.grad() only walks back to the last nesting on the stack.