Closures discussion

nhuurre · May 7, 2020, 1:02pm

The closures design doc was accepted in December but there’s been no activity since then. I think it’s time to do this.

Tagging everyone who participated in the design-doc discussion:
@Bob_Carpenter, @seantalts, @wds15, @Matthijs, @bbbales2

Unfortunately the design-doc has no plan for how to implement it. The implementation section is only a sentence about C++ lambdas. It will not work because lambdas cannot autodiff correctly in higher-order functions.

Probably the easiest C++ implementation is to convert captured variables into extra parameters (“lambda-lifting”) and piggyback on the variadic arguments proposal. I think I could make this work by the time Stan math supports variadic arguments.

But let’s consider supporting proper closures directly in the math library.
Currently the higher-order functions take a “functor” struct.

struct foo_functor__ {
  template <typename T0__>
  typename boost::math::tools::promote_args<T0__>::type
  operator()(const T0__& x, std::ostream* pstream__) const {
    return foo(x, pstream__);
  }
};

The functor struct contains no data and is default-constructible. ODE solvers store the struct and call it with either var or double arguments depending on where they need autodiff.

The proposed variadic arguments are implemented by passing along an additional opaque tuple.
The parameter pack tuple is accessed with helper functions

count_vars(...) – total number of autodiffable vars
value_of(...) – create an identical double-only tuple (same data, no autodiff)
accumulate_adjoints(double*, ...) – copy the gradients into a vector
save_varis(vari**, ...) – copy the vari pointers into a vector
deep_copy_vars(...) – allocate new independent varis for nested autodiff

A “closure” object would combine the callable functor and the parameter pack.
The closure then exposes an API that should be equivalent to the above.

Here’s what I think it would look like.

The user defines a closure and passes it to a higher-order function hof

data { real c; ... }
parameters {
  real m;
  real s;
  ...
} model {
  real foo(real x) {
    return -(x-m)^2/s + c;
  }
  ... = hof(foo, ...);

The compiler translates it to C++

foo_fn__ foo(c, m, s);
... = hof(foo, ...);

where the class is

class foo_fn__ {
  double& c;
  var& m;
  var& s;
 public:
  int num_vars__;
  foo_fn__(double& c_, var& m_, var& s_)
    : c(c_), m(m_), s(s_), num_vars__(2) { }
  template<typename T>
  var operator()(T x) {
    return -square(x-m)/s + c;
  }
  // double interface
  class foo_fn_dbl__ {
    double& c;
    double m;
    double s;
   public:
    foo_fn_dbl__(double& c_, double m_, double s_)
      : c(c_), m(m_), s(s_) { }
    template<typename T>
    double operator()(T x) {
      return -square(x-m)/s + c;
    }
  };
  foo_fn_dbl__ value_of_() {
    return foo_fn_dbl__(c, value_of(m), value_of(s));
  }
  // vari interface
  void set_zero_adjoints() {
    m.vi_->set_zero_adjoint();
    s.vi_->set_zero_adjoint();
  }
  void accumulate_adjoints(double* gradients) {
    gradients[0] += m.adj();
    gradients[1] += s.adj();
  }
  void save_varis(vari** varis) {
    varis[0] = m.vi_;
    varis[1] = s.vi_;
  }
  foo_fn__ deep_copy_vars() {
    // captured vars are references, can't store the copies in the struct
    // place the copies e.g. on the nested ad stack
    var* m_ = ChainableStack::instance_->memalloc_.alloc_array<var>(1);
    *m_ = var(new vari(value_of(m), false));
    var* s_ = ChainableStack::instance_->memalloc_.alloc_array<var>(1);
    *s_ = var(new vari(value_of(s), false));
    return foo_fn__(c, *m_, *s_);
  }
};

Additionally MPI map_rect requires some way to serialize the object but other than that, is this API sufficient for all of our higher-order functions?

wds15 · May 7, 2020, 1:30pm

This looks promising!

Right now map_rect and reduce_sum enforce statelessness by requiring that you can default-construct the functor object you give to it. So this would need to be reworked, but from scanning over your proposal this looks fine.

Maybe this piggy-pack technique can be expanded to tuples directly?

(this would be a bit of work I am afraid, but closures are definitely cool to have)

bbbales2 · May 7, 2020, 1:40pm

If you wanna do this, absolutely have at it. That would be great!

Oh wow I need to get in gear on polishing the ODE stuff. If we could have closures and variadic arguments for writing ODEs, that would be slick.

I’ll take a closer look this afternoon (gotta go do something now). I’ll have to think about how any deep-copy stuff should work.

Like ideally none of that would be necessary cause the variable capture would be the right thing. But like reduce_sum and the ODE solvers do the weird nested autodiffs and in those cases we do need to be careful with the variables we captured.

bbbales2 · May 7, 2020, 8:04pm

Alright so the use case for pre-computing Jacobians during the forward sweep (for reduce_sum) would be something like:

auto myfunc(const auto& f, const auto& x) {
  nested_rev_autodiff nest;
  auto f_local = f.deep_copy();
  auto x_local = deep_copy(x);

  auto out = f_local(x_local);
  out.grad();

  Eigen::VectorXd adj = Eigen::VectorXd::Zero(f.count_vars() + count_vars(x));
  f_local.accumulate_adjoints(adj);
  accumulate_adjoints(adj.data() + f.count_vars(), x_local);

  return std::make_tuple(value_of(out), adj);
}

var outside(const auto& f, const auto& x) {
  vari** varis = ChainableStack::instance_->memalloc_.alloc_array<vari*>(f.count_vars() + count_vars(x));
  f.save_varis(varis);
  save_varis(varis + f.count_vars(), x);

  auto t = myfunc(f, x); // element 0 is value, element 1 are adjoints

  return var(new precomputed_gradients_vari(std::get<0>(t), f.count_vars(), varis, std::get<1>(t)));
}

I don’t think this needs a set_zero_adjoints(). Those would get handled by the regular nested::set_zero_adjoints() call.

Do you see any ways to make this cleaner or anything? Not that it’s not clean enough, but some things still look a bit manual.

I think what you’re proposing would work (though it would need count vars added, and each of the deep_copy/save_varis/accumulate adjoints should defer to those underlying implementations).

bbbales2 · May 12, 2020, 8:07pm

Have any more thoughts on this?

nhuurre · May 13, 2020, 10:36am

My current progress is on this stanc3 branch.
The parser accepts closures and creates corresponding callable objects. The object has an API for accessing the captured vars but it doesn’t do anything yet. I didn’t test anything but as long as you capture only data it hopefully works.
Now integrate_ode_bdf always invokes the variadic ode_bdf_tol and here’s a math branch where ode_bdf uses the closure object API.

So, this compiles

data {
  int<lower = 0> N;          // number of measurement times
  real ts[N];                // measurement times > 0
  vector[2] y_init;          // initial measured populations
  real<lower = 0> y[N, 2];   // measured populations
}
transformed data {
    real x_r[0];
    int x_i[0];
}
parameters {
  real<lower = 0> alpha;
  real<lower = 0> beta;
  real<lower = 0> gamma;
  real<lower = 0> delta;
  vector<lower = 0>[2] z_init;// initial population
  real<lower = 0> sigma[2];   // measurement errors
}
transformed parameters {
  functions
  vector dz_dt(real t, vector z) {
    real du_dt = (alpha - beta * z[2]) * z[1];
    real dv_dt = (-gamma + delta * z[1]) * z[2];
    return [du_dt, dv_dt]';
  }
  vector[2] z[N] = integrate_ode_bdf(dz_dt, z_init, 0.0, ts);
}
model {
  alpha ~ normal(1, 0.5);
  gamma ~ normal(1, 0.5);
  beta ~ normal(0.05, 0.05);
  delta ~ normal(0.05, 0.05);
  sigma ~ lognormal(-1, 1);
  z_init ~ lognormal(log(10), 1);
  for (k in 1:2) {
    y_init[k] ~ lognormal(log(z_init[k]), sigma[k]);
    y[ , k] ~ lognormal(log(z[, k]), sigma[k]);
  }
}

The generated C++ class is

template <typename captured_t__>
class closure0__ {
  const captured_t__ alpha;
  const captured_t__ beta;
  const captured_t__ delta;
  const captured_t__ gamma;
  public:
  using captured_scalar_t__ = captured_t__;
  const int num_vars__;
  closure0__(const captured_t__ alpha__, const captured_t__ beta__,
             const captured_t__ delta__, const captured_t__ gamma__)
  : alpha(alpha__), beta(beta__), delta(delta__),
  gamma(gamma__), num_vars__(stan::is_var<captured_t__>::value ? 1 + 1 + 1
                             + 1 : 0) { }
  template <typename T0__, typename T1__>
  Eigen::Matrix<typename boost::math::tools::promote_args<captured_t__,
typename boost::math::tools::promote_args<T0__,
T1__>::type>::type, -1, 1>
  operator()(const T0__& t, const Eigen::Matrix<T1__, -1, 1>&
             z, std::ostream* pstream__) const {
  return closure0_impl__(t, z, alpha, beta, delta, gamma, pstream__);
  }
  
  class ValueOf__ {
    const double alpha;
    const double beta;
    const double delta;
    const double gamma;
    public:
    ValueOf__(const closure0__& init) : alpha(value_of(init.alpha)),
    beta(value_of(init.beta)), delta(value_of(init.delta)),
    gamma(value_of(init.gamma)) { }
    template <typename T0__, typename T1__>
    Eigen::Matrix<typename boost::math::tools::promote_args<T0__,
T1__>::type, -1, 1>
    operator()(const T0__& t, const Eigen::Matrix<T1__, -1, 1>&
               z, std::ostream* pstream__) const {
    return closure0_impl__(t, z, alpha, beta, delta, gamma, pstream__);
    }
    };
  class DeepCopy__ {
    const captured_t__ alpha;
    const captured_t__ beta;
    const captured_t__ delta;
    const captured_t__ gamma;
    public:
    DeepCopy__(const closure0__& init) : alpha(deep_copy_vars(init.alpha)),
    beta(deep_copy_vars(init.beta)), delta(deep_copy_vars(init.delta)),
    gamma(deep_copy_vars(init.gamma)) { }
    template <typename T0__, typename T1__>
    Eigen::Matrix<typename boost::math::tools::promote_args<captured_t__,
typename boost::math::tools::promote_args<T0__,
T1__>::type>::type, -1, 1>
    operator()(const T0__& t, const Eigen::Matrix<T1__, -1, 1>&
               z, std::ostream* pstream__) const {
    return closure0_impl__(t, z, alpha, beta, delta, gamma, pstream__);
    }
    
    void accumulate_adjoints(double*) const { /* alpha *//* beta */
      /* delta *//* gamma */}
    };
  void set_zero_adjoints() const { /* alpha *//* beta *//* delta */
    /* gamma */ }
  void accumulate_adjoints(double*) const { /* alpha *//* beta *//* delta */
    /* gamma */}
  void save_varis(vari**) const { /* alpha *//* beta *//* delta */
    /* gamma */}
};

Filling in the missing parts shouldn’t be too hard.

One annoying thing I noticed is that when ode_bdf calls the function it puts the output stream as the third argument but most other callers put the stream in the last place. Not a problem for the above code where the function takes only three arguments but in general we need to do something about the inconsistency.

rok_cesnovar · May 13, 2020, 10:58am

The positioning of the pstream arugment is why we produce the rsfunctor for reduce sum and odefunctor in the variadic ode code: https://github.com/stan-dev/stanc3/pull/525/files#diff-05c667b4d63efa8b39abcae56e7b9834R188

I am not sure how we can make this consistent unless we make some sort of wrappers for the exisiting integrate_* , map_rect, etc.

It would be nice to not need this, that is for sure.

Also: Very nice!

bbbales2 · May 13, 2020, 11:08am

Wow, indeed there have been more thoughts lol.

That code looks amazing!

Is this something you need any help or info on? Or are you good?

Yeah, I think we need to standardize this.

The reason I put the variadic arguments last is then I can rely on C++ to do all the type deduction for the template parameter packs. This makes the math code cleaner.

I think we just update all the other things. We’ll have to anyway for these closures. I like variadic last, msgs second to last cause then C++ handles type deduction.

nhuurre · May 13, 2020, 11:17am

I think I go it. It just means this part

  void set_zero_adjoints() const { /* alpha *//* beta *//* delta */
    /* gamma */ }
  void accumulate_adjoints(double*) const { /* alpha *//* beta *//* delta */
    /* gamma */}
  void save_varis(vari**) const { /* alpha *//* beta *//* delta */
    /* gamma */}

needs to be filled in with something like

  void set_zero_adjoints() const {
    alpha.set_zero_adjoints();
    beta.set_zero_adjoints();
    gamma.set_zero_adjoints();
    delta.set_zero_adjoints();
  }
  void accumulate_adjoints(double* ptr) const {
    ptr++ = alpha.adj();
    ptr++ = beta.adj();
    ptr++ = gamma.adj();
    ptr++ = delta.adj();
  }
  void save_varis(vari** ptr) const {
    ptr++ = alpha.vi_;
    ptr++ = beta.vi_;
    ptr++ = gamma.vi_;
    ptr++ = delta.vi_;
  }

And loops for vectors and arrays.

nhuurre · June 5, 2020, 10:23am

Update: I haven’t forgotten about this.

The Lotka-Volterra model runs and gives the correct result. It’s a bit slower though. In fact all integrate_ode_* are based on mostly the same code should now work with closures.
algebra_solver, reduce_sum and map_rect do not compile.
Is the plan to migrate all of the higher-order functions to variadic arguments? How far it that?

User-defined higher-order functions are also allowed. The main restriction is that a function cannot return a closure.

In C++ a closure object looks like

class closure<captured_scalar_t__> {
  auto operator()(Args...); // call the function
  class ValueOf__; // closure without internal autodiff
  class DeepCopy__; // closure with independent autodiff varis
  const int num_vars__;
  void set_zero_adjoints();
  void accumulate_adjoints(double*);
  void save_varis(vari**);
};

And for an example of what the implementation looks like see the ode interface adaptor class. The nested class hierachy is a bit unwieldy. If anyone has ideas to simplify this please share. In theory closure<var>::ValueOf__ is the same as closure<double>::DeepCopy__ but clang did not like “instantiating a template within its own definition”.

Arrays are captured by reference. In the following model dz_dt does not make a copy of x and y but stores only a pointer.

data {
  real x[100];
}
parameters {
  real y[100];
}
transformed parameters {
  functions
  vector dz_dt(real t, vector z) {
    // do something with x and y
  }
  vector[K] z[N] = integrate_ode(dz_dt, z0, t0, ts);
}

That’s memory-efficient but creates a problem with adjoint sensitivity methods. The adjoint ODE must call dz_dt during the reverse pass and by that time y has gone out of scope. I guess closures would have to copy everything that’s not static data.

wds15 · June 5, 2020, 12:21pm

Very cool!

We have not decided if we make everything variadic yet. Though, the ode functions are being cast into that right noe by @bbbales2 and @charlesm93 wanted to attack the algebra_solver.

… but if closures are indeed doable and are running with not a too high of a performance hit, then variadic is less needed, obviously. Or does it still make sense to have most things variadic and closures? Maybe for performance reasons?

We should discuss a bit more and reflect… though having everything variadic and closure won’t be wrong. Or is the thing that variadic things do not work with what you do here (it should as I understand)?

nhuurre · June 5, 2020, 12:31pm

Closure object is basically a functor struct with a parameter pack inside it. Closures use much the same C++ plumbing as variadic arguments. It makes sense to do both at the same time. Also, with closures the function signatures no longer need the three extra arguments for params + real data + int data. I was thinking replacing those with variadic arguments would simplify the interface but keep it backwards-compatible.

bbbales2 · June 5, 2020, 2:12pm

Yeah, I think so. I also like this use of the word static.

Yeah.

The problem with the last interface is that there are optional arguments. There’s no way to tell if someone means to pass in double-double-int as arguments to their ode right hand side, or they’re passing them as tolerances.

I think the old interface just has to be something deprecated at this point and we plumb it into whatever future-thing we do.

As a way forward on this, I guess I’d like to open a pull request as a branch of variadic odes and get the math component of this working.

@nhuurre does this seem like a reasonable plan? If so, do you want to put together the closure-functor ODE branch? If you’re busy I can do this, but I’d like you to give me the functor object to test with (so I’m not making too many bad assumptions). I like the 1d ode system y' = a y t for testing.

Also, we want every function generated by stanc3 to be one of these functor objects, right? Or will there be some delineation?

Another question I have is how can functors handle higher order autodiff? This deep_copy/accumulate_adjoints sorta stuff we put together explicitly in a case where we weren’t doing higher order autodiff, but we should think about if there’s a way to generalize it before we spread it too far and wide in the code.

Also, summoning @Bob_Carpenter for comments

nhuurre · June 5, 2020, 2:36pm

In C++ maybe, but during code generation the Stan compiler knows exactly how many parameters the passed function/closure consumes. If it wants double, double, int then those are variadic arguments; if it doesn’t then they are tolerances.

I have closure-prototype branch for math and stanc3. Do you want me to open a (draft) pull request? Tests are broken on the math branch because ODEs expect a functor that implements the all the closure stuff but I didn’t update the test functors. There needs to be some way to create adapter class that wraps a non-closure functor in a dummy closure.

bbbales2 · June 5, 2020, 2:45pm

Hmm, good point. I don’t know the other implications of that.

Yo that rocks. Yeah open a pull.

I suppose we could do that. What’s the downside of just converting everything to functor-objects? I guess the obvious one is Stan c++ gets more awkward to use.

nhuurre · June 5, 2020, 4:24pm

github.com/stan-dev/math

[WIP] Closures prototype

stan-dev:feature/parameter-pack-odes ← nhuurre:closures-prototype

opened 03:49PM - 05 Jun 20 UTC

nhuurre

+941 -291

## Summary See the desing-doc https://github.com/stan-dev/design-docs/pull/6 …and the forum discussion https://discourse.mc-stan.org/t/closures-discussion/14932/ A proptotype autodiff interface for closures. This builds on the variadic ODEs https://github.com/stan-dev/math/pull/1641 ## Checklist - [ ] Math issue: ??? - [X] Copyright holder: Niko Huurre The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses: - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause) - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/) - [ ] the basic tests are passing - unit tests pass (to run, use: `./runTests.py test/unit`) - header checks pass, (`make test-headers`) - dependencies checks pass, (`make test-math-dependencies`) - docs build, (`make doxygen`) - code passes the built in [C++ standards](https://github.com/stan-dev/stan/wiki/Code-Quality) checks (`make cpplint`) - [ ] the code is written in idiomatic C++ and changes are documented in the doxygen - [ ] the new changes are tested

github.com/stan-dev/stanc3

[WIP] Closures prototype

stan-dev:master ← nhuurre:closures-prototype

opened 03:53PM - 05 Jun 20 UTC

nhuurre

+5600 -358

Add closures to the Stan language. The math library implementation is https://gi…thub.com/stan-dev/math/pull/1924 See also the desing-doc https://github.com/stan-dev/design-docs/pull/6 and the forum discussion https://discourse.mc-stan.org/t/closures-discussion/14932/ A closure definition begins `functions` keyword and otherwise looks just like a function definition. The keyword isn't ideal but something is needed to avoid ambiguity with ordinary variable declarations.

bbbales2 · June 5, 2020, 4:37pm

I pointed the math pull at the variadic ODE stuff so I could see the diff there.

nhuurre · June 5, 2020, 4:39pm

Oops, thanks.

nhuurre · June 5, 2020, 4:45pm

The diff still looks too large–how did those HMM commits get there? Did I merge in a wrong branch at some point?

bbbales2 · June 5, 2020, 4:47pm

Nah, I think the variadic ODEs stuff just fell behind develop and your branch is more up to date.

I’ll clean up the merge when I get to looking at this closer (in the next few days).

Topic		Replies	Views
Proposal for lambdas and closures Developers	5	531	August 23, 2019
A new continuation-based autodiff by refactoring Developers	24	5092	March 29, 2019
First-class functions, higher-order functions and closures General	5	996	April 18, 2018
Parameter packs for (de)serialization Developers	20	4909	August 20, 2018
Parallel autodiff v3 Developers	61	2848	January 23, 2020

Closures discussion

Related topics