First-class functions, higher-order functions and closures

Bob has suggested that it might be desirable to add support for first-class functions, higher-order functions and closures to Stan. In particular, he notes that these would be useful for the following purposes:

Mixture modeling, such as simple mixtures or HMMs. We could write a general HMM forward algorithm if we could somehow abstract the emission probability function (see the sketch below).

Our ODE integrator is based on a system function which would hugely benefit
from closure-based binding of data and parameter variables.

Mapping a unary function over a collection (we want to do this all the time)
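
To make the first of these concrete, here is a minimal sketch of a forward algorithm abstracted over the emission log density, written in a hypothetical extended Stan with function-valued arguments (the Func type notation anticipates the GP example further down; all names are invented for illustration):

real hmm_forward_lp(int T, int K, matrix log_Gamma, vector log_rho,
                    Func(<int,int>, real) emission_lp) {
  // emission_lp(t, k) = log p(y[t] | z[t] = k), closing over the data y
  vector[K] alpha;
  for (k in 1:K)
    alpha[k] = log_rho[k] + emission_lp(1, k);
  for (t in 2:T) {
    vector[K] alpha_next;
    for (k in 1:K)
      alpha_next[k] = log_sum_exp(alpha + col(log_Gamma, k))
                      + emission_lp(t, k);
    alpha = alpha_next;
  }
  return log_sum_exp(alpha);
}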

Additionally, wds15 raised the issue in the context of the Stan 3 language:

I haven’t seen any news on functional programming… is that planned as well?

I would like to Curry my functions, pass functions around and wrap them into closures. I know this is asking for a lot, but if this is a thread for wishes, then those are mine.

Personally, I am interested in the issue from a more theoretical programming language semantics point of view. (I am working on the topic as a postdoc with Luke Ong in Oxford.) I am hoping this would close some of the gap with some of the work on probabilistic programming in the academic PL community. Moreover, the implementation of these constructs seems more feasible with the move to C++ 11. I am keen to give it a shot, if this is indeed something that people are interested in.

Therefore, I thought I’d open a thread on the issue.

I’d be really interested in hearing your thoughts on:

  • use cases for the above language constructs and their desirability from a practitioner’s point of view;
  • issues to do with a possible implementation;
  • related suggestions;
  • volunteers to join me in this endeavour :-) .

I was thinking of simple typing, where we have our basic types int, real, etc., and then we can recursively construct arrays out of any type and also functions from any type to any other type. Tuples will provide a heterogeneous product.
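
Concretely, such a grammar would admit declarations like the following (hypothetical syntax, modelled on the Func notation used elsewhere in this thread):

Func(real, real) f;                // function from real to real
Func(<real, real>, real) k;        // function from a pair of reals to real
Func(real, real)[10] fs;           // array of functions
Func(Func(real, real), vector) m;  // higher-order: takes a function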

  • HMM emission distributions and same for general mixtures

  • general maps (see the sketch after this list)

  • closures to capture parameters and data for ODE and algebraic solver system functions; currying and uncurrying would be nice for these, too

  • language needs to be determined on the Stan side

  • I think you can generate a pretty straightforward lambda in C++ corresponding to the Stan declaration; the automatic bindings would be for whatever is in scope. I hope that includes the member variables in a class, because that’s where the data is. The parameters will be local variables.

  • this could all be done before adding functions as arguments, which will require generalizing the type system
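
For the general-maps item above, a user-defined map might look like this once functions can be passed as arguments (again hypothetical syntax):

vector map(Func(real, real) f, vector xs) {
  int N = num_elements(xs);
  vector[N] ys;
  for (n in 1:N)
    ys[n] = f(xs[n]);
  return ys;
}

Calls like map(log, xs) would then also need built-ins to be usable as function values.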

I’ll certainly help. Mitzi’s working on the low-level language features.


This could really change the game for the ODE models.

I had a model once where the x_r and x_i structures looked something like:

x_i = struct {
  int N;
  int M;
};

x_r = struct {
  real;
  real;
  (4-5 more reals);
  vector[M];
  vector[M];
  vector[N];
  vector[N];
};

And the parameters were something like:

theta = struct {
  matrix[N, N];
};

The ODE itself was about 30 lines, but getting everything in and out in the right format was another 60. We never really got the model working, so it’s not a good example really (cause you can’t run it haha). @wds15 would have better examples.
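
For comparison, a closure-based system function could refer to everything directly, so the packing and unpacking would disappear. A rough sketch, assuming system functions may capture variables from the enclosing scope (the linear right-hand side is just a made-up stand-in):

real[] ode_rhs(real t, real[] y) {
  // theta (and N, M, the data vectors) are captured from the
  // enclosing scope instead of being packed into x_r / x_i / theta
  return to_array_1d(theta * to_vector(y));
}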

Another use case for higher-order functions in Stan can be found in Gaussian processes. As I understand it, we would really like to parametrise those with a mean and a covariance function. Using Bob’s proposal for the Stan 3 syntax, and adding higher-order functions to Stan, we could write a Gaussian process regression like this:

vector gaussian_process_regression(Func(real, real) mu_fun,
                                   Func(<real,real>, real) sigma_fun,
                                   int N1,
                                   int N,
                                   real[] x,
                                   vector y1) {
  int N2 = N - N1;
  vector[N] f;
  vector[N] mu;
  param real<lower=0> sigma;
  param vector[N] eta;
  real delta = 1e-9;
  
  {
    matrix[N, N] L_K;
    matrix[N, N] K;
    for (n in 1:N) {
      for (m in 1:N) {
        K[n,m] = sigma_fun(x[n], x[m]);
      }
    }
  
    // diagonal elements
    for (n in 1:N)
      K[n, n] = K[n, n] + delta;
    
    L_K = cholesky_decompose(K);
    f = L_K * eta;
    
    for (n in 1:N) mu[n] = mu_fun(x[n]);
  }
  sigma ~ normal(0, 1);
  eta ~ normal(0, 1);
  y1 ~ normal(f[1:N1] + mu[1:N1], sigma);
  
  vector[N2] y2;
  for (n2 in 1:N2)
    y2[n2] = normal_rng(f[N1 + n2] + mu[N1 + n2], sigma);
  return y2;
}

We could then use it in a Stan program as follows:

data int<lower=1> N1;
data real x1[N1];
data vector[N1] y1;
data int<lower=1> N2;
data real x2[N2];  

int<lower=1> N = N1 + N2;
real x[N];
for (n1 in 1:N1) x[n1] = x1[n1];
for (n2 in 1:N2) x[N1 + n2] = x2[n2];
  
vector[N2] y2 = gaussian_process_regression(mu_example, sigma_example, N1, N, x, y1);

where, for instance,

real mu_example(real r) {
  return 0;
}

real sigma_example(real r, real s) {
  param real<lower=0> rho;
  param real<lower=0> alpha;
  
  rho ~ inv_gamma(5, 5);
  alpha ~ normal(0, 1);
  
  // exponentiated quadratic kernel written out elementwise, since
  // cov_exp_quad applies to containers rather than scalar pairs
  return square(alpha) * exp(-0.5 * square((r - s) / rho));
}

One thing I like about this formulation is that it gives novice users access to a relatively black-box Gaussian process implementation where they can choose their own covariance function. Moreover, it seems closer to a textbook account of GPs. Finally, I like that it cleanly separates the parameters that are an inherent part of the GP from the parameters of the covariance function.

However, there is no reason why the user shouldn’t just directly provide the matrix K as an argument, as that is the only bit that is used, which would eliminate the need for function arguments. Conceptually, it wouldn’t be quite as clean though, since we always have that K is the covariance function mapped over the data:

 K = map(sigma_fun, x, x);

which is then left to the user to perform.


I’ve thought the same thing about HMMs. You want something like a transition function, but that may then depend on predictors in generalized HMMs. For usual HMMs, it’s constant across the input time series, whereas with general ones it varies. And you want something like an emission function. The problem I run into is that it’s hard to abstract all the things that could go into calculating those functions so they can be provided as arguments to a generic HMM function. It’s much more efficient to just use a single transition matrix if that’s what’s going on, rather than forcing it to be blown up into one matrix per time step. I’m still trying to figure out a good way to encapsulate these things.
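
The kind of interface in question might look something like the following, with the transition abstracted as a function of the time index; a constant-transition HMM would pass a closure that ignores t, though that by itself doesn’t answer the efficiency question (hypothetical syntax, names invented):

real general_hmm_forward_lp(int T, int K, Func(int, matrix) log_Gamma_fun,
                            vector log_rho, Func(<int,int>, real) emission_lp) {
  vector[K] alpha;
  for (k in 1:K)
    alpha[k] = log_rho[k] + emission_lp(1, k);
  for (t in 2:T) {
    // constant case: log_Gamma_fun just returns the same matrix every t
    matrix[K, K] log_Gamma = log_Gamma_fun(t);
    vector[K] alpha_next;
    for (k in 1:K)
      alpha_next[k] = log_sum_exp(alpha + col(log_Gamma, k))
                      + emission_lp(t, k);
    alpha = alpha_next;
  }
  return log_sum_exp(alpha);
}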

You can already formulate GPs that look just like the textbooks using the same mean and covariance function breakdown:

y ~ multi_normal(mean_fun(...), cov_fun(...));

This doesn’t really encapsulate a GP, though. It does separate the mean and covariance functions, which can itself be problematic if they have shared computations. So we might want a combined mean-covariance function (easier after we add tuples).
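
With tuples, such a combined function might look like this, reusing the <...> product notation from the Func types above (hypothetical syntax; the exponentiated quadratic kernel is just a stand-in):

<vector, matrix> mean_cov(vector x, real alpha, real rho) {
  int N = num_elements(x);
  vector[N] mu = rep_vector(0, N);
  matrix[N, N] K;
  // shared work between the mean and covariance could be done once here
  for (n in 1:N)
    for (m in 1:N)
      K[n, m] = square(alpha) * exp(-0.5 * square((x[n] - x[m]) / rho));
  return <mu, K>;
}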

You’d have to suffix that function with _lp in Stan in order to give it access to the target log density accumulator.
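
For instance, the covariance function from the GP example above would become something like this (the param declarations remain hypothetical; only the _lp suffix is current Stan convention):

real sigma_example_lp(real r, real s) {
  param real<lower=0> rho;
  param real<lower=0> alpha;

  rho ~ inv_gamma(5, 5);
  alpha ~ normal(0, 1);

  return square(alpha) * exp(-0.5 * square((r - s) / rho));
}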

Boost 1.67 now has a higher-order functions (HOF) sublibrary:

https://www.boost.org/doc/libs/1_67_0/libs/hof/doc/html/doc/