First-class functions, higher-order functions and closures


Bob has suggested that it might be desirable to add support for first-class functions, higher-order functions and closures to Stan. In particular, he notes that these would be useful for the following purposes:

Mixture modeling like simple mixtures or HMMs. We can write a general
HMM forward algorithm if we could somehow abstract the emission probability

Our ODE integrator is based on a system function which would hugely benefit
from closure-based binding of data and parameter variables.

Mapping a unary function over a collection (we want to do this all the time)

Additionally, wds15 raised the issue in the context of the Stan 3 language:

I haven’t seen any news on functional programming… is that planned as well?

I would like to Curry my functions, pass functions around and wrap them into closures. I know this is asking for a lot, but if this is a thread for wishes, then those are mine.

Personally, I am interested in the issue from a more theoretical programming language semantics point of view. (I am working on the topic as a postdoc with Luke Ong in Oxford.) I am hoping this would close some of the gap with some of the work on probabilistic programming in the academic PL community. Moreover, the implementation of these constructs seems more feasible with the move to C++ 11. I am keen to give it a shot, if this is indeed something that people are interested in.

Therefore, I thought I’d open a thread on the issue.

I’d be really interested in hearing your thoughts on:

  • use cases for the above language constructs and their desirability from a practitioners point of view;
  • issues to do with a possible implementation;
  • related suggestions;
  • volunteers to join me in this endeavour :-) .


I was thinking of simple typing, where we have our basic types int, real, etc., and then we can recusively construct arrays out of any type and also functions from any type to any other type. Tuples will provide a heterogeneous product.

  • HMM emission distributions and same for general mixtures

  • general maps

  • closures to capture parameters and data for ODE and algebraic solver system functions; currying and uncurrying would be nice for these, too

  • language needs to be determined on the Stan side

  • I think you can generate a pretty straightforward lambda in C++ corresponding to the Stan declaration—the automatic bindings would be for whatever would be in scope; I hope that includes the member variables in a class, because that’s where the data is. The parameters will be local variables.

  • this could all be done before adding functions as arguments, which will require generalizing the type system

I’ll certainly help. Mitzi’s working on the low-level language features.


This could really change the game for the ODE models.

I had a model once where the x_r and x_i structures looked something like:

x_i = struct {
  int N;
  int M;

struct {
  (4-5 more reals);

And the parameters were something like:

theta = struct {
  matrix[N, N];

The ODE itself was about 30 lines, but getting everything out and in the right format was 60. We never really got the model working, so it’s not a good example really (cause you can’t run it haha). @wds15 would have better examples.


Another use-case for higher order functions in Stan can be found in Gaussian processes. As I understand it, really, we would like to parametrise those with a mean and covariance function. Using Bob’s proposal for the Stan 3 syntax, and adding higher-order functions to Stan, we could write a Gaussian process regression like this:

real gaussian_process_regression(Func(real,real) mu_fun,
                                 Func(<real,real>, real) sigma_fun,
                                 int N1,
                                 int N,
                                 vector x,
                                 vector y1) {
  vector[N] f;
  vector[N] mu;
  param real<lower=0> sigma;
  param vector[N] eta;
  real delta = 1e-9;
    matrix[N, N] L_K;
    matrix[N, N] K;
    for (n in 1:N) {
      for (m in 1:N) {
        K[n,m] = sigma_fun(x[n], x[m]);
    // diagonal elements
    for (n in 1:N)
      K[n, n] = K[n, n] + delta;
    L_K = cholesky_decompose(K);
    f = L_K * eta;
    for (n in 1:N) mu[n] = mu_fun(x[n]);
  sigma ~ normal(0, 1);
  eta ~ normal(0, 1);
  y1 - mu[1:N1] ~ normal(f[1:N1], sigma);
  vector[N2] y2;
  for (n2 in 1:N2)
    y2[n2] = normal_rng(f[N1 + n2] - mu[N1 + n2], sigma);

We could then use it in a Stan program as follows:

data int<lower=1> N1;
data real x1[N1];
data vector[N1] y1;
data int<lower=1> N2;
data real x2[N2];  

int<lower=1> N = N1 + N2;
real x[N];
for (n1 in 1:N1) x[n1] = x1[n1];
for (n2 in 1:N2) x[N1 + n2] = x2[n2];
gaussian_process_regression(mu_example, sigma_example, N1, N, x, y1);

where, for instance,

real mu_example(real r) {
  return 0;

real sigma_example(real r, real s) {
  param real<lower=0> rho;
  param real<lower=0> alpha;
  rho ~ inv_gamma(5, 5);
  alpha ~ normal(0, 1);
  return cov_exp_quad(r, s, alpha, rho);

One thing I like about this formulation is that it gives novice users access to a relatively black box Gaussian process implementation where they can choose their own covariance function. Moreover, is seems closer to a textbook account of GPs. Finally, I like that it cleanly separates the parameters that are an inherent part of the GP from the parameters of the covariance function.

However, there is no reason why the user shouldn’t just directly provide the matrix K as an argument, as that is the only bit that is used, which would eliminate the need for function arguments. Conceptually, I wouldn’t be quite as clean though, as we always have that K is the covariance function mapped over the data:

 K = map(sigma_fun, x, x);

which is then left to the user to perform.


I’ve thought the same thing about HMMs. You want something like a transition function, but that may then depend on predictors in genralized HMMs. For usual HMMs, it’s constant across the input time series, whereas with general ones it varies. And you want something like an emission function. The problem I run into is that it’s hard to abstract all the things that could go into calculating those functions to provide as an argument to a generic HMM function. It’s much more efficient to just use a single transition matrix if that’s what’s going on rather than forcing it to be blow up inot a matrix. I’m still trying to figure out a good way to do encapsulate these things.

You can already formulate GPs that look just like the textbooks using the same mean and covariance function breakdown:

y ~ multi_normal(mean_fun(...), cov_fun(...));

This doesn’t really encapsulate a GP, though. It does separate the mean and covariance functions, which can itself be problematic if they have shared computations. So we might want a combined
mean-covariacne function (easier after we add tuples).

You’d have to suffix that function with _lp in Stan in order to give it access to the target log density accumulator.