Binary function vectorization and broadcasting with C++ templates


#1

Standalone example

I thought it would be helpful to provide a simple, standalone implementation of vectorization in C++ with all the template magic included. It needs to be generalized with boost::promote_args in the very first template metaprogram struct. There’s a bit of doc explaining what’s going on. I’m happy to elaborate more if people have questions.

This general approach is already in Stan for unary functions.

More efficient alternatives

This is not the most efficient way to implement broadcasting and vectorization.

Cache computations:   For example, consider hypot(double x, double y) == sqrt(square(x) + square(y)) and its vectorized form hypot(const vector<double>& x, double y). In the latter, we’d ideally only compute square(y) once and reuse it. That could be done with an implementation along the lines of our distributions. But it would require coding a new version

Adjoint-Jacobian form:   An orthogonal improvement is to bypass the operands_and_partials approach we use in the probability functions and move straight to an adjoint-Jacobian product form, which will reduce virtual function calls and memory use in the expression graph. There’s example code in that issue.

Generalization

This (and its unary version, which is much simpler) could probably be extended beyond scalars to vectorize multivariate functions. There’s nothing really specific about the base types here, other than that if we allow containers, then we need to worry about the contained type relations (i.e., covariant types in the CS sense, not the stats sense).

C++ Code

#include <vector>
#include <iostream>

// calculate return type of vectorized function
template <typename T1, typename T2>
struct vectorized_return {
  typedef double type;  // generalize to `boost::promote_args<T1, T2>::type`
};
template <typename T1, typename T2>
struct vectorized_return<T1, std::vector<T2>> {
  typedef std::vector<typename vectorized_return<T1, T2>::type> type;
};
template <typename T1, typename T2>
struct vectorized_return<std::vector<T1>, T2> {
  typedef std::vector<typename vectorized_return<T1, T2>::type> type;
};
template <typename T1, typename T2>
struct vectorized_return<std::vector<T1>, std::vector<T2>> {
  typedef std::vector<typename vectorized_return<T1, T2>::type> type;
};


// compute vectorized result, overloads per arg shape;  three instances
// per container type because container types must match;  should just
// work for autodiff vars
template <class F, typename T1, typename T2>
typename vectorized_return<T1, T2>::type
vectorize(const T1& x1, const T2& x2) {
  return F::apply(x1, x2);
}
template <class F, typename T1, typename T2>
typename vectorized_return<T1, std::vector<T2>>::type
vectorize(const T1& x1, const std::vector<T2>& x2) {
  typename vectorized_return<std::vector<T1>, T2>::type y(x2.size());
  for (size_t i = 0; i < x2.size(); ++i)
    y[i] = F::apply(x1, x2[i]);
  return y;
}
template <class F, typename T1, typename T2>
typename vectorized_return<std::vector<T1>, T2>::type
vectorize(const std::vector<T1>& x1, const T2& x2) {
  typename vectorized_return<std::vector<T1>, T2>::type y(x1.size());
  for (size_t i = 0; i < x1.size(); ++i)
    y[i] = F::apply(x1[i], x2);
  return y;
}
template <class F, typename T1, typename T2>
typename vectorized_return<std::vector<T1>, std::vector<T2>>::type
vectorize(const std::vector<T1>& x1, const std::vector<T2>& x2) {
  typename vectorized_return<std::vector<T1>, T2>::type y(x1.size());
  for (size_t i = 0; i < x1.size(); ++i)
    y[i] = F::apply(x1[i], x2[i]);
  return y;
}


// (1) assume that foo(scalar, scalar) fully defined for
// prim (int, double), fwd (fvar<T>), rev (var) instantiations
// including mixing autodiff types with double types;
// this function's name need not match the eventual name;  if
// they're different, the vectorization defines the double case
double foo(double x1, double x2) {
  return x1 + x2;
}

// (2) embed in templated static fun in struct;  could use namespace to
// qualify call
struct foo_fun {
  template <typename T1, typename T2>
  static typename vectorized_return<T1, T2>::type
  apply(const T1& x1, const T2& x2) {
    return foo(x1, x2);
  }
};

// (3) define vectorized function with vectorization;  not ambiguos
// with more specific double version
template <typename T1, typename T2>
typename vectorized_return<T1, T2>::type
foo(const T1& x1, const T2& x2) {
  return vectorize<foo_fun>(x1, x2);
}


// just making sure it all works
int main() {
  double x1 = 3.2;
  double x2 = 2.9;
  std::vector<double> xs1 = { 1.5, 2.3 };
  std::vector<double> xs2 = { -2.8, 62 };

  // check all instantiations
  double b = foo(1, 1);
  double a = foo(1, 1.3);
  double c = foo(1.3, 1);
  double d = foo(1.7, 2.9);

  std::vector<double> e = foo(1, xs2);
  std::vector<double> f = foo(xs1, 1);
  std::vector<double> g = foo(xs1, xs2);

  // make sure values are right
  double y1 = foo(x1, x2);
  std::cout << "foo(x1, x2) = " << y1 << std::endl;

  std::vector<double> ys2 = foo(xs1, x2);
  for (int i = 0; i < ys2.size(); ++i)
    std::cout << "foo(xs1, x2)[" << i << "]=" << ys2[i] << std::endl;

  std::vector<double> ys3 = foo(x1, xs2);
  for (int i = 0; i < ys3.size(); ++i)
    std::cout << "foo(x1, xs2)[" << i << "]=" << ys3[i] << std::endl;

  std::vector<double> ys4 = foo(xs1, xs2);
  for (int i = 0; i < ys4.size(); ++i)
    std::cout << "foo(xs1, xs2)[" << i << "]=" << ys4[i] << std::endl;
}

Here’s what it does:

~/temp2$ clang++ --std=c++11 vec.cpp
~/temp2$ ./a.out
foo(x1, x2) = 6.1
foo(xs1, x2)[0]=4.4
foo(xs1, x2)[1]=5.2
foo(x1, xs2)[0]=0.4
foo(x1, xs2)[1]=65.2
foo(xs1, xs2)[0]=-1.3
foo(xs1, xs2)[1]=64.3
~/temp2$