Interface to Stan's algorithms from C++ and beyond

Hello everyone!

I wanted to share with you my work on using Stan's algorithms without Stan Math and without .stan files.

Not relying on .stan files is relatively easy: just write a header that looks similar to what stanc generates, using stan::math::var in log_prob, and you're done. Not much is gained at this step.
Stan Math obviously reaches a bit deeper into Stan, but luckily not by much. Stan is really a great example of modular development; I really enjoyed (and learned a lot from) navigating the source code. The Stan Math portion mostly stays confined to stan::model, so that's where the work needs to be done.

The changes are rather minimal; here is a link to my branch. I added a subclass of stan::model::model_base (which I called stan::model::model_base_interface 🙃). It has no templated methods and no overloads with differing numbers of arguments, so it should be easy to subclass in Python or Julia.
I also overloaded the stan::model::log_prob_grad / log_prob_propto / gradient functions for derived classes of stan::model::model_base_interface. And everything just works!
In the branch I included tests for stan::services::sample::hmc_nuts_diag_e_adapt and stan::services::optimize::newton.

Here is a link to the demo repository: GitHub - IvanYashchuk/stan_cpp_rosenbrock: A demo on using Stan's MCMC without Stan-Math and Stan modelling language
Instructions for running the example are in README.md

This is what the function that computes the log probability and its gradient looks like:

  double log_prob_grad(Eigen::VectorXd &params_r, Eigen::VectorXd &gradient,
                       bool propto, bool jacobian_adjust_transform,
                       std::ostream *msgs) const override {
    gradient.resize(num_params_r());

    double *x = params_r.data();
    double *g = gradient.data();
    int nn = 1;  // one (x, y) pair, i.e. a two-parameter model
    double f = 0;
    double alpha = 100.0;

    // Compute 2D Rosenbrock function and its gradient
    for (int i = 0; i < nn; i++) {
      double t1 = x[2 * i + 1] - x[2 * i] * x[2 * i];
      double t2 = 1 - x[2 * i];
      f += alpha * t1 * t1 + t2 * t2;
      // Gradient of the log density, i.e. minus the Rosenbrock gradient
      g[2 * i] = -(-4 * alpha * t1 * x[2 * i] - 2.0 * t2);
      g[2 * i + 1] = -(2 * alpha * t1);
    }
    // The log density is the negative Rosenbrock function
    return -f;
  }

(Not the most elegant code, but the operations are performed on raw data pointers on purpose; that is exactly the kind of in-place mutation autodiff can't see. In the Python world something like np.frombuffer would be used.)

Then main.cpp is a straightforward copy of a portion of command.hpp from CmdStan, so rosenbrock_model can be passed to Stan functions that are templated on the Model type.
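
For concreteness, here is a rough sketch of what that portion of main.cpp boils down to. The tuning values are the usual CmdStan defaults; the writers, the init context, and the rosenbrock_model constructor are placeholders for what the demo actually does, and the exact argument list of the service function may differ between Stan versions:

#include <stan/callbacks/interrupt.hpp>
#include <stan/callbacks/stream_logger.hpp>
#include <stan/callbacks/stream_writer.hpp>
#include <stan/callbacks/writer.hpp>
#include <stan/io/empty_var_context.hpp>
#include <stan/services/sample/hmc_nuts_diag_e_adapt.hpp>

#include <iostream>

#include "rosenbrock_model.hpp"  // the hand-written model class from the demo

int main() {
  rosenbrock_model model;             // assumed default-constructible here
  stan::io::empty_var_context init;   // no user-supplied initial values
  stan::callbacks::interrupt interrupt;
  stan::callbacks::stream_logger logger(std::cout, std::cout, std::cout,
                                        std::cerr, std::cerr);
  stan::callbacks::writer init_writer;  // no-op writer
  stan::callbacks::stream_writer sample_writer(std::cout);
  stan::callbacks::stream_writer diagnostic_writer(std::cout);

  // No .stan file, no Stan Math autodiff: the model supplies log_prob_grad.
  return stan::services::sample::hmc_nuts_diag_e_adapt(
      model, init, /*random_seed=*/1, /*chain=*/1, /*init_radius=*/2,
      /*num_warmup=*/1000, /*num_samples=*/1000, /*num_thin=*/1,
      /*save_warmup=*/false, /*refresh=*/100, /*stepsize=*/1,
      /*stepsize_jitter=*/0, /*max_depth=*/10, /*delta=*/0.8, /*gamma=*/0.05,
      /*kappa=*/0.75, /*t0=*/10, /*init_buffer=*/75, /*term_buffer=*/50,
      /*window=*/25, interrupt, logger, init_writer, sample_writer,
      diagnostic_writer);
}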

This project is a first step towards being able to use Stan as a sampling library (similar to how many standalone optimization libraries exist) and towards letting other PPLs easily build bridges to Stan's algorithms.

How does it all sound to you? Does it look like something that could eventually be merged into Stan?

5 Likes

Looks neat! Though I'm a bit confused, could you give a more general outline of what your goal is here and the steps you think need to be done to accomplish it?

What’s the particular challenge here? Would you be able to just call the non-templated functions from model_base? Passing a vector of doubles to log_prob runs log prob with doubles, var types with var, etc.

It looks like this does not use autodiff at all? Is that your goal here?

The goal here is to be able to use Stan's samplers without using autodiff from Stan Math. So if I have a function implemented in C++ (or in Python/Julia later) that computes the log probability and its gradient, I should be able to use all gradient-based algorithms from Stan. The gradient could be computed with some other autodiff system.

The challenge is exactly avoiding var types, and thus avoiding having to register gradients with operands_and_partials / precomputed_gradients / adj_jac_apply / reverse_pass_callback.

For example, right now the stan::model::gradient and stan::model::log_prob_grad functions work by running Stan Math's autodiff on var variables.
What I did in the branch is just add a dispatch: if the passed model is a subclass of my interface class, use the gradient function provided by that subclass; otherwise fall back to the normal Stan Math-based autodiff computation.
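
Roughly, the dispatch looks like this (a minimal sketch only; the header path and the name log_prob_grad_autodiff are placeholders for the existing Stan Math-based implementation, not the actual code in the branch):

#include <stan/model/model_base_interface.hpp>  // hypothetical header for the new interface class

#include <Eigen/Dense>
#include <ostream>

template <bool propto, bool jacobian_adjust_transform, class M>
double log_prob_grad(const M& model, Eigen::VectorXd& params_r,
                     Eigen::VectorXd& gradient, std::ostream* msgs = nullptr) {
  // Models deriving from the interface class supply their own gradient
  // (hand-coded or computed with an external autodiff system).
  if (const auto* m
      = dynamic_cast<const stan::model::model_base_interface*>(&model))
    return m->log_prob_grad(params_r, gradient, propto,
                            jacobian_adjust_transform, msgs);
  // Everything else falls back to the usual Stan Math reverse-mode autodiff.
  return log_prob_grad_autodiff<propto, jacobian_adjust_transform>(
      model, params_r, gradient, msgs);
}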

Calling the templated or overloaded methods is not a problem; the problem is cross-language inheritance. The model_base class has two sets of functions: one that takes Eigen::Vector<double/var> params_r, and another that takes std::vector<double/var> params_r plus std::vector<int> params_i. The function names are the same but the number of arguments differs. That is perfectly fine in C++, of course, but Python doesn't allow overloads with different numbers of arguments; a later definition simply replaces the previous one, which makes it problematic to write one function for both Eigen and std::vector inputs. And both sets are required in Stan for everything to work. I want to be able to write the model class in Python, pass it to a wrapped stan::services::sample::hmc_nuts_diag_e_adapt, and expect it to work (it does work in my prototype; see the sketch below).
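
Here is the rough shape of such an interface class (assumed for illustration; the actual declarations in the branch may differ). The point is that every operation a binding needs to override has exactly one name and one argument list, in terms of plain doubles and Eigen vectors:

#include <stan/model/model_base.hpp>

#include <ostream>

namespace stan {
namespace model {

class model_base_interface : public model_base {
 public:
  explicit model_base_interface(size_t num_params_r)
      : model_base(num_params_r) {}

  // The method overridden by rosenbrock_model above: log density and its
  // gradient, in terms of doubles only, no stan::math::var anywhere.
  virtual double log_prob_grad(Eigen::VectorXd& params_r,
                               Eigen::VectorXd& gradient, bool propto,
                               bool jacobian_adjust_transform,
                               std::ostream* msgs) const = 0;

  // The Eigen/std::vector and double/var overloads inherited from model_base
  // can be implemented once here in terms of the methods above, so that
  // Python/Julia subclasses never have to deal with them.
};

}  // namespace model
}  // namespace stan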

1 Like

I read a post from @eigenfoo about this sort of thing once: `littlemcmc` — A Standalone HMC and NUTS Sampler in Python | George Ho

That might be a good way to do something like this, since it is designed specifically for modularity.

3 Likes

Here is a link to the Colab notebook with cross-language inheritance in action. The model is implemented in Python using NumPy and SciPy (scipy.optimize.rosen and scipy.optimize.rosen_der).

2 Likes