Hello everyone!
I wanted to share my work on using Stan's algorithms without Stan-Math and .stan files.
Not relying on .stan files is relatively easy: just write a header that looks similar to what stanc generates, using stan::math::var in log_prob, and you're done. Nothing much is gained at this step.
Stan-Math obviously reaches into Stan a bit more deeply, but luckily not by much. Stan is really a great example of modular development; I enjoyed (and learned a lot from) navigating the source code. The Stan-Math portion mostly stays confined to stan::model, so that's where the work needs to be done.
The changes are rather minimal. A link to my branch. I added a subclass of stan::model::model_base (which I called stan::model::model_base_interface 🙃). It has no templated methods and no overloads with different numbers of arguments, so it is easy to subclass in Python or Julia. I also overloaded the stan::model::log_prob_grad/log_prob_propto/gradient functions for derived classes of stan::model::model_base_interface. And everything just works!
In the branch I included tests for stan::services::sample::hmc_nuts_diag_e_adapt
and stan::services::optimize::newton
.
Here is a link to the demo repository: GitHub - IvanYashchuk/stan_cpp_rosenbrock: A demo on using Stan's MCMC without Stan-Math and Stan modelling language
Instructions for running the example are in README.md.
This is what the function that computes the log probability and its gradient looks like:
double log_prob_grad(Eigen::VectorXd &params_r, Eigen::VectorXd &gradient,
                     bool propto, bool jacobian_adjust_transform,
                     std::ostream *msgs) const override {
  gradient.resize(num_params_r());
  double *x = params_r.data();
  double *g = gradient.data();
  int nn = 1;
  double f = 0;
  double alpha = 100.0;
  // Compute 2D Rosenbrock function and its gradient
  for (int i = 0; i < nn; i++) {
    double t1 = x[2 * i + 1] - x[2 * i] * x[2 * i];
    double t2 = 1 - x[2 * i];
    f += alpha * t1 * t1 + t2 * t2;
    g[2 * i] = -(-4 * alpha * t1 * x[2 * i] - 2.0 * t2);
    g[2 * i + 1] = -(2 * alpha * t1);
  }
  return -f;
}
(Not the most elegant code, but the operations are performed on raw data pointers on purpose: that is exactly the kind of in-place mutation autodiff can't see. In the Python world, something like np.frombuffer would be used.)
Then main.cpp is a straightforward copy of a portion of command.hpp from CmdStan, so rosenbrock_model can be passed to Stan functions that are templated on the Model type.
This project is a first step towards using Stan as a sampling library (similar to how there are many standalone optimization libraries), which would let other PPLs easily build bridges to Stan's algorithms.
How does it all sound to you? Does it look like something that could eventually be merged into Stan?