Computational Backend for Stan

Hi Stan Devs,

My collaborator Alex and I are research scientists at Petuum, Inc. (and graduate students at CMU/Imperial). Petuum is working to build a data center operating system that supports all major programming languages and frameworks used in the modern statistics and machine learning community.

We are interested in including Stan in our efforts, because of its ease of use, simple modeling syntax, attention to detail in the HMC implementation, and robust design suitable for a wide variety of users and platforms.

One of the things we are interested in is supporting probabilistic programming languages on a variety of compute architectures, such as GPUs and compute clusters. This could be achieved in Stan by introducing a computational graph framework backend (TensorFlow, DyNet, …) that could perform automatic parallelization and deployment to a variety of systems.

Compared to the current design, this would offer the following advantages:

  • Maintainability: this would allow Stan to define the mathematical operations it needs, without worrying about optimizing or executing them, simplifying debugging and saving developer time.
  • Performance: through automatic parallelization and deployment, gradient operations can be transparently executed on GPUs and compute clusters (for applicable models).

We wanted to ask for your input on this idea. If practical and realistic, Petuum may be willing to invest time and engineering resources into its development. We would contribute our changes back as open source to the community.

Willie Neiswanger and Alex Terenin


This seems great to me, although I’m not the most knowledgeable about what it would take to do. People who know what they’re talking about will respond soon enough.

The one thing to keep an eye on is that the HMC algorithms in Stan probably won’t work in single precision, so some architectures (i.e., ones that either don’t have doubles or have slow doubles) won’t be appropriate.

For Stan, swapping in a backend like that would mean rewriting pretty much the entire math library and probably the sampling algorithms. Same issue as if you wanted to change R’s backend. At that point, you’d probably be better off just starting from scratch. Maybe you could use the AST generated by our parser and change the code generator.

What we’re doing instead is adding MPI for multi-core and GPU (double precision through OpenCL) for some matrix operations. The MPI is the big win, as most likelihoods are embarrassingly parallelizable. The GPU operations will help with operations like Cholesky factorization, where the data size is quadratic in the matrix dimension but the amount of arithmetic is cubic.
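To make the "embarrassingly parallelizable" point concrete, here is a minimal C++ sketch of how a log-likelihood over independent observations splits into partial sums. It is not Stan code and does not use MPI: the function names (`normal_lpdf`, `partial_log_lik`, `chunked_log_lik`) and the normal model are assumptions for illustration, and the chunks are evaluated serially. The point is only that each chunk reads its own slice of the data, so each call could be handed to a separate worker.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Log density of a single observation y under Normal(mu, sigma).
// (Illustrative helper; not the Stan math library function.)
double normal_lpdf(double y, double mu, double sigma) {
  const double pi = 3.14159265358979323846;
  const double z = (y - mu) / sigma;
  return -0.5 * z * z - std::log(sigma) - 0.5 * std::log(2.0 * pi);
}

// Partial log-likelihood over the half-open index range [begin, end).
// Each call reads only its own slice, so calls do not depend on each other.
double partial_log_lik(const std::vector<double>& y, std::size_t begin,
                       std::size_t end, double mu, double sigma) {
  double sum = 0.0;
  for (std::size_t i = begin; i < end; ++i) {
    sum += normal_lpdf(y[i], mu, sigma);
  }
  return sum;
}

// Split the data into n_chunks slices, evaluate each slice, and add the
// results. The loop body is the unit of work a separate worker could take;
// here it simply runs serially.
double chunked_log_lik(const std::vector<double>& y, std::size_t n_chunks,
                       double mu, double sigma) {
  assert(n_chunks > 0);
  const std::size_t n = y.size();
  double total = 0.0;
  for (std::size_t c = 0; c < n_chunks; ++c) {
    const std::size_t begin = c * n / n_chunks;
    const std::size_t end = (c + 1) * n / n_chunks;
    total += partial_log_lik(y, begin, end, mu, sigma);
  }
  return total;
}
```

The chunked total agrees with a single pass over the data up to floating-point rounding, which is why the only communication a distributed version would need is the final sum.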

Thanks for the feedback, Dan and Bob! We agree that this project is non-trivial in scope, but we’re still hoping to explore how at least a technical demonstration might be built.

@Bob: We’d like to learn more about the generated AST. What does it contain and represent, and what is the format? Is there some sort of documentation where we can read up about it?

Appreciate the info!
Willie and Alex

Nope, not really any doc other than what’s in the code. The root of the code tree for the AST is here:

It’s an object-oriented/templated C++ structure representing the parse tree for a Stan program. The most complicated part is the variant types for when there are disjunctions in the language. These require fairly complex callbacks to deal with (to write a function for a variant type, you need a function defined on each of the types—Boost organizes these into a class).
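For readers who haven’t worked with variant types, here is a small self-contained sketch of the pattern described above. It uses C++17 `std::variant` and `std::visit` as a stand-in for Boost’s variant, and the node types (`IntLiteral`, `Variable`, `BinaryOp`) and the `CodeGenVisitor` class are made up for illustration; they are not the names or structure of Stan’s actual AST.

```cpp
#include <memory>
#include <string>
#include <variant>

// Hypothetical node types; these are NOT the names used in Stan's AST.
struct IntLiteral { int value; };
struct Variable { std::string name; };
struct BinaryOp;  // forward declaration for the recursive case

// An expression is exactly one of these alternatives (the "disjunction").
using Expr = std::variant<IntLiteral, Variable, std::shared_ptr<BinaryOp>>;

struct BinaryOp {
  char op;
  Expr lhs;
  Expr rhs;
};

// The visitor class groups one function per alternative. Leaving one out
// is a compile-time error, since std::visit must handle every alternative.
struct CodeGenVisitor {
  std::string operator()(const IntLiteral& e) const {
    return std::to_string(e.value);
  }
  std::string operator()(const Variable& e) const { return e.name; }
  std::string operator()(const std::shared_ptr<BinaryOp>& e) const {
    // Recurse into both children with the same visitor.
    return "(" + std::visit(*this, e->lhs) + " " + e->op + " " +
           std::visit(*this, e->rhs) + ")";
  }
};

// Toy "generator": walks the tree and emits a string for it.
std::string generate(const Expr& e) {
  return std::visit(CodeGenVisitor{}, e);
}
```

As a usage example, building `Expr e = std::make_shared<BinaryOp>(BinaryOp{'+', Variable{"x"}, IntLiteral{2}});` and calling `generate(e)` walks the tree through the visitor. A different backend would, in this toy setup, amount to a different visitor class over the same tree.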

The AST output is then plugged into the generator. You could go up a level and trace down from compiler.hpp.