Rainier currently provides two samplers: affine-invariant MCMC, an ensemble method popularized by the Emcee package in Python, and Hamiltonian Monte Carlo, a gradient-based method used in Stan and PyMC3.
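Here is a minimal sketch of the "stretch move" from Goodman and Weare that affine-invariant ensemble samplers such as emcee are built on. The object and method names (`StretchMove`, `step`) are hypothetical and this is not Rainier's code. The scale `a = 2.0` follows emcee's default. Each walker proposes a point on the line through itself and another randomly chosen walker, which is what makes the move invariant to affine transformations of the target.

```scala
import scala.util.Random

// Hypothetical sketch of the Goodman & Weare stretch move (not Rainier's API).
object StretchMove {
  type Point = Array[Double]

  // Draw z from g(z) proportional to 1/sqrt(z) on [1/a, a] by inverting its CDF.
  def sampleZ(a: Double, rng: Random): Double = {
    val s = (a - 1.0) * rng.nextDouble() + 1.0
    s * s / a
  }

  // One sweep: update every walker in turn, using the others as the "complement".
  // Requires at least two walkers.
  def step(walkers: Array[Point], logDensity: Point => Double,
           a: Double, rng: Random): Unit = {
    val n = walkers.length
    val d = walkers(0).length
    for (k <- 0 until n) {
      // Pick a different walker j uniformly at random.
      var j = rng.nextInt(n - 1)
      if (j >= k) j += 1
      val z = sampleZ(a, rng)
      val xk = walkers(k)
      val xj = walkers(j)
      // Proposal lies on the line through xj and xk, stretched by z.
      val proposal = Array.tabulate(d)(i => xj(i) + z * (xk(i) - xj(i)))
      // Acceptance ratio includes the z^(d-1) Jacobian factor.
      val logAccept = (d - 1) * math.log(z) + logDensity(proposal) - logDensity(xk)
      if (math.log(rng.nextDouble()) < logAccept) walkers(k) = proposal
    }
  }
}
```

The only tuning knob is `a`. The proposal scale comes from the spread of the ensemble itself, which is why these samplers feel "black box" in practice.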
Depending on your background, you might think of Rainier as aspiring to be either “Stan, but on the JVM”, or “TensorFlow, but for small data”.
As a rough comparison, Rainier seems to yield a 10x or more speedup relative to the equivalent Stan models. This is promising, though please keep in mind that benchmarking is hard, micro-benchmarks are often meaningless, and Stan’s sampler implementation is much more sophisticated and much, much, much better tested than Rainier’s!
… Within those constraints, however, it is extremely fast. Rainier takes advantage of knowing all of your data ahead of time by aggressively precomputing as much as it can, which is a significant practical benefit relative to systems that compile a data-agnostic model. It produces optimized, unboxed, JIT-friendly JVM bytecode for all numerical calculations. This compilation happens in-process and is fast enough for interactive use at a REPL.
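As a rough illustration of why knowing the data ahead of time helps: for a normal likelihood, a fixed dataset collapses into three numbers (n, Σx, Σx²), so each log-density evaluation no longer has to touch every observation. The hand-written Scala sketch below (object and class names hypothetical) only shows the kind of simplification that such precomputation makes possible. It is not how Rainier's compiler is implemented.

```scala
// Hypothetical illustration: folding a fixed dataset into sufficient statistics
// so each log-likelihood evaluation is O(1) instead of O(n).
object PrecomputedNormal {
  private val Log2Pi = math.log(2.0 * math.Pi)

  // Data-agnostic version: walks over every observation on each call.
  def naiveLogLik(data: Array[Double], mu: Double, sigma: Double): Double =
    data.map { x =>
      val r = (x - mu) / sigma
      -0.5 * r * r - math.log(sigma) - 0.5 * Log2Pi
    }.sum

  // Data-aware version: n, sum(x) and sum(x^2) are computed once, up front.
  final class Compiled(data: Array[Double]) {
    private val n = data.length.toDouble
    private val sumX = data.sum
    private val sumX2 = data.map(x => x * x).sum

    def logLik(mu: Double, sigma: Double): Double = {
      // sum((x - mu)^2) = sum(x^2) - 2 mu sum(x) + n mu^2
      val ss = sumX2 - 2.0 * mu * sumX + n * mu * mu
      -ss / (2.0 * sigma * sigma) - n * math.log(sigma) - 0.5 * n * Log2Pi
    }
  }
}
```

A sampler calls the log density thousands of times with the same data, so the one-time cost of building `Compiled` is quickly repaid.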
Our implementation includes the dual-averaging automatic tuning of step-size from the NUTS paper, but requires you to manually specify a number of leapfrog steps. In the future, we plan to implement the full NUTS algorithm to also dynamically select the number of steps.
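For readers who haven't seen the NUTS paper's step-size adaptation, here is a minimal sketch of HMC with a fixed, user-chosen number of leapfrog steps plus dual-averaging tuning of the step size. It assumes a unit mass matrix and a user-supplied gradient of the log density. The names (`FixedStepHmc`, `DualAveraging`) are hypothetical. The constants `gamma = 0.05`, `t0 = 10`, `kappa = 0.75` and `mu = log(10 * eps0)` follow the values suggested by Hoffman and Gelman. The target acceptance `delta = 0.65` is an assumption here (Stan's default is 0.8).

```scala
import scala.util.Random

// Hypothetical sketch: HMC with a fixed number of leapfrog steps and
// dual-averaging step-size adaptation (not Rainier's implementation).
object FixedStepHmc {
  // Leapfrog integration for exactly `steps` steps; grad is the gradient of log p.
  def leapfrog(q0: Array[Double], p0: Array[Double], grad: Array[Double] => Array[Double],
               eps: Double, steps: Int): (Array[Double], Array[Double]) = {
    val q = q0.clone()
    val p = p0.clone()
    var g = grad(q)
    for (_ <- 0 until steps) {
      for (i <- p.indices) p(i) += 0.5 * eps * g(i) // half step in momentum
      for (i <- q.indices) q(i) += eps * p(i)       // full step in position
      g = grad(q)
      for (i <- p.indices) p(i) += 0.5 * eps * g(i) // half step in momentum
    }
    (q, p)
  }

  // One HMC transition; returns the next state and the Metropolis acceptance probability.
  def transition(q: Array[Double], logp: Array[Double] => Double,
                 grad: Array[Double] => Array[Double], eps: Double, steps: Int,
                 rng: Random): (Array[Double], Double) = {
    val p0 = Array.fill(q.length)(rng.nextGaussian())
    val (q1, p1) = leapfrog(q, p0, grad, eps, steps)
    def energy(qq: Array[Double], pp: Array[Double]) = -logp(qq) + 0.5 * pp.map(x => x * x).sum
    val logRatio = energy(q, p0) - energy(q1, p1)
    val acceptProb = if (logRatio.isNaN) 0.0 else math.min(1.0, math.exp(logRatio))
    (if (rng.nextDouble() < acceptProb) q1 else q, acceptProb)
  }

  // Dual averaging: steer the average acceptance probability toward delta.
  final class DualAveraging(eps0: Double, delta: Double = 0.65,
                            gamma: Double = 0.05, t0: Double = 10.0, kappa: Double = 0.75) {
    private val mu = math.log(10.0 * eps0)
    private var m = 0
    private var hBar = 0.0
    private var logEps = math.log(eps0)
    private var logEpsBar = 0.0

    def stepSize: Double = math.exp(logEps)         // use during warmup
    def finalStepSize: Double = math.exp(logEpsBar) // freeze after warmup

    def update(acceptProb: Double): Unit = {
      m += 1
      val w = 1.0 / (m + t0)
      hBar = (1.0 - w) * hBar + w * (delta - acceptProb)
      logEps = mu - math.sqrt(m) / gamma * hBar
      val eta = math.pow(m, -kappa)
      logEpsBar = eta * logEps + (1.0 - eta) * logEpsBar
    }
  }
}
```

During warmup you call `transition` with `stepSize` and feed the returned acceptance probability to `update`, then sample with `finalStepSize`. The number of leapfrog steps never changes, which is exactly the part NUTS would choose dynamically.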
To me, the static compute graph makes Rainier sound more like Edward (on TensorFlow) or PyMC3 (on Theano).
With its dynamic compute graphs, Stan is much more like Pyro (on PyTorch).
emcee brings a lot of black-box functionality, but in the end, that wasn’t our focus with Stan. We did build the algorithms, but ensemble samplers don’t scale well with dimension, so we decided not to include them.
TensorFlow has an eager mode, which seems to add dynamic computation at the cost of some overhead. I haven't been following it closely.
Despite the heavy hedging, I wonder what they’re comparing and on which models.
Without NUTS, HMC isn’t nearly as efficient in practical terms. But NUTS seems to be a challenge for these static compute graph systems for reasons I don’t fully understand. I’d have thought the autodiff was in the inner loop, not encapsulating the entire computation.
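To make the contrast with a fixed number of steps concrete, here is a sketch of only the stopping rule from the original NUTS paper: keep taking leapfrog steps until the trajectory starts heading back toward where it began, i.e., until (q − q0) · p < 0. This is not full NUTS (there's no tree doubling and no selection of a point from the trajectory, so it is not a valid sampler on its own), and the names are hypothetical. What it shows is that the loop length depends on values computed inside the loop, which is the kind of control flow a graph fixed ahead of time has to express specially.

```scala
// Hypothetical sketch of only the NUTS "no-U-turn" stopping rule,
// for a single forward trajectory. Not a complete or valid sampler.
object UTurn {
  def stepsUntilUTurn(q0: Array[Double], p0: Array[Double],
                      grad: Array[Double] => Array[Double],
                      eps: Double, maxSteps: Int): Int = {
    val q = q0.clone()
    val p = p0.clone()
    var steps = 0
    var turned = false
    while (!turned && steps < maxSteps) {
      // One leapfrog step (grad is the gradient of log p).
      val g0 = grad(q)
      for (i <- p.indices) p(i) += 0.5 * eps * g0(i)
      for (i <- q.indices) q(i) += eps * p(i)
      val g1 = grad(q)
      for (i <- p.indices) p(i) += 0.5 * eps * g1(i)
      steps += 1
      // U-turn: momentum now points back toward the starting point.
      val dot = q.indices.map(i => (q(i) - q0(i)) * p(i)).sum
      turned = dot < 0.0
    }
    steps
  }
}
```

On a standard normal with step size 0.1, starting at q = 0 with momentum 1 stops after about 16 steps, while starting at q = 1 with momentum 0 takes about 32, so the amount of work per iteration isn't known until run time.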
Now that would be nice.
What’s the state of numerical and matrix libraries that’ll run with this (not Rainier, but within Scala)? Is it all remote procedure calls into Fortran?