Idea: A simple plugin system for CmdStan

Note: this is an idea that I’ve been thinking about for a while, and it came up as relevant to the thread “How to evaluate samplers for inclusion in Stan?”. At this point, I can’t back the idea up with the manpower to implement it, so it is a bit toothless. I’m posting it in the hope that it may inspire somebody to do something even better.

The core goal is to let people provide implementations of samplers that can be used with CmdStan (and the interfaces building on top of it) without officially endorsing the samplers and including them in Stan proper. This way, samplers could get a bit more testing and exposure to real-world problems before being evaluated for inclusion.

One solution would be a plugin system. In a minimalist version, a CmdStan plugin definition would consist of:

  1. The name of a method handled by the plugin (like sample, optimize, diagnose)
  2. The name of a C++ function that is dispatched when the method is called from the command line and that receives the rest of the command-line arguments as input
  3. A directory of files to add to the include path when compiling the CmdStan main program
  4. A path to a header file within the above directory that needs to be included to make the function in 2) available
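To make the four-part definition concrete, here is a minimal sketch of how it might map onto C++. Everything here is hypothetical: `PluginDefinition`, `plugin_registry`, and `dispatch_plugin` do not exist in CmdStan, and the assumption that the method name is the bare first argument (rather than `method=...`) is a simplification.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical: mirrors the four fields of a minimal plugin definition.
struct PluginDefinition {
  std::string method_name;    // 1) method handled, e.g. "my_sampler"
  std::string function_name;  // 2) fully qualified C++ entry point
  std::string include_dir;    // 3) directory added to the include path
  std::string header;         // 4) header (relative to include_dir) declaring 2)
};

// Assumed common signature for every plugin entry point: it receives the
// remaining command-line arguments and returns a process exit code.
using PluginEntry = std::function<int(const std::vector<std::string>&)>;

// Registry the (generated) main program would populate, one entry per plugin.
inline std::map<std::string, PluginEntry>& plugin_registry() {
  static std::map<std::string, PluginEntry> registry;
  return registry;
}

// Dispatch: argv[1] is assumed to name the method; everything after it is
// forwarded untouched. Returns -1 when no plugin handles the method, so the
// caller can fall back to the built-in methods (sample, optimize, ...).
inline int dispatch_plugin(int argc, const char* argv[]) {
  if (argc < 2) return -1;
  auto it = plugin_registry().find(argv[1]);
  if (it == plugin_registry().end()) return -1;
  std::vector<std::string> rest(argv + 2, argv + argc);
  return it->second(rest);
}
```

Keeping the entry-point signature this narrow (strings in, exit code out) is what would let the build treat all plugins uniformly; parsing the arguments becomes each plugin's own problem.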

With that, it should be fairly easy to modify the build steps for CmdStan programs so that they read a list of installed plugins and automatically modify the source code of the CmdStan main program to invoke them.
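The build-time step could be as simple as generating one extra translation unit from the list of installed plugins. The sketch below is hypothetical: it assumes the main program provides a `plugin_registry()` map from method name to entry point, and that each plugin function is convertible to that entry-point type. It emits the `#include` lines, a `register_plugins()` function, and the matching `-I` flags.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: same four-field definition as in the post.
struct PluginDefinition {
  std::string method_name;
  std::string function_name;
  std::string include_dir;
  std::string header;
};

// Generates C++ source that registers every installed plugin. The output
// assumes plugin_registry() is already declared by the main program.
std::string generate_plugin_source(const std::vector<PluginDefinition>& plugins) {
  std::ostringstream out;
  for (const auto& p : plugins)
    out << "#include \"" << p.header << "\"\n";
  out << "\nvoid register_plugins() {\n";
  for (const auto& p : plugins)
    out << "  plugin_registry()[\"" << p.method_name << "\"] = &"
        << p.function_name << ";\n";
  out << "}\n";
  return out.str();
}

// Extra compiler flags for the build: one -I per plugin include directory.
std::string plugin_include_flags(const std::vector<PluginDefinition>& plugins) {
  std::string flags;
  for (const auto& p : plugins) {
    if (!flags.empty()) flags += ' ';
    flags += "-I" + p.include_dir;
  }
  return flags;
}
```

Because everything is resolved at compile time, a plugin that fails to compile breaks the whole model build, which is one cost of this approach compared with runtime loading.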

In this setting, the cmdstanr and cmdstanpy interfaces would add a run_plugin method that runs a user-specified plugin and makes common shared functionality (like writing out a data file) available.


What you’re describing is pretty different from what is traditionally meant by “plugin”, which is typically something loadable entirely at runtime.

For example, an alternative would be that a plugin definition is just a path to a compiled shared library which must have certain functions defined. Minimally, there could be exactly one required function, which returns an object describing the rest of the available plugin functionality, e.g., “extension_point_name() → string”, which just tells you the name from your first bullet point.
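A runtime-loaded variant could use the POSIX `dlopen`/`dlsym` calls. This is a sketch under the assumption that the shared library exports a single C-linkage function `const char* extension_point_name()`; the symbol name and the `RuntimePlugin` wrapper are illustrative, not an existing API.

```cpp
#include <dlfcn.h>
#include <stdexcept>
#include <string>

// Hypothetical wrapper around a plugin shared library. The only required
// export is assumed to be:
//   extern "C" const char* extension_point_name();
class RuntimePlugin {
 public:
  explicit RuntimePlugin(const std::string& path)
      : handle_(dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL)) {
    // dlerror() describes why dlopen failed.
    if (!handle_) throw std::runtime_error(dlerror());
  }
  ~RuntimePlugin() {
    if (handle_) dlclose(handle_);
  }
  RuntimePlugin(const RuntimePlugin&) = delete;
  RuntimePlugin& operator=(const RuntimePlugin&) = delete;

  // Looks up the required entry point and returns the method name it reports.
  std::string extension_point_name() const {
    using NameFn = const char* (*)();
    dlerror();  // clear any stale error state before dlsym
    void* sym = dlsym(handle_, "extension_point_name");
    if (!sym) throw std::runtime_error("missing symbol: extension_point_name");
    return reinterpret_cast<NameFn>(sym)();
  }

 private:
  void* handle_;
};
```

On older glibc versions this needs `-ldl` at link time. The catch, as noted in the next reply, is that the plugin still has to be compiled against the same Stan model and math headers, so the build question does not disappear.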


The first bottleneck is always getting the right arguments to functions in a generic way. If you consider the existing Stan samplers, VI, and optimization methods, they’re bristling with control parameters that get reflected in both CmdStan and in the higher-level interfaces (e.g., how dual averaging works to estimate step size, size of adaptation windows, initial step size, initial draw, etc.). How to pass all of those to a C++ function in a reasonable way is challenging.
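The lowest-common-denominator answer to the argument problem is a flat bag of `key=value` strings, with typing and defaults left to the plugin. The sketch below (a hypothetical `PluginArgs` class, not part of CmdStan) shows that baseline; it deliberately does not handle nested settings such as adaptation sub-blocks, which is exactly where the difficulty described above shows up.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: stringly-typed key=value arguments handed to a plugin.
// Validation and type conversion are the plugin's responsibility.
class PluginArgs {
 public:
  explicit PluginArgs(const std::vector<std::string>& args) {
    for (const auto& a : args) {
      auto eq = a.find('=');
      if (eq == std::string::npos || eq == 0)
        throw std::invalid_argument("expected key=value, got: " + a);
      values_[a.substr(0, eq)] = a.substr(eq + 1);
    }
  }

  bool has(const std::string& key) const { return values_.count(key) > 0; }

  // Returns the parsed value, or the plugin-supplied default if absent.
  double get_double(const std::string& key, double fallback) const {
    auto it = values_.find(key);
    return it == values_.end() ? fallback : std::stod(it->second);
  }

  int get_int(const std::string& key, int fallback) const {
    auto it = values_.find(key);
    return it == values_.end() ? fallback : std::stoi(it->second);
  }

 private:
  std::map<std::string, std::string> values_;
};
```

The trade-off is that the higher-level interfaces would learn nothing about valid parameters or defaults, so they could only pass strings through.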

The second bottleneck is make, CmdStan’s build system. Plugins are going to require dynamic linking.

As an alternative, I think we can just do what Nutpie did: supply a wrapper around BridgeStan that allows sampling Stan models entirely outside of Stan, but produces output that’s compatible with analysis tools like posterior or ArviZ.


I’m also not so sure a plugin system would be the best approach. Especially if samplers want to deviate more from how NUTS works, this gets complicated quickly. Are there still chains? Are they fully independent, or do they need to exchange some data? Which sampler stats are there, etc.?

But maybe there are ways to make it easier to write separate sampler libraries. BridgeStan and ArviZ already help a lot with this, but I think there is still a bit of a gap:

  1. Writing traces was a surprising amount of work. It is almost embarrassing how much of the code in nutpie is related to that (maybe because the way I’m doing it isn’t the smartest, though…). If we could have a library that makes this easier, that’d be great.
  2. We also don’t really have a common format for traces that can be read from both R and Python. If there were a well-supported way to read an ArviZ zarr or netCDF file into R that was compatible with much of the infrastructure there, that’d be great. This applies both to in-memory representations and to files on disk; Apache Arrow or Zarr might be candidates. (nutpie currently uses Arrow internally, but I’ve been thinking about switching to Zarr.)
