Idea: A simple plugin system for CmdStan

martinmodrak · January 29, 2025, 6:05am

Note: this is an idea that I’ve been thinking about for a while and which came up as relevant to “How to evaluate samplers for inclusion in Stan?”. At this point, I can’t back the idea up with the manpower to implement it, so it is a bit toothless. I’m posting it in the hope that it may inspire somebody to do something even better.

The core goal is to let people provide implementations of samplers that can be used with CmdStan (and the interfaces building on top of it) without officially endorsing the samplers and including them in Stan proper. This way, samplers could get a bit more testing and exposure to real-world problems before being evaluated for inclusion.

One solution would be a plugin system. In a minimalist version, a CmdStan plugin definition would consist of:

1. The name of a method handled by the plugin (like sample, optimize, diagnose)
2. The name of a C++ function that is dispatched when the method is called from the command line and that receives the rest of the command line arguments as input
3. A directory of files to add to the include path when compiling the CmdStan main program
4. A path to a header file within the above directory that needs to be included to make the function in 2) available

With that, it should be fairly easy to modify the build steps for CmdStan programs to read a list of installed plugins and automatically modify the source code of the CmdStan main program to invoke the plugins.
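A minimal sketch of that build step, written in Python for brevity. It assumes a plugin definition is a record with the four fields listed above and that the generated C++ is spliced into the main program by the build. `PluginDefinition`, `include_flags`, `generate_dispatch`, `dispatch_plugin`, and the `my_sampler` names and paths are all hypothetical; nothing here is existing CmdStan functionality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginDefinition:
    """The four fields of a plugin definition (all names are hypothetical)."""
    method: str       # 1) method name handled by the plugin
    function: str     # 2) C++ function that receives the remaining arguments
    include_dir: str  # 3) directory added to the include path
    header: str       # 4) header inside include_dir that declares `function`


def include_flags(plugins):
    """Compiler flags that put every plugin directory on the include path."""
    return [f"-I{p.include_dir}" for p in plugins]


def generate_dispatch(plugins):
    """Emit C++ source that routes a method name to the matching plugin."""
    lines = ["#include <string>"]
    lines += [f'#include "{p.header}"' for p in plugins]
    lines += [
        "",
        "// Returns -1 when no installed plugin handles `method`.",
        "int dispatch_plugin(const std::string& method, int argc, const char* argv[]) {",
    ]
    for p in plugins:
        lines.append(f'  if (method == "{p.method}") return {p.function}(argc, argv);')
    lines += ["  return -1;", "}"]
    return "\n".join(lines)


# A made-up plugin, purely for illustration.
plugins = [
    PluginDefinition(
        method="my_sampler",
        function="my_sampler::run",
        include_dir="/opt/plugins/my_sampler",
        header="my_sampler/run.hpp",
    ),
]
print(generate_dispatch(plugins))
```

The generated function only forwards `argc`/`argv`, which matches the idea that the plugin itself is responsible for interpreting the rest of the command line.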

In this setting, the cmdstanr and cmdstanpy interfaces would add a run_plugin method that runs a user-specified plugin and makes common shared functionality (like writing out a data file) available.

WardBrian · January 29, 2025, 3:00pm

What you’re describing is pretty different from what is traditionally meant by “plugin”, which is typically something loadable entirely at runtime.

For example, an alternative would be that a plugin definition is just a path to a compiled shared library which must have certain functions defined. Minimally, there could be exactly one required function, which returns an object describing the rest of the available plugin functionality, e.g., “extension_point_name() → string”, which just tells you the name in your first bullet point.
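A rough sketch of how a host could query such a library at runtime, here from Python via `ctypes`. The function name `extension_point_name` comes from the example above; the C signature `const char* extension_point_name(void)`, the helper `plugin_method_name`, and the library path in the usage comment are assumptions made for illustration.

```python
import ctypes


def plugin_method_name(lib):
    """Ask a loaded plugin library which method name it handles."""
    fn = getattr(lib, "extension_point_name", None)
    if fn is None:
        raise ValueError("library does not export extension_point_name")
    # Assumed C signature: const char* extension_point_name(void);
    fn.restype = ctypes.c_char_p
    fn.argtypes = []
    return fn().decode("utf-8")


# Usage, given a compiled plugin at a hypothetical path:
# lib = ctypes.CDLL("./libmy_sampler.so")
# print(plugin_method_name(lib))
```

The point of the design is that the host only needs to know one symbol name in advance; everything else can be discovered from what the plugin reports.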

Bob_Carpenter · February 4, 2025, 5:37pm

The first bottleneck is always getting the right arguments to functions in a generic way. If you consider the existing Stan samplers, VI, and optimization methods, they’re bristling with control parameters that get reflected in both CmdStan and in the higher-level interfaces (e.g., how dual averaging works to estimate step size, size of adaptation windows, initial step size, initial draw, etc.). How to pass all of those to a C++ function in a reasonable way is challenging.
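To make the difficulty concrete, here is one possible (and deliberately simplistic) scheme, sketched in Python: the plugin declares its control parameters with typed defaults, and the host parses flat `key=value` tokens against that declaration. The function `parse_plugin_args` and the parameter names are hypothetical, and this flat format is an assumption that does not cover nested arguments.

```python
def parse_plugin_args(tokens, defaults):
    """Parse 'key=value' tokens against a plugin's declared defaults.

    Each value is converted to the type of the matching default, so the
    plugin receives typed settings without the host knowing them in advance.
    """
    settings = dict(defaults)
    for token in tokens:
        key, sep, raw = token.partition("=")
        if not sep or key not in defaults:
            raise ValueError(f"unknown or malformed argument: {token!r}")
        kind = type(defaults[key])
        if kind is bool:
            settings[key] = raw.lower() in ("1", "true")
        else:
            settings[key] = kind(raw)
    return settings


# Hypothetical control parameters declared by a plugin.
defaults = {"initial_step_size": 1.0, "adapt_window": 25, "adapt": True}
print(parse_plugin_args(["initial_step_size=0.1", "adapt=false"], defaults))
```

Even this toy version shows where the friction is: every parameter has to be declared once in a form that the command line and each higher-level interface can all consume.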

The second bottleneck is make. Plugins are going to require dynamic linking.

As an alternative, I think we can just do what Nutpie did—just supply a wrapper around BridgeStan to allow sampling of Stan models entirely outside of Stan, but produce output that’s compatible with analysis tools like posterior or ArviZ.
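A small sketch of what "sampling entirely outside of Stan" amounts to: the sampler only needs a callable that returns the log density at a point. In a real wrapper that callable would be backed by the model's log density as exposed through BridgeStan; here a standard normal stands in for it so the example is self-contained. The sampler is a plain random-walk Metropolis, chosen for brevity, not what nutpie uses, and all function names are hypothetical.

```python
import math
import random


def random_walk_metropolis(log_density, init, num_draws, scale=0.5, seed=0):
    """Draw from a target given only a log density callable."""
    rng = random.Random(seed)
    current = list(init)
    current_lp = log_density(current)
    draws = []
    for _ in range(num_draws):
        proposal = [x + rng.gauss(0.0, scale) for x in current]
        proposal_lp = log_density(proposal)
        # Accept with probability min(1, exp(proposal_lp - current_lp)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        draws.append(list(current))
    return draws


def std_normal_log_density(theta):
    """Stand-in target: unnormalized log density of a standard normal."""
    return -0.5 * sum(x * x for x in theta)


draws = random_walk_metropolis(std_normal_log_density, [0.0, 0.0], num_draws=200)
print(len(draws), len(draws[0]))
```

What remains after the sampler itself is exactly the output side: writing `draws` in a form that downstream analysis tools can read.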

aseyboldt · February 5, 2025, 11:18am

I’m also not so sure a plugin system would be the best approach. Especially if samplers want to deviate more from how NUTS works, this gets complicated quickly. Are there still chains? Are they fully independent or do they need to exchange some data? What sampler stats are there? And so on.

But maybe there are ways to make it easier to write separate sampler libraries. BridgeStan and ArviZ already help a lot with this. But I think there is still a bit of a gap:

Writing traces was a surprising amount of work. It is almost embarrassing how much of the code in nutpie is related to that (maybe because the way I’m doing it wasn’t the smartest though…). If we could have a library that makes that easier, that’d be great.
We also don’t really have a common format for traces that can be read from R and Python. If there were a well-supported way to read an ArviZ Zarr or NetCDF file into R that was compatible with a lot of the infrastructure there, that’d be great. That applies both to in-memory representations and to on-disk ones. Apache Arrow or Zarr might be candidates for those. (nutpie is currently using Arrow internally, but I’ve been thinking about switching to Zarr.)

Bob_Carpenter · February 11, 2025, 10:58pm

I’m assuming that by “traces” you mean draws. What kind of library would have helped?

I feel your pain there. Every time I think I can get draws into ArviZ, I get frustrated and give up, despite the fact that it’s just a parameters x chains x draws array together with a parameters-sized array of names. For Stan, we save these as separate CSV files, one per chain, with a header row for the parameter names and one row per draw. These are easy to read in and reassemble into the correct format. I just can’t figure out how to get that into ArviZ. I would probably struggle on the R side, too, but I don’t work there any more.
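The "read in and reassemble" step can be sketched with the Python standard library alone. This assumes one CSV text per chain, a header row of parameter names, one row per draw, and comment lines starting with `#`; the helper names `read_chain` and `assemble` and the sample data are made up for illustration. The result maps each parameter name to a chains x draws nested list, i.e. the parameters x chains x draws layout described above.

```python
import csv


def read_chain(text):
    """Return (names, rows) for one chain; lines starting with '#' are comments."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    reader = csv.reader(lines)
    names = next(reader)
    rows = [[float(x) for x in row] for row in reader]
    return names, rows


def assemble(chain_texts):
    """Map each parameter name to a chains x draws nested list."""
    draws = {}
    expected = None
    for text in chain_texts:
        names, rows = read_chain(text)
        if expected is None:
            expected = names
            draws = {name: [] for name in names}
        elif names != expected:
            raise ValueError("chains disagree on parameter names")
        for j, name in enumerate(names):
            draws[name].append([row[j] for row in rows])
    return draws


# Two tiny made-up chains with two parameters and two draws each.
chain_1 = "# a comment line\nmu,sigma\n0.1,1.0\n0.2,1.1\n"
chain_2 = "mu,sigma\n0.3,0.9\n0.4,1.2\n"
draws = assemble([chain_1, chain_2])
print(draws["mu"])
```

Getting from this dictionary into a specific analysis library is the part that still needs that library's own converter, which is where the frustration described above sets in.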
