CmdStanPy provides access to the Stan compiler, inference algorithms, and diagnostics.
It supports both development and production workflows and is well-suited for teaching and learning Stan.
Reasons to use CmdStanPy:
Runs with the latest Stan release.
Easy to install: CmdStanPy can automatically install CmdStan and the underlying C++ toolchain.
Scalable: can fit big models and big datasets. If CmdStan can fit a model and dataset, so can CmdStanPy.
Cross-platform: runs on Linux, macOS, and Windows.
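As a rough illustration of the "easy to install" point, here is a minimal sketch that checks for an existing CmdStan installation and installs one if none is found. The helper name `ensure_cmdstan` is made up for this example; `cmdstanpy.cmdstan_path()` and `cmdstanpy.install_cmdstan()` are the CmdStanPy calls it relies on, and the sketch assumes `cmdstan_path()` raises `ValueError` when no installation is present.

```python
def ensure_cmdstan() -> str:
    """Return the path to a CmdStan installation, installing one if needed.

    `ensure_cmdstan` is a hypothetical helper, not part of CmdStanPy.
    """
    # Imported here so this sketch can be loaded without CmdStanPy installed.
    import cmdstanpy

    try:
        # Assumption: raises ValueError when no CmdStan installation is found.
        return cmdstanpy.cmdstan_path()
    except ValueError:
        # Downloads and builds the latest CmdStan release (needs network access
        # and a working C++ toolchain).
        cmdstanpy.install_cmdstan()
        return cmdstanpy.cmdstan_path()
```

Calling `ensure_cmdstan()` once at the start of a script should be enough; the first call can take several minutes because CmdStan is compiled locally.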
Thanks for the hard work on this. But now we have two python front-ends to Stan (pystan and cmdstanpy). Is there a summary of the differences and when we might prefer one or the other?
(I am mostly asking about the differences in the user experience, rather than their distinct back-end philosophies, although those are obviously related.)
Great question! I’ll do my best to summarize them:
CmdStanPy is essentially a wrapper for CmdStan, and PyStan (since version 3) depends on something called httpstan. I think there is some merit in comparing the features of CmdStan and httpstan directly:
CmdStan is always available for the latest version of Stan. Ideally, when CmdStan updates, wrappers like CmdStanPy need no changes before you can use the new release. httpstan, by contrast, has to be updated alongside Stan.
CmdStan is a file based interface. Inputs are written to JSON, and outputs are read from CSV. PyStan/httpstan works in memory with your model in the same process. There are advantages and disadvantages to both in terms of memory usage, IO speed, etc.
Because httpstan works directly with the model in memory, it does have two features not available in CmdStan: the ability to evaluate the log probability function from within Python, and the ability to evaluate the gradient of the log probability function. (See the update below.)
However, CmdStan supports inference methods other than HMC sampling (such as MLE and variational inference), which httpstan does not.
CmdStan is available on Windows; httpstan is not.
Those are the core differences in the feature set.
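To make the "file-based interface" point concrete, here is a small standard-library sketch of the round trip: data goes out as JSON, and draws come back as CSV with `#` comment lines. This is not how CmdStanPy itself is implemented; the function names and the sample output are made up for illustration.

```python
import csv
import io
import json
from pathlib import Path


def write_data_json(data: dict, path: Path) -> None:
    """Write model data to a JSON file, as a file-based interface would."""
    path.write_text(json.dumps(data))


def read_draws_csv(text: str) -> list[dict[str, float]]:
    """Parse CSV output, skipping blank lines and '#' comment lines."""
    rows = [line for line in text.splitlines() if line and not line.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(rows)))
    return [{name: float(value) for name, value in row.items()} for row in reader]


# Hypothetical output in the general shape of a Stan CSV file (values invented).
SAMPLE_OUTPUT = """# model = bernoulli_model
lp__,theta
-7.0,0.25
-6.8,0.31
# Elapsed Time: (illustrative comment line)
"""

draws = read_draws_csv(SAMPLE_OUTPUT)
```

The trade-off mentioned above shows up directly here: every fit pays for serializing inputs and parsing outputs, but the files remain on disk and can be inspected or reloaded without re-running the model.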
On the Python level, there are a few other differences, mostly around slightly different names or ways of structuring the API. CmdStanPy has a few more options for tweaking the compilation of the model and reading the output than PyStan, I believe.
If all you’re interested in is running HMC and then processing your output using something like pandas, both can do that; they differ in what they offer around that core workflow.
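For reference, here is a minimal CmdStanPy sketch of that workflow, plus the non-HMC methods mentioned earlier. The Bernoulli model, the data, and the helper names `write_model` and `compare_methods` are assumptions made for this example; the CmdStanPy calls used are `CmdStanModel`, `sample`, `optimize`, `variational`, and `draws_pd`.

```python
from pathlib import Path

# Hypothetical example model and data, not taken from the thread.
BERNOULLI_STAN = """
data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
  theta ~ beta(1, 1);
  y ~ bernoulli(theta);
}
"""

DATA = {"N": 10, "y": [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]}


def write_model(directory: Path) -> Path:
    """Write the example Stan program to disk and return its path."""
    stan_file = directory / "bernoulli.stan"
    stan_file.write_text(BERNOULLI_STAN)
    return stan_file


def compare_methods(stan_file: Path, data: dict) -> dict:
    """Fit the model with HMC, optimization, and ADVI; return theta estimates."""
    # Imported here so this sketch can be loaded without CmdStanPy installed.
    from cmdstanpy import CmdStanModel

    model = CmdStanModel(stan_file=str(stan_file))  # compiles the Stan program

    fit = model.sample(data=data, chains=4, seed=1234, show_progress=False)
    draws = fit.draws_pd()  # pandas DataFrame, one column per quantity

    mle = model.optimize(data=data)
    vb = model.variational(data=data, seed=1234)

    return {
        "posterior_mean": float(draws["theta"].mean()),
        "mle": mle.optimized_params_dict["theta"],
        "advi_mean": vb.variational_params_dict["theta"],
    }
```

With CmdStan installed, `compare_methods(write_model(Path(".")), DATA)` would return the three estimates of `theta`; the `draws` DataFrame inside the function is where any further pandas processing would start.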
Update (2023): CmdStan now has the ability to evaluate the log probability and its gradient for debugging. It will be slightly slower than PyStan, and both will be slower than a tool dedicated to working directly with the model, such as BridgeStan.