Great question! I’ll do my best to summarize them:
CmdStanPy is essentially a wrapper for CmdStan, and PyStan (since version 3) depends on a package called httpstan. I think there is some merit in directly comparing the features of CmdStan and httpstan:
- CmdStan is always available for the latest version of Stan. Ideally, when CmdStan is updated, you can use the new version without any change to the wrappers (like CmdStanPy). httpstan needs to be updated alongside Stan.
- CmdStan is a file-based interface. Inputs are written to JSON, and outputs are read from CSV. PyStan/httpstan works in memory, with your model in the same process. There are advantages and disadvantages to both in terms of memory usage, IO speed, etc.
- Because httpstan works directly with the model in memory, it has two features not available in CmdStan: the ability to evaluate the log probability function from within Python, and the ability to evaluate the gradient of the log probability function. (See the update below.)
- But CmdStan supports inference methods other than HMC sampling (like MLE and variational inference), which httpstan does not.
- CmdStan is available on Windows; httpstan is not.
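To make the file-based point a bit more concrete, here is a rough sketch of the round trip a wrapper has to do: write the data to JSON, then parse CSV output back in. It uses only the standard library and does not call CmdStan. The helper names are hypothetical, and the CSV text is made up to mimic the general shape of the output (comment lines starting with `#`, a header row, one row per draw).

```python
import csv
import io
import json
import tempfile
from pathlib import Path


def write_data_json(data, directory):
    """Write model data to a JSON file, as a file-based wrapper would."""
    path = Path(directory) / "data.json"
    path.write_text(json.dumps(data))
    return path


def read_draws_csv(text):
    """Parse CSV text with '#' comment lines into a list of float dicts."""
    rows = [line for line in text.splitlines() if line and not line.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(rows)))
    return [{name: float(value) for name, value in row.items()} for row in reader]


# Made-up output for illustration only; not produced by a real run.
FAKE_OUTPUT = """# model = bernoulli_model
lp__,accept_stat__,theta
-7.1,0.98,0.21
-6.8,1.0,0.25
# Elapsed Time: (illustrative)
"""

with tempfile.TemporaryDirectory() as tmp:
    data_path = write_data_json({"N": 3, "y": [0, 1, 0]}, tmp)
    round_trip = json.loads(data_path.read_text())

draws = read_draws_csv(FAKE_OUTPUT)
theta_mean = sum(d["theta"] for d in draws) / len(draws)
```

With an in-memory interface like httpstan, neither the write nor the parse step goes through the disk, which is where the memory and IO trade-offs come from.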
Those are the core differences in the feature set.
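As a rough illustration of the two bullets about extra features, the sketch below shows roughly what each side looks like from Python. The function names and `EXAMPLE_UNCONSTRAINED` are my own for illustration; the library calls are the ones I believe CmdStanPy and PyStan 3 expose, so check them against the version you have installed. The imports are done inside the functions so the snippet loads even if only one of the two packages is present.

```python
# One unconstrained value per model parameter; hypothetical example input.
EXAMPLE_UNCONSTRAINED = [0.0]


def point_estimates_with_cmdstanpy(stan_file, data):
    """Run optimization (MLE) and variational inference through CmdStan."""
    from cmdstanpy import CmdStanModel  # requires a working CmdStan install

    model = CmdStanModel(stan_file=stan_file)  # stan_file: path to a .stan file
    mle = model.optimize(data=data)            # point estimate by optimization
    vb = model.variational(data=data)          # ADVI approximation
    return mle.optimized_params_dict, vb.variational_params_dict


def log_prob_and_gradient_with_pystan(program_code, data, unconstrained_params):
    """Evaluate the log density and its gradient in process with PyStan 3."""
    import stan

    posterior = stan.build(program_code, data=data)
    # Both calls take parameters on the unconstrained scale.
    lp = posterior.log_prob(unconstrained_params)
    grad = posterior.grad_log_prob(unconstrained_params)
    return lp, grad
```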
On the Python level, there are a few other differences, mostly around slightly different names or ways of structuring the API. CmdStanPy has a few more options for tweaking the compilation of the model and reading the output than PyStan, I believe.
If all you’re interested in is running HMC and then processing your output using something like pandas, both would be able to do that, but they offer different things around that.
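For that basic HMC-then-pandas workflow, a minimal sketch of both routes might look like the following. The model, the data, and the function names are just an example I made up; the chain counts and seed are arbitrary. Note that CmdStanPy takes a path to a `.stan` file while PyStan takes the program as a string.

```python
# Sketch only: needs cmdstanpy (plus a CmdStan install) or pystan to actually run.
STAN_PROGRAM = """
data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
  theta ~ beta(1, 1);
  y ~ bernoulli(theta);
}
"""

DATA = {"N": 5, "y": [0, 1, 0, 0, 1]}


def sample_with_cmdstanpy(stan_file, data):
    """Run HMC through CmdStan; stan_file is a path to a .stan file on disk."""
    from cmdstanpy import CmdStanModel  # imported lazily so the sketch loads without it

    model = CmdStanModel(stan_file=stan_file)  # compiles the model to an executable
    fit = model.sample(data=data, chains=4, seed=1)
    return fit.draws_pd()  # pandas DataFrame, one row per draw


def sample_with_pystan(program_code, data):
    """Run HMC through PyStan 3 / httpstan, in process."""
    import stan  # imported lazily for the same reason

    posterior = stan.build(program_code, data=data, random_seed=1)
    fit = posterior.sample(num_chains=4, num_samples=1000)
    return fit.to_frame()  # pandas DataFrame, one row per draw
```

Either way you end up with a DataFrame of draws, so downstream pandas code should look much the same.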
Update (2023): CmdStan can now evaluate the log probability and its gradient, intended for debugging. It will be slightly slower than PyStan, and both will be slower than something dedicated to working directly with the model, like BridgeStan.