Hi! Keen to get thoughts on the best approach for having cmdstanpy as a dependency in a Python package. For the prophet package we currently have cmdstanpy as an optional backend (see GitHub - facebook/prophet: Tool for producing high quality forecasts for time series data that has) and end-users need to install cmdstan themselves before installing prophet. Is there a recommended way to make the installation just one step? Since prophet requires a build step anyway, I was thinking we could put install_cmdstan() somewhere inside setup.py.
And a follow-up question: is there any guidance on how we might include a Stan model in a package without requiring the end-user to build the package? Installs can take a while (building cmdstan + compiling the Stan model included in the package) and can fail for different reasons on different machines, so I’m wondering if there’s a way to package everything into a wheel. I guess I’m looking for an equivalent of how RStan does this, as described here: Guidelines for developers of R packages interfacing with Stan • rstantools
It seems like prophet is on conda-forge already, so this should be very easy to do. (Edit: to be 100% clear, I mean including cmdstan[py] as a dependency should be very easy. I haven’t tried to see if conda could ship a compiled stan model, but maybe!)
To support both this and non-conda setups, I think that checking cmdstanpy.cmdstan_path() would work. This should throw if cmdstan isn’t found, and then you can call install_cmdstan internally.
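A minimal sketch of that check-then-install idea, assuming `cmdstanpy.cmdstan_path()` raises `ValueError` when no installation is found and that `cmdstanpy.install_cmdstan()` returns a truthy value on success. The helper name `ensure_cmdstan` and its two injectable arguments are hypothetical, added only so the logic can be tried without cmdstanpy installed:

```python
from typing import Callable, Optional


def ensure_cmdstan(
    find_path: Optional[Callable[[], str]] = None,
    install: Optional[Callable[[], bool]] = None,
) -> str:
    """Return the CmdStan path, installing CmdStan first if it is missing."""
    if find_path is None or install is None:
        # Imported lazily so the helper can be exercised without cmdstanpy.
        import cmdstanpy

        find_path = find_path or cmdstanpy.cmdstan_path
        install = install or cmdstanpy.install_cmdstan
    try:
        return find_path()
    except ValueError:
        # Assumption: a missing installation surfaces as ValueError.
        if not install():
            raise RuntimeError("CmdStan installation failed")
        return find_path()
```

In a conda environment where cmdstan is already present, the first lookup should succeed and nothing is downloaded; elsewhere the install runs once and the path is looked up again.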
Thanks for your responses!! @ahartikainen yep I ended up trying to package everything together (actually looking through past Issues I think you suggested the same thing a while back!) just using setup.py and a packaging tool.
I included the built cmdstan package in the wheel as well, but that bloated it by quite a bit (the wheel ended up being ~170MB). I think that’s the only way to ensure that the model .exe is run using the same cmdstan version used to compile it. It did work very nicely out of the box, but I’m not quite sure if that’s worth the large wheel size.
I ended up pruning the stan/ folder and the downloaded binaries macos-stanc, linux-stanc, windows-stanc, etc. in order to get the package size way down, and everything still seems to work. Thanks for the tip @ahartikainen !
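For anyone wanting to try the same pruning, here is a rough sketch rather than the actual Prophet build code. The function name `prune_cmdstan` is hypothetical. It assumes `stan/` is a top-level folder of the CmdStan directory, and it searches the whole tree for the stanc binaries because their exact location is not stated above:

```python
import shutil
from pathlib import Path

# Names taken from the post above; locations inside the tree are assumptions.
PRUNE_DIRS = ("stan",)
PRUNE_FILES = ("macos-stanc", "linux-stanc", "windows-stanc")


def prune_cmdstan(cmdstan_dir: Path) -> int:
    """Delete build-only parts of a CmdStan tree and return the bytes freed."""
    freed = 0
    for name in PRUNE_DIRS:
        target = cmdstan_dir / name
        if target.is_dir():
            # Add up file sizes before removing the directory.
            freed += sum(f.stat().st_size for f in target.rglob("*") if f.is_file())
            shutil.rmtree(target)
    for name in PRUNE_FILES:
        # Materialise the matches first, since files are deleted in the loop.
        for target in list(cmdstan_dir.rglob(name)):
            if target.is_file():
                freed += target.stat().st_size
                target.unlink()
    return freed
```

If the tbb files in your CmdStan layout happen to sit under `stan/`, they would need to be copied out before pruning, since the compiled model still depends on them.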
I think my original questions have been largely answered, so thanks everyone :) In case anyone reading this is interested, here is the work-in-progress PR we have on Prophet at the moment: [Draft] Python Wheels for PyPi by tcuongd · Pull Request #2010 · facebook/prophet · GitHub. There are still some issues with Linux we need to figure out, and other optimisations we can make.
After a Stan model is compiled to an executable, it doesn’t need anything else from cmdstan; it is standalone. The only thing the executable needs is the tbb library, which needs to be on the $PATH. It can be found in the cmdstan folder, but at least on Windows you could just copy it into the same folder as your executables and it would still work. On macOS and Linux I think you need to do some hack for the executable so that it works correctly.
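The Windows copy step could look something like the sketch below. The helper name `copy_tbb_next_to_models` and the default `tbb*.dll` pattern are assumptions; the whole cmdstan folder is searched rather than hard-coding where tbb lives:

```python
import shutil
from pathlib import Path
from typing import List


def copy_tbb_next_to_models(
    cmdstan_dir: Path, model_dir: Path, pattern: str = "tbb*.dll"
) -> List[Path]:
    """Copy every tbb file found under cmdstan_dir into model_dir."""
    model_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() materialises the matches before any file is copied.
    for lib in sorted(cmdstan_dir.rglob(pattern)):
        dest = model_dir / lib.name
        if not lib.is_file() or dest.resolve() == lib.resolve():
            continue  # skip directories and files that are already in place
        shutil.copy2(lib, dest)
        copied.append(dest)
    return copied
```

This only covers the Windows case described above; the macOS and Linux handling is not sketched here.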
We should think about the tooling required for development mode vs. the tooling required for production mode. There’s a brief discussion in the docs (https://mc-stan.org/cmdstanpy/workflow.html):
The statistical modeling enterprise has two principal modalities: development and production. The focus of development is model building, comparison, and validation. Many models are written and fitted to many kinds of data. The focus of production is using a trusted model on real-world data to obtain estimates for decision-making. In both modalities, the essential workflow remains the same: compile a Stan model, assemble input data, do inference on the model conditioned on the data, and validate, access, and export the results.
It sounds like the tooling for production mode would be just the model executable, the tbb lib, the CmdStan utilities stansummary and diagnose, plus CmdStanPy (the CmdStan utilities can probably be skipped).
I don’t quite follow this remove-copy idea. Is this to make sure that cmdstan is built with default settings?
I think this is not needed
Why is this needed? You pack the whole cmdstan folder structure + executables for summary etc?
I think all you should need to do is compile the executable files (.bin) and then pack them into the wheel as a data folder in your package, together with the compiled tbb files. Then your user doesn’t need a cmdstan installation.
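At runtime the package then has to find the bundled executable. A small sketch of that lookup, where the function name `bundled_model_path`, the `stan_model` folder name and the default model name are all hypothetical, and the `.exe` suffix on Windows is an assumption based on the earlier post:

```python
import platform
from pathlib import Path
from typing import Optional


def bundled_model_path(
    package_dir: Path, model_name: str = "model", system: Optional[str] = None
) -> Path:
    """Return the path of the precompiled model shipped inside the package."""
    system = system or platform.system()
    # Assumption: .exe on Windows, .bin elsewhere (as mentioned above).
    suffix = ".exe" if system == "Windows" else ".bin"
    path = package_dir / "stan_model" / f"{model_name}{suffix}"
    if not path.is_file():
        raise FileNotFoundError(f"no bundled model executable at {path}")
    return path
```

The returned path could then be handed to `cmdstanpy.CmdStanModel(exe_file=...)`, so that the package loads the shipped binary instead of compiling the Stan file on the user's machine.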
Thanks @ahartikainen! Responses to your questions below:
Is this code run in CI or on user computer?
The main purpose would be to run on the facebook/prophet CI on package release, in order to generate the .whl files for PyPI. Ideally it’d work on users’ computers too if they want to build the package locally (but I guess we can think of this as a secondary use case for now).
I don’t quite follow this remove-copy idea. Is this to make sure that cmdstan is built with default settings?
I don’t think the remove is necessary on CI actually, you’re right. I guess this step is pretty defensive; it’d only be relevant if the end user had previously built the package in-place (i.e. in the same folder as the source code).
I think this is not needed
You’re right, thank you!
Why is this needed? You pack the whole cmdstan folder structure + executables for summary etc?
Yeah exactly.
I think all you should need to do is compile the executable files (.bin) and then pack them into the wheel as a data folder in your package, together with the compiled tbb files.
Hmm, we still use cmdstanpy code in our package, particularly the .fit(), .sample(), .draws() methods. The CmdStanMLE and CmdStanMCMC objects are also saved to the fitted Prophet object. I get your point though that the end user probably doesn’t care about the MCMC diagnostics at this point, and it’d probably simplify the setup script if we just removed the cmdstan installation altogether. I’m happy to include those executables in the wheel for now though (the total wheel size is still reasonable, <20 MB).