I thought I’d post an update as I think this project is now more or less usable - at least I’ve found it useful for my own cmdstanpy projects. If anyone is looking for a cmdstanpy template, please try it out and let me know your opinions - I’d really appreciate any feedback.
You can find the project here and fetch it from the command line like this:
pip install cookiecutter
cookiecutter gh:teddygroves/cookiecutter-cmdstanpy
I made quite a few changes with the general aim of being flexible enough to allow arbitrary models, configuration and data processing while keeping things as simple and intuitive as possible. The readme now contains a lot more explanation and some detailed examples.
Here is the new project structure (the writing directory is optional):
.
├── LICENSE
├── Makefile
├── README.md
├── analyse.py
├── data
│   ├── prepared
│   │   └── readme.md
│   └── raw
│       ├── raw_measurements.csv
│       └── readme.md
├── model_configurations
│   ├── interaction.toml
│   ├── interaction_fake_data.toml
│   └── no_interaction.toml
├── prepare_data.py
├── pyproject.toml
├── requirements.txt
├── results
│   └── runs
│       └── readme.md
├── sample.py
├── src
│   ├── data_preparation.py
│   ├── model_configuration.py
│   ├── prepared_data.py
│   ├── readme.md
│   ├── sampling.py
│   ├── stan
│   │   ├── custom_functions.stan
│   │   ├── model.stan
│   │   └── readme.md
│   └── util.py
└── writing
    ├── bibliography.bib
    ├── img
    │   ├── example.png
    │   └── readme.md
    └── report.md
The template comes with a working example model - you should be able to try it out with make analysis straight away. This makes it possible to edit the template incrementally until it implements your analysis, re-running at any time to check your work.
The only abstractions are now ModelConfiguration and PreparedData. A ModelConfiguration is a representation of a toml file with instructions for which modes to run in (choices are currently prior, posterior and k-fold cross-validation), where to find a Stan program and prepared data, and how to run cmdstanpy (including stanc_options, cpp_options and keyword arguments for CmdStanModel.sample). There are a few examples in the folder model_configurations. PreparedData is a definition of what prepared data should look like.
By default the results are converted to arviz InferenceData and saved in netcdf format to results/runs/<model configuration name>/mode.nc.
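For anyone wondering what that conversion step involves, here is a small sketch under a few assumptions. The helper names run_output_path and save_run are made up for this post, and I am assuming that "mode" in the path stands for the name of the run mode (e.g. posterior). The arviz calls used are from_cmdstanpy and InferenceData.to_netcdf; the template's own sampling code may do this differently.

```python
from pathlib import Path


def run_output_path(config_name: str, mode: str, root: Path = Path("results/runs")) -> Path:
    """Build the netcdf path for one run, assuming one file per mode."""
    return root / config_name / f"{mode}.nc"


def save_run(fit, config_name: str, mode: str, root: Path = Path("results/runs")) -> Path:
    """Convert a CmdStanMCMC fit to InferenceData and write it as netcdf."""
    import arviz as az  # imported here so the path helper works without arviz

    path = run_output_path(config_name, mode, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    # A prior-mode run could be passed as prior=fit instead of posterior=fit.
    idata = az.from_cmdstanpy(posterior=fit)
    idata.to_netcdf(str(path))
    return path


print(run_output_path("interaction", "posterior"))
```

A saved run can later be read back with arviz.from_netcdf, which is convenient for an analysis script that runs separately from sampling.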
It’s now possible for the raw data to have any format with a little editing. The readme goes into some examples of how to do this.
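As a rough illustration of the idea (not the readme's own example), the sketch below turns raw csv text into a prepared-data object and then into a dictionary that could be handed to Stan. The column names x and y, the fields of this PreparedData and the function names are all hypothetical; the real definition lives in src/prepared_data.py.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class PreparedData:
    """Hypothetical definition of what prepared data should look like."""

    name: str
    measurements: list[dict]  # one dict per usable row, values as floats


def prepare_from_csv(name: str, raw_text: str) -> PreparedData:
    """Parse raw csv text, dropping rows where the measurement y is missing."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        if row["y"] == "":
            continue
        rows.append({"x": float(row["x"]), "y": float(row["y"])})
    return PreparedData(name=name, measurements=rows)


def to_stan_input(data: PreparedData) -> dict:
    """Flatten prepared data into the kind of dict cmdstanpy accepts as data."""
    return {
        "N": len(data.measurements),
        "x": [m["x"] for m in data.measurements],
        "y": [m["y"] for m in data.measurements],
    }


raw = "x,y\n1.0,2.5\n2.0,\n3.0,4.0\n"
prepared = prepare_from_csv("example", raw)
print(to_stan_input(prepared))
```

Supporting a different raw format would then only mean swapping prepare_from_csv for another function that returns the same PreparedData shape.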
I was a little unsure at first whether a cmdstanpy (or more generally statistical analysis) template was a viable project, mainly because, as @mitzimorris says above, the goals of being flexible and being easy to use and understand kind of pull in opposite directions. However, I’m now using the template for any project that threatens to get bigger than a few files, and anecdotally it definitely feels like an improvement on what I was doing before. I also noticed that project templates seem to be very popular in the deep learning world, e.g. this tensorflow_template_application and this pytorch-template have 1.9k and 3.2k github stars respectively.
I’m also curious about whether there are any other statistical analysis templates out there - if anyone knows one please post it!