Cookiecutter-cmdstanpy: a template for cmdstanpy projects

I am writing a template for cmdstanpy analyses called cookiecutter-cmdstanpy and would be really grateful for any feedback on the work done so far.

The overall idea is to avoid repeating work - instead of manually creating a bunch of folders and writing the same code every time you start a new cmdstanpy project, you can start with the template. Ideally, you would only have to make changes that reflect the parts of your analysis that are genuinely unique.

The template is based on cookiecutter, so you should be able to use it with the following commands:

pip install cookiecutter
cookiecutter https://github.com/teddygroves/cookiecutter-cmdstanpy

This should give you a file structure like this:

├── LICENSE
├── Makefile
├── README.md
├── bibliography.bib
├── data
│   ├── fake
│   │   └── readme.md
│   ├── prepared
│   │   └── readme.md
│   └── raw
│       ├── raw_measurements.csv
│       └── readme.md
├── fit_fake_data.py
├── fit_real_data.py
├── prepare_data.py
├── report.md
├── requirements.txt
├── results
│   ├── infd
│   │   └── readme.md
│   ├── input_data_json
│   │   └── readme.md
│   ├── loo
│   │   └── readme.md
│   ├── plots
│   │   └── readme.md
│   └── samples
│       └── readme.md
└── src
    ├── cmdstanpy_to_arviz.py
    ├── data_preparation.py
    ├── fake_data_generation.py
    ├── fitting.py
    ├── model_configuration.py
    ├── model_configurations_to_try.py
    ├── pandas_to_cmdstanpy.py
    ├── readme.md
    ├── stan
    │   ├── custom_functions.stan
    │   ├── model.stan
    │   └── readme.md
    └── util.py

The repository, and in particular the scripts prepare_data.py, fit_fake_data.py and fit_real_data.py, should work straight away, based on the Stan file at src/stan/model.stan and the raw measurements at data/raw/raw_measurements.csv. You should also be able to run make clean_all to get rid of generated files and (if pandoc is available) make report.pdf to create a pdf report.

To implement a custom analysis, you need to modify or replace src/stan/model.stan and data/raw/raw_measurements.csv and then tweak the Python files in the src directory.

I'd really like to know what some current or potential cmdstanpy users think of this project. In particular:

  • Do you have a similar solution for avoiding repeated work when you start a new project?
  • How well does the overall structure match how you use cmdstanpy?
  • What do you think is the right balance between functionality and flexibility here? I tried to make the template as unopinionated as possible, but I don't really know how other people tend to use cmdstanpy, so I suspect it might be a bit too focused on my typical workflow.
  • Are there specific choices you would have made differently?
  • What is the biggest missing feature? I have listed a few here but I probably missed some.

Any help or opinions are very gratefully received!


hi Teddy,

totally sympathetic to this idea.
two potential problems:
a) every analysis is just different enough to be problematic
b) keeping the template simple enough to be immediately grok-able

regarding arviz - the cmdstanpy-to-arviz conversion should work better after the recent refactor.

also, I've got an example of distributing cmdstanpy jobs over slurm - would that be worth adding to the template?

cheers,
Mitzi


I think JSON might be a better format for the data (if there are n-dimensional arrays).
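For example, something along these lines should write multi-dimensional arrays in the format CmdStan expects - just a minimal sketch, with made-up variable names and the template's results/input_data_json directory as a destination:

import numpy as np
from cmdstanpy import write_stan_json

# hypothetical input: a 2-d array of measurements plus its dimensions
stan_input = {
    "N": 10,
    "K": 3,
    "y": np.random.default_rng(1234).normal(size=(10, 3)),
}

# write_stan_json serialises numpy arrays of any dimension to CmdStan-ready JSON
write_stan_json("results/input_data_json/input_data.json", stan_input)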


FYI, I've had some issues, like divergent transitions not being displayed in pair plots, and some issues (might have been on my side) with array ordering (column- vs row-major). Haven't gotten around to filing a bug report or whatnot.


Yeah, that is something to think about - maybe the wizard should ask the user whether they would rather input CSV or JSON and then structure the repository accordingly.


That's a big challenge - I was surprised by how abstract the code I ended up writing (and implicitly asking any user to understand) turned out to be, just in order to allow optionally varying the Stan program, priors and covariates. I'm fairly sure there's room for improvement there, but I guess there are also some things it's just not worth making a template for. My intuition is that I do enough repeated work that some kind of template must be worthwhile, but it's hard to tell in advance what would be most useful.

Definitely!

I've found the best approach to be to subclass cmdstanpy's classes. But then again, I wanted more than what cmdstanpy currently provides.

subclass cmdstanpy's classes

cool! just curious - what did you want and which classes did you subclass?

Hm, only CmdStanModel and CmdStanMCMC and just for a few things, mainly for convenience. Except for the following:

  • I wanted to be able to (semi-)gracefully kill runs/chains. Whatever else I tried always left the chains running, messing up my timings. The solution was to reimplement (copy & paste) CmdStanModel's sample method, grabbing the ThreadPoolExecutor instance and the subprocesses and passing them to the subclassed CmdStanMCMC instance, which then had all the information needed to terminate rogue chains and to work with the partial results.
  • I wanted to be able to pool all warmup samples (from the last metric window) and compute the covariance matrix (rough sketch below). This is definitely a non-portable hack, because it depends on all parameters being unconstrained (the metric works on the unconstrained values, while Stan only gives me the constrained values).
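The warmup-pooling sketch is roughly this - assuming save_warmup is on, that all parameters are unconstrained, crudely treating the last 200 warmup iterations as a stand-in for the final metric window, and using placeholder paths; I also think cmdstanpy infers a dense metric from the shape of inv_metric, but I'm not certain:

import numpy as np
from cmdstanpy import CmdStanModel, write_stan_json

model = CmdStanModel(stan_file="src/stan/model.stan")
fit = model.sample(data="results/input_data_json/input_data.json", save_warmup=True)

# pool the tail of the warmup draws across chains; sampler diagnostics end in "__"
draws = fit.draws(inc_warmup=True)  # shape: (iterations, chains, columns)
window = draws[fit.num_draws_warmup - 200 : fit.num_draws_warmup]
param_idx = [i for i, name in enumerate(fit.column_names) if not name.endswith("__")]
pooled = window[:, :, param_idx].reshape(-1, len(param_idx))

# covariance of the pooled (assumed unconstrained) draws as a dense inverse metric
write_stan_json("metric.json", {"inv_metric": np.cov(pooled, rowvar=False)})
refit = model.sample(data="results/input_data_json/input_data.json", metric="metric.json")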

Other than that, I wanted the following things for convenience. I only looked at the source files for the above two classes, so I might have just missed some stuff:

  • Be able to work with fit objects where only some of the chains finished (see above).
  • Get per-chain timings. Did I miss some functionality?
  • Collapse the stan summary to a boolean (i.e. everything alright or not).
  • Sometimes the ordering of arrays of vectors didn't seem to be right. But I'm not sure anymore.
  • Randomly pick a draw to initialize a new run (sketch after this list).
  • Inspect the source code and get functions/model blocks.
  • Restart sampling from a fit object.
  • Do some weird incremental warmup.
  • Addendum: I wanted the model to be able to infer some of the data from other parts of the data, like for example the length of an array.
  • Also: being able to sample from the prior.
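The random-init idea mentioned above is roughly this (paths are placeholders, and stan_variables also returns transformed parameters and generated quantities, which may need filtering out before being used as inits):

import numpy as np
from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="src/stan/model.stan")
fit = model.sample(data="results/input_data_json/input_data.json")

# pick one posterior draw at random and reuse it as inits for a fresh run
variables = fit.stan_variables()  # dict of name -> ndarray, first axis indexes draws
n_draws = next(iter(variables.values())).shape[0]
idx = np.random.default_rng().integers(n_draws)
inits = {name: np.asarray(value[idx]).tolist() for name, value in variables.items()}

new_fit = model.sample(data="results/input_data_json/input_data.json", inits=inits)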

Final edit:

  • Also, the sample function does not seem to play nicely with numpy's ndarray for the step size, and I think I could not simply pass multiple metrics or inits.

ArviZ might work in this case (tbh, there could be errors too)

I think these can be accessed from the stdout files?

Maybe ArviZ summary might be easier to work with?
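For example, something like this (the thresholds are only illustrative and the paths are placeholders):

import arviz as az
from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="src/stan/model.stan")
fit = model.sample(data="results/input_data_json/input_data.json")

# convert to InferenceData and collapse the diagnostics to one pass/fail flag
summary = az.summary(az.from_cmdstanpy(posterior=fit))
all_ok = bool((summary["r_hat"] < 1.01).all() and (summary["ess_bulk"] > 400).all())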

Was this after you combined multiple chains? Check the arviz.InferenceData output - does it look correct?

I think a simple function could work in this case? (It might be easier to get inits from arviz.InferenceData)

(source → cpp?)

Were there any problems?

In Stan code?

? (fixed_param + a custom .stan is probably the easiest way to do this)
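i.e. something along these lines, where prior_model.stan is a hypothetical prior-only program whose generated quantities block draws the parameters with _rng calls:

from cmdstanpy import CmdStanModel

prior_model = CmdStanModel(stan_file="src/stan/prior_model.stan")  # hypothetical prior-only program
prior_fit = prior_model.sample(
    data="results/input_data_json/input_data.json",  # placeholder path
    fixed_param=True,  # nothing in the parameters block; just run generated quantities
    iter_sampling=1000,
)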

I think I already fixed these issues. But if this is not yet working, then we need to fix it.

no, you didn't miss that - it's not scraped out of the Stan CSV file; that could be done and added to the RunSet object.

Ari's correct that it's also in the stdout files.

what do you mean by "restart sampling"?

  • continue adaptation? in which case, initializing with stepsize and mass matrix should work.
  • get more samples? initializing with stepsize and mass matrix, no_warmup, will produce more samples (a rough sketch follows below). but there's a problem in that we don't save the state of the RNG - the seed + chain_id gives the initial offset, but we don't record how many times the RNG is called during a run. to properly continue, one would need to know both the seed and how far to advance the RNG.
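a rough sketch of that second option, reusing the adapted step size and metric from a previous fit (paths are placeholders; passing a list of per-chain dicts for the metric should work, I think, and the RNG caveat above still applies):

from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="src/stan/model.stan")
fit = model.sample(data="results/input_data_json/input_data.json")

# draw more samples, skipping warmup entirely; you would probably also want
# to pass inits taken from the previous fit's final draws
more_draws = model.sample(
    data="results/input_data_json/input_data.json",
    step_size=fit.step_size.tolist(),                # per-chain adapted step sizes
    metric=[{"inv_metric": m} for m in fit.metric],  # per-chain adapted inverse metric
    iter_warmup=0,
    adapt_engaged=False,
)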

we recommend running multiple chains in order to be able to test convergence. if you don't have convergence then, whatever your sample is, you have no evidence that you have a set of draws from the posterior and, consequently, no reason to believe that the samples are valid.

therefore, you need to wait until all chains have finished sampling. granted, this is problematic if you're not able to run all chains in parallel or otherwise distribute them across a cluster.
