Best practices for Make-like declarative workflows with Stan

wlandau · July 15, 2020, 2:40pm

Background

Because model compilation and MCMC are inevitably time-consuming, I find it extremely helpful to write entire data analysis projects as formal end-to-end Make-like declarative workflows. Here is a sketch of an actual Makefile that might do this.

.PHONY: all clean

all: samples.fst

samples.fst: run_model.R model.rds data.fst
	Rscript run_model.R

model.rds: compile_model.R model.stan
	Rscript compile_model.R # Sets auto_write = TRUE and save_dso = TRUE.

data.fst: simulate_data.R
	Rscript simulate_data.R

clean:
	rm -f *.fst *.rds

In practice, because I use R, I usually opt for a Make-like R package like drake or targets instead of GNU Make itself. (Full disclosure: I am the creator and maintainer of both these R packages.) In fact, I am trying to develop best practices for using rstan with targets, and my first attempt is here.

Issue

I am not sure if I am writing the model compilation target correctly in the Makefile above (or here with targets). That target assumes the input file is model.stan and the output file is model.rds, so the model will recompile and downstream targets may rerun if either file changes. But I am not sure it is sufficient to track the RDS file. I feel as though a step like this should also include the actual DSO file and any other binaries that get created. How do I find the names of these files? In general, what is the best way to reproducibly track the compilation step and completely guarantee that downstream targets do not recompile the model?

I realize the DSO file name might not be known in advance, and there may be multiple output files from compilation. This presents a challenge for GNU Make, but not for drake or targets because both can easily handle multiple dynamic input/output files per target. For the model compilation target in the targets example, the compile_model() function just needs to return the names of all the files that model depends on.
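
One hedged way to find those file names without knowing them in advance is to snapshot the project directory before and after the compilation step and report whatever is new or changed. The sketch below uses only base R. `snapshot_files()` and `files_touched_by()` are hypothetical helper names, not part of rstan, drake, or targets, and the `saveRDS()` call merely stands in for a real compilation step. It only sees files written inside the watched directory.

```r
# Hypothetical helpers (not from rstan, drake, or targets).

# Hash every file under `dir`; the names of the result are the file paths.
snapshot_files <- function(dir) {
  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  tools::md5sum(files)
}

# Evaluate `code` and return the paths of files it created or modified.
files_touched_by <- function(dir, code) {
  before <- snapshot_files(dir)
  force(code)  # `code` is lazy, so the step runs here, after the first snapshot
  after <- snapshot_files(dir)
  is_new <- !(names(after) %in% names(before))
  is_changed <- !is_new & after != before[names(after)]
  names(after)[is_new | is_changed]
}

# Usage with a stand-in for the compilation step.
project <- tempfile("project")
dir.create(project)
writeLines("parameters { real mu; }", file.path(project, "model.stan"))
outputs <- files_touched_by(
  project,
  saveRDS("compiled model placeholder", file.path(project, "model.rds"))
)
print(basename(outputs))
```

A character vector of paths like `outputs` is the kind of value a file-tracking target could return, so the pipeline tool watches every file the step actually wrote.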

bgoodri · July 15, 2020, 3:49pm

RStan should already be checking whether the mtime of the .stan file is later than the mtime of the .rds file. Their names are the same except for the suffix.
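
The timestamp rule described above can be illustrated in base R. This is a sketch of the rule as stated, not RStan's actual implementation. `needs_recompile()` is a hypothetical name, and the default `.rds` path simply swaps the suffix.

```r
# Hypothetical helper illustrating the rule; not RStan's internal code.
needs_recompile <- function(stan_file,
                            rds_file = sub("\\.stan$", ".rds", stan_file)) {
  # Recompile if there is no cached .rds file, or if the .stan file
  # was modified more recently than the cached .rds file.
  !file.exists(rds_file) || file.mtime(stan_file) > file.mtime(rds_file)
}

# Usage: needs_recompile("model.stan") compares model.stan with model.rds.
```

GNU Make applies the same kind of modification-time comparison between a target and its prerequisites.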

wds15 · July 15, 2020, 4:09pm

I haven’t used it, but the drake R package could be helpful.

mike-lawrence · July 15, 2020, 6:22pm

@wds15: @wlandau is the developer of drake :)

wds15 · July 15, 2020, 6:27pm

Oops… thanks for the hint…

Drake looks super cool… just did not yet have the time to use it…

mike-lawrence · July 15, 2020, 6:30pm

This is neat. I use drake in my paid work (where it has been an amazing tool to track/accelerate our pipeline development), and Stan in my volunteer work, so having @wlandau show up here seeking to unite the two is exciting! Unfortunately I’m not knowledgeable enough in this particular realm to help. Just wanted to express enthusiasm!

(And by reading the post more carefully I have now learned of the drake successor targets!)

wlandau · November 11, 2020, 6:14pm

Update: I am in the process of switching my Bayesian work over to cmdstanr, which appears to work much more seamlessly than rstan for drake and targets. I am feeling pretty satisfied about the transition so far.

Pedagogical example: https://github.com/wlandau/targets-stan
RStudio Cloud workspace: https://rstudio.cloud/project/1430719/
Recorded talk: https://www.youtube.com/watch?v=Qq25BUxpJu4
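
For readers who want a rough picture of what a cmdstanr-based pipeline might look like in targets, here is a minimal sketch of a `_targets.R` file. It assumes the targets and cmdstanr packages and a working CmdStan installation. The file name `model.stan`, the helper functions, and the simulated data are hypothetical and are not taken from the linked example.

```r
# _targets.R (hypothetical sketch, not the code from the linked example)
library(targets)

# Hypothetical data; assumes model.stan declares `n` and `y` in its data block.
simulate_data <- function() {
  list(n = 10L, y = rnorm(10))
}

# Compile the model and return the paths of the source and the executable,
# so that targets tracks both files.
compile_model <- function(stan_file) {
  model <- cmdstanr::cmdstan_model(stan_file)
  c(stan_file, model$exe_file())
}

# Recreate the model object from the tracked source file and run MCMC.
# cmdstan_model() skips compilation when the executable is up to date.
fit_model <- function(model_files, data) {
  model <- cmdstanr::cmdstan_model(model_files[1])
  fit <- model$sample(data = data, refresh = 0)
  fit$summary()
}

list(
  tar_target(model_files, compile_model("model.stan"), format = "file"),
  tar_target(sim_data, simulate_data()),
  tar_target(fit_summary, fit_model(model_files, sim_data))
)
```

Because the compilation target uses `format = "file"`, a change to either the Stan source or the executable invalidates the downstream fit, while an unchanged model is skipped on the next `tar_make()`.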
