While working on the SBC package, I realized it would be useful to have some support for storing completed fits, so that they are not recomputed when the data and model have not changed. I’ve already implemented some form of fit caching on multiple occasions for my own scripts, and recently also improved the way caching/storing fits works in brms. So before implementing a caching mechanism once again in R, I thought this might be a feature worth having either in the interfaces or even supported in core Stan (to make the implementation in interfaces easier).
This is something I actually care about a bit, so I’d be willing to put some effort into drafting a design doc and implementing it. But before that, I’d like to get some less formal feedback on the broad outlines of the idea.
I imagine that for CmdStan this could be done quite easily:
- A hash of the input data and a hash of the model code would be added to the CSV output header.
- A new switch (e.g. `cache=yes`) would be introduced. If set, the program would check whether the output file exists and, if so, whether the model and data hashes as well as all algorithm parameters (likely except for the seed) match what is stored in it. If everything matches, the program terminates immediately.
I imagine that code to compute the hash of input data could be introduced to Stan core.
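
To make the proposal a bit more concrete, here is a rough Python sketch of the check described above. It is only an illustration of the logic, not how CmdStan works today: the header keys `model_hash` and `data_hash`, the `# key = value` layout assumed for the stored arguments, and the helper names are all hypothetical, and SHA-256 over the raw text is just one possible choice of hash.

```python
import hashlib
from pathlib import Path

# Hypothetical: arguments that should not invalidate a cached fit.
IGNORED_KEYS = {"seed"}


def sha256_of(text: str) -> str:
    """Hash raw text; any change (even whitespace) gives a different hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def read_header(csv_path):
    """Collect leading '# key = value' comment lines from an output CSV."""
    header = {}
    for line in Path(csv_path).read_text().splitlines():
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        key, sep, value = line.lstrip("#").partition("=")
        if sep:
            header[key.strip()] = value.strip()
    return header


def expected_header(model_code, data_text, args):
    """Header entries a fresh run with these inputs would write (assumed layout)."""
    expected = {k: str(v) for k, v in args.items()}
    expected["model_hash"] = sha256_of(model_code)
    expected["data_hash"] = sha256_of(data_text)
    return expected


def can_reuse(csv_path, model_code, data_text, args):
    """True if the existing output matches the model, data and arguments."""
    if not Path(csv_path).exists():
        return False
    stored = read_header(csv_path)
    expected = expected_header(model_code, data_text, args)
    return all(
        stored.get(key) == value
        for key, value in expected.items()
        if key not in IGNORED_KEYS
    )
```

With `cache=yes`, the program would run something like `can_reuse(...)` before sampling and exit early when it returns `True`. Hashing the raw text is the simplest option but also the most conservative one: a reformatted data file with identical values would count as changed, which is one reason a shared hashing routine for input data in Stan core could be preferable.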