This is an update to: Chkptstanr: checkpoint MCMC sampling in Stan
TL;DR
This pre-release fixes most major bugs of the original chkptstanr package, which allows you to stop and resume sampling with brms and cmdstanr via regular checkpoints. However, I now believe that adaptation is performed incorrectly, as discussed here. Thus, I do not recommend using it for any production work. The package now works - you can stop and start sampling at will - you just have to be mindful that warmup might not be doing what you think. I wondered whether to post this at all, but given that the original package is on CRAN, I decided that it is better to make this clear, and potentially open up the discussion for how to improve it.
The current detailed Development Roadmap is available here.
Release notes chkptstanr v0.2.0-alpha
New maintainer
- With the permission of the original creator, Donald R. Williams, Ven Popov becomes the new maintainer of the package. The development will continue at venpopov/chkptstanr.
Major bug fixes
- Resolve error 'stan_code_path' not found when resuming sampling, which completely prevented the core functionality of the package from working (original issue #8)
- Resolve incorrect detection of existing model binaries, which was causing the package to fail to detect changes to arguments and incorrectly continue to sample (#2)
- Fix the incorrect combination of checkpoint samples into a single stanfit object, which was causing problems with post-processing methods (#8)
- chkpt_brms() now works with any brm() arguments, including custom families, data2, etc., rather than giving an error (original issue #15)
New features
- Add argument 'stop_after' to predetermine a stopping checkpoint. This lets you stop sampling after a fixed number of iterations, e.g. `stop_after = 1000` will stop the sampling after 1000 iterations. (original issue #4)
- Add argument `reset` to restart sampling. This allows you to reset the checkpointing process and start from the beginning without recompiling the model. Setting `reset = TRUE` will delete the existing checkpoints but keep the Stan model code and binary. This is also available via the new function `reset_checkpoints(path)`, which achieves the same.
- Return a brmsfit object when sampling is interrupted. Instead of having to reconstruct the samples manually, `chkpt_brms()` now returns a `brmsfit` object if post-warmup sampling is stopped for any reason, either programmatically via stop_after, because of an error, or due to a manual abort by the user. The `brmsfit` object will contain samples up to the last successful checkpoint. You can resume sampling from the last checkpoint by rerunning the same code. (#4)
- No longer necessary to manually create a folder for the checkpoints via 'create_folder()' before using `chkpt_brms()` or `chkpt_stan()`. `create_folder()` is deprecated. Please pass the folder name or full path directly to the path argument of `chkpt_brms()`, and a folder to store the checkpoints will be created automatically. This significantly simplifies the workflow.
- You can now reuse checkpoint folders. The path argument to `chkpt_brms()` and `chkpt_stan()` no longer gives an error if a folder already exists, allowing a reusable programmatic workflow.
- Checkpoint folders can be specified with a nested path. The path argument to `chkpt_brms()` and `chkpt_stan()` works with nested folder names, e.g. `"output/checkpoints1"`, even if `output/` does not exist.
- You can now use any formula that brm accepts. Remove an unnecessary check that the formula should be a `brmsformula` object, allowing for more flexibility in the input to `chkpt_brms()` such as `mvbrmsformula` objects or other arguments that `brm()` accepts (original issue #9)
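To make the new workflow concrete, here is a minimal sketch of a stop-and-resume session. It assumes the `epilepsy` data and model from the brms documentation; `fit_with_checkpoints()` is a hypothetical wrapper written only for this illustration and is not part of the package. How `stop_after` counts iterations relative to warmup is not spelled out above, so treat the value below as illustrative.

```r
# Hypothetical helper (not part of chkptstanr): wraps one checkpointed run
# so the same call can be repeated across R sessions.
fit_with_checkpoints <- function(path = "output/checkpoints1", ...) {
  chkptstanr::chkpt_brms(
    formula = brms::bf(
      count ~ zAge + zBase * Trt + (1 | patient),
      family = poisson()
    ),
    data = brms::epilepsy,
    iter_warmup = 1000,
    iter_sampling = 1000,
    iter_per_chkpt = 250,  # must divide iter_warmup and iter_sampling
    path = path,           # nested path; created automatically if missing
    ...                    # e.g. stop_after = 1000 or reset = TRUE
  )
}

# Session 1: stop at a predetermined checkpoint.
# fit <- fit_with_checkpoints(stop_after = 1000)

# Session 2: call again to resume from the last checkpoint.
# fit <- fit_with_checkpoints()

# Start over without recompiling the model (either form):
# fit <- fit_with_checkpoints(reset = TRUE)
# chkptstanr::reset_checkpoints("output/checkpoints1")
```

The calls are left commented out because each one compiles and samples a Stan model. Given the adaptation caveat above, results from such a run should be treated as exploratory.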
Minor bug fixes
- Fix an incorrect error message when providing iter_warmup, iter_sampling, or iter_warmup+iter_sampling not divisible by iter_per_chkpt. The error message now correctly states that the number of iterations per checkpoint must be a divisor of all three quantities.
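The divisibility rule can be checked before launching a long run. The following base R sketch is not the package's internal code; `check_iter_divisible()` is a hypothetical helper that mirrors the rule as described above.

```r
# Hypothetical pre-flight check (not the package's own implementation):
# iter_per_chkpt must divide iter_warmup, iter_sampling, and their sum.
check_iter_divisible <- function(iter_warmup, iter_sampling, iter_per_chkpt) {
  quantities <- c(
    iter_warmup = iter_warmup,
    iter_sampling = iter_sampling,
    `iter_warmup + iter_sampling` = iter_warmup + iter_sampling
  )
  # Names of the quantities that leave a remainder
  bad <- names(quantities)[quantities %% iter_per_chkpt != 0]
  if (length(bad) > 0) {
    stop(
      "iter_per_chkpt (", iter_per_chkpt, ") must be a divisor of: ",
      paste(bad, collapse = ", ")
    )
  }
  invisible(TRUE)
}

check_iter_divisible(1000, 1000, 250)    # passes silently
# check_iter_divisible(1000, 1000, 300)  # would stop with an error
```

If the first two quantities are divisible, their sum is as well, so the third check is redundant in practice; it is kept here only to match the wording of the error message.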
Other changes
- Automated testing for package stability. Set up initial automated testing and continuous integration with GitHub Actions to ensure the package is always working as expected.
- Change default number of chains from 2 to 4 to be consistent with brms defaults
- Rename argument 'iter_typical' to 'iter_adaptation' to better reflect what this stage is doing. iter_typical is deprecated. In future releases, the adaptation procedure will be rewritten and this argument will be completely removed (see #10)