No worries at all! No expectation for everyone to be familiar with that history. But yeah, we did use to get (and maybe still do?) many PRs without discussion first.
FWIW, I am in favor of this.
yes, definitely - should be "save_output_files" - will add that detail to the CmdStanPy issue.
If I want more fine-grained control over the output file names and where the samples are streamed to, should I just use base CmdStan?
If we're going to put all this effort into making it easier to use CmdStan then I'd like the wrappers to let users do pretty much anything useful that you can do with CmdStan directly. Would this solve the problem here or is that still too limiting?
Will look through this Monday morning as I am trying to not work during this 4th of July holiday weekend.
I want to spend a little time understanding the design you and Mitzi have put together and how that differs from what is in my head, before I start making any more requests.
What are the use cases you foresee for CmdStan and its R/Python wrappers?
- Replacement for R/PyStan and therefore the new backend for things like rstanarm
- Simpler interface for building complex analysis workflows on clusters?
- ???
Design decisions that are good for one use case can be detrimental to another. I have my preferences, but I admit they come from a rather narrow perspective on how I want to use CmdStanPy.
the goals are spelled out here: https://github.com/stan-dev/cmdstanpy/blob/master/README.md
before that we went through the design process - https://github.com/stan-dev/design-docs/blob/master/designs/0002-cmdstanpy_func_spec.md
things have evolved in the past year as we've gotten more feedback from users,
and we're always happy to get more feedback.
I posted in issue #254 an alternative proposal.
- Include an option to change the output file basename, though the chain # is still appended to the end. If the basename option is ignored, one still gets `ProgramName_StartDateTime` as the default.
- Make the csv file paths accessible in the sampling object.
- Get rid of automatic temp directory generation and stream sampling data to `output_dir`.
- Use `.ckpt` as a secondary extension for streaming data rather than a randomly generated character string.
I really like a design that is modularized and has separate programs that each do one thing well. These small changes, I believe, make it easier to slot Stan into a larger workflow DAG. I know most Stan users are working on their local machines, but hopefully I haven't caused too much difficulty in their standard practice.
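To make the naming scheme in the list above concrete, here is a rough sketch of how the proposed file paths could be built. This is not CmdStanPy code: the function `proposed_csv_paths`, the timestamp format, the `-` separator before the chain number, and the reading of "secondary extension" as `name-1.ckpt.csv` are all assumptions made for illustration only.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def proposed_csv_paths(
    output_dir: str,
    program_name: str,
    chains: int,
    basename: Optional[str] = None,
    start: Optional[datetime] = None,
    streaming: bool = False,
) -> List[Path]:
    """Hypothetical helper: build one output path per chain.

    The file name layout is an assumption, not the layout CmdStanPy uses.
    """
    start = start or datetime.now()
    # Fall back to ProgramName_StartDateTime when no basename is given.
    stem = basename if basename is not None else f"{program_name}_{start:%Y%m%d%H%M}"
    # Files still being written get ".ckpt" as a secondary extension (assumed order).
    suffix = ".ckpt.csv" if streaming else ".csv"
    # The chain number is always appended, whether or not a basename was given.
    return [Path(output_dir) / f"{stem}-{chain}{suffix}" for chain in range(1, chains + 1)]
```

Because the names are deterministic, a downstream step in a workflow DAG could compute the expected paths up front instead of discovering them after the run.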
@mitzimorris @jonah
Do you need more details in the issue about how this proposal would work, as in how it would be concretely implemented or the use cases where the change is necessary?
yes, I saw your proposal.
@mtwest Thanks for the reminder. I've been a bit swamped and haven't had a chance to fully go over the proposal. Will add that to my to-do list for next week. Definitely please keep pinging me on it if I don't get back to you though!
@mtwest I think part of the issue here is that we don't exactly know what your code looks like. Are you doing a separate call to the `sample` method on each node? In that case doesn't including a node id in your specification of the `output_dir` argument ensure unique file names across nodes? If you're not doing that, then what code are you currently running on the different nodes?
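For reference, the node-id idea could look something like the sketch below. The helper `node_output_dir` is hypothetical, and reading the id from the `SLURM_NODEID` environment variable assumes a Slurm cluster; any other per-node identifier would work the same way.

```python
import os
from pathlib import Path
from typing import Optional


def node_output_dir(
    base_dir: str,
    node_id: Optional[str] = None,
    env_var: str = "SLURM_NODEID",
) -> Path:
    """Hypothetical helper: return a per-node subdirectory of base_dir."""
    if node_id is None:
        # Assumption: the scheduler exposes a node id in the environment.
        node_id = os.environ.get(env_var, "0")
    return Path(base_dir) / f"node_{node_id}"


# The result would then be passed as the output_dir argument of sample, e.g.
#   fit = model.sample(data=..., output_dir=str(node_output_dir("results")))
```

With one directory per node, two nodes never write into the same directory, so a file-name collision across nodes cannot happen even if the names themselves are identical.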