Specifying output file names

No worries at all! No expectation for everyone to be familiar with that history. But yeah, we used to get (and maybe still do?) many PRs without discussion first.

FWIW, I am in favor of this.

yes, definitely - should be "save_output_files" - will add that detail to the CmdStanPy issue.

If I want more fine-grained control over the output file names and where the samples are streamed to, should I just use base CmdStan?
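For context, base CmdStan takes the output path as an `output file=...` argument on the compiled model's command line, so the caller chooses the name outright. A minimal sketch of assembling such a command from Python is below; the helper name `cmdstan_sample_args` and the example paths are hypothetical, and the list is only built here, not executed.

```python
def cmdstan_sample_args(exe, data_file, output_file, chain_id=1):
    """Build the argument list for one CmdStan sampling run.

    Hypothetical helper: it only assembles the command and does not run it.
    Pass the result to subprocess.run() to launch the chain.
    """
    return [
        str(exe),
        "sample",
        f"id={chain_id}",               # chain identifier
        "data", f"file={data_file}",    # input data file
        "output", f"file={output_file}",  # CSV file the draws are written to
    ]


# Example: one command per chain, each with its own output file name.
commands = [
    cmdstan_sample_args("./bernoulli", "bernoulli.data.json", f"samples_{i}.csv", i)
    for i in range(1, 5)
]
```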

If we're going to put all this effort into making it easier to use CmdStan then I'd like the wrappers to let users do pretty much anything useful that you can do with CmdStan directly. Would this

solve the problem here or is that still too limiting?

Will look through this Monday morning as I am trying to not work during this 4th of July holiday weekend.

I want to spend a little time understanding the design you and Mitzi have put together and how that differs from what is in my head, before I start making any more requests.

What are the use cases you foresee for CmdStan and its R/Python wrappers?

  • Replacement for R/PyStan and therefore the new backend for things like rstanarm
  • Simpler interface for building complex analysis workflows on clusters?
  • ???

Design decisions that are good for one use case can be detrimental to another. I have my preferences, but I admit they come from a rather narrow perspective on how I want to use CmdStanPy.

the goals are spelled out here: https://github.com/stan-dev/cmdstanpy/blob/master/README.md
before that we went through the design process - https://github.com/stan-dev/design-docs/blob/master/designs/0002-cmdstanpy_func_spec.md
things have evolved in the past year as we've gotten more feedback from users,
and we're always happy to get more feedback.

I posted in issue #254 an alternative proposal.

  • Include option to change output file basename, though chain # is still appended to the end. If basename option is ignored, one still gets ProgramName_StartDateTime as default.
  • Make csv file paths accessible in sampling object.
  • Get rid of automatic temp directory generation and stream sampling data to output_dir.
  • Use .ckpt as secondary extension for streaming data rather than randomly generated character string.
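To make the naming scheme in this list concrete, here is a rough sketch in plain Python. It is not CmdStanPy code: the function names, the timestamp format, the `_` separator, and the choice to place `.ckpt` before `.csv` are all assumptions made for illustration.

```python
from datetime import datetime
from pathlib import Path


def chain_csv_path(output_dir, program_name, chain_id, basename=None, start=None):
    """Final CSV path for one chain: <basename>_<chain_id>.csv.

    If no basename is given, fall back to ProgramName_StartDateTime.
    (Hypothetical helper; the timestamp format is an assumption.)
    """
    if basename is None:
        start = start or datetime.now()
        basename = f"{program_name}_{start:%Y%m%d%H%M%S}"
    return Path(output_dir) / f"{basename}_{chain_id}.csv"


def checkpoint_path(final_path):
    """Name used while draws are still streaming: <stem>.ckpt.csv."""
    return final_path.with_name(final_path.stem + ".ckpt.csv")


def finalize(ckpt_path, final_path):
    """Rename the streaming file to its final name once sampling is done."""
    ckpt_path.replace(final_path)
    return final_path
```

Because both names are derived deterministically from `output_dir`, the basename, and the chain number, a downstream step in a workflow DAG can know in advance which file to wait for.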

I really like a design that is modularized and has separate programs that do one thing well. These small changes, I believe, make it easier to slot Stan into a larger workflow DAG. I know most Stan users are working on their local machines, but hopefully I haven't caused too much difficulty in their standard practice.

@mitzimorris @jonah
Do you need more details in the issue about how this proposal would work, as in how it would be concretely implemented or the use cases where the change is necessary?

yes, I saw your proposal.

@mtwest Thanks for the reminder. I've been a bit swamped and haven't had a chance to fully go over the proposal. Will add that to my to-do list for next week. Definitely please keep pinging me on it if I don't get back to you though!

@mtwest I think part of the issue here is that we don't exactly know what your code looks like. Are you doing a separate call to the sample method on each node? In that case doesn't including a node id in your specification of the output_dir argument ensure unique file names across nodes? If you're not doing that, then what code are you currently running on the different nodes?
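As a sketch of what "including a node id in output_dir" could look like: the helper below builds one directory per node. The helper name and the `node_<id>` layout are hypothetical, and reading the id from Slurm's `SLURM_NODEID` environment variable is an assumption about the scheduler; substitute whatever your cluster provides.

```python
import os
from pathlib import Path


def node_output_dir(base_dir, node_id=None):
    """Return (and create) a per-node output directory: <base_dir>/node_<id>.

    Hypothetical helper. If no id is passed, fall back to SLURM_NODEID,
    which assumes the job runs under Slurm; otherwise default to "0".
    """
    if node_id is None:
        node_id = os.environ.get("SLURM_NODEID", "0")
    out = Path(base_dir) / f"node_{node_id}"
    out.mkdir(parents=True, exist_ok=True)
    return out


# Intended use on each node (not run here; `model` and `data` are placeholders):
# fit = model.sample(data=data, output_dir=str(node_output_dir("results")))
```

With this layout two nodes never write into the same directory, so file names only need to be unique within a node.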
