Specifying output file names

jonah · July 2, 2020, 7:33pm

No worries at all! No expectation for everyone to be familiar with that history. But yeah, we did use to get (maybe still do?) many PRs without discussion first.

rok_cesnovar · July 2, 2020, 7:37pm

FWIW, I am in favor of this.

mitzimorris · July 2, 2020, 7:37pm

yes, definitely - should be “save_output_files” - will add that detail to the CmdStanPy issue.

mtwest · July 4, 2020, 9:50am

If I want more fine-grained control over the output file names and where the samples are streamed to, should I just use base CmdStan?

jonah · July 4, 2020, 3:41pm

If we’re going to put all this effort into making it easier to use CmdStan then I’d like the wrappers to let users do pretty much anything useful that you can do with CmdStan directly. Would this

solve the problem here or is that still too limiting?

mtwest · July 5, 2020, 9:15am

Will look through this Monday morning as I am trying to not work during this 4th of July holiday weekend.

I want to spend a little time understanding the design you and Mitzi have put together and how that differs from what is in my head, before I start making any more requests.

mtwest · July 7, 2020, 1:33pm

What are the use cases you foresee for CmdStan and its R/Python wrappers?

- Replacement for R/PyStan and therefore the new backend for things like rstanarm
- Simpler interface for building complex analysis workflows on clusters?
- ???

Design decisions that are good for one use case can be detrimental to another. I have my preferences, but those, I admit, come from a rather narrow perspective on how I want to use CmdStanPy.

mitzimorris · July 7, 2020, 2:10pm

the goals are spelled out here: https://github.com/stan-dev/cmdstanpy/blob/master/README.md
before that we went through the design process - https://github.com/stan-dev/design-docs/blob/master/designs/0002-cmdstanpy_func_spec.md
things have evolved in the past year as we’ve gotten more feedback from users,
and we’re always happy to get more feedback.

mtwest · July 7, 2020, 5:11pm

I posted in issue #254 an alternative proposal.

- Include option to change output file basename, though chain # is still appended to the end. If basename option is ignored, one still gets ProgramName_StartDateTime as default.
- Make csv file paths accessible in sampling object.
- Get rid of automatic temp directory generation and stream sampling data to output_dir.
- Use .ckpt as secondary extension for streaming data rather than randomly generated character string.

I really like a design that is modularized and has separate programs that each do one thing well. These small changes, I believe, make it easier to slot Stan into a larger workflow DAG. I know most Stan users are working on their local machines, but hopefully I haven’t caused too much difficulty in their standard practice.
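
To make the naming scheme in the proposal above concrete, here is a minimal sketch in plain Python. It is not CmdStanPy code: the helper `chain_csv_path`, its parameters, the timestamp format, and the reading of "secondary extension" as `.ckpt.csv` are all assumptions made for illustration.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def chain_csv_path(
    output_dir: str,
    program_name: str,
    chain_id: int,
    basename: Optional[str] = None,
    start: Optional[datetime] = None,
    in_progress: bool = False,
) -> Path:
    """Hypothetical helper: build the CSV path for one chain.

    Assumed conventions (not taken from CmdStanPy itself):
    - default basename is ProgramName_StartDateTime
    - the chain number is always appended as "-<chain_id>"
    - files still being streamed carry ".ckpt" as a secondary extension
    """
    if basename is None:
        start = start or datetime.now()
        basename = f"{program_name}_{start:%Y%m%d%H%M}"
    suffix = ".ckpt.csv" if in_progress else ".csv"
    return Path(output_dir) / f"{basename}-{chain_id}{suffix}"


# A user-chosen basename; the chain number is still appended.
final_path = chain_csv_path("results", "bernoulli", 2, basename="run_a")

# The same chain while sampling is still streaming to output_dir.
streaming_path = chain_csv_path(
    "results", "bernoulli", 2, basename="run_a", in_progress=True
)
```

Because the names are deterministic, a downstream step in a workflow DAG could compute the expected file paths before sampling starts, which is the point of dropping the randomly generated string.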

mtwest · July 24, 2020, 4:55pm

@mitzimorris @jonah
Do you need more details in the issue about how this proposal would work, as in how it would be concretely implemented or the use cases where the change is necessary?

mitzimorris · July 24, 2020, 7:08pm

yes, I saw your proposal.

jonah · July 24, 2020, 9:33pm

@mtwest Thanks for the reminder. I’ve been a bit swamped and haven’t had a chance to fully go over the proposal. Will add that to my to-do list for next week. Definitely please keep pinging me on it if I don’t get back to you though!

jonah · July 31, 2020, 7:34pm

@mtwest I think part of the issue here is that we don’t exactly know what your code looks like. Are you doing a separate call to the sample method on each node? In that case doesn’t including a node id in your specification of the output_dir argument ensure unique file names across nodes? If you’re not doing that, then what code are you currently running on the different nodes?
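
A minimal sketch of the per-node output_dir idea described above, assuming each node runs its own Python process. The helper `node_output_dir` is hypothetical, and reading the node id from the Slurm variable `SLURM_NODEID` is only one possible choice that depends on the cluster scheduler.

```python
import os
from pathlib import Path
from typing import Optional, Union


def node_output_dir(base_dir: Union[str, Path], node_id: Optional[object] = None) -> Path:
    """Hypothetical helper: return a per-node output directory, creating it if needed.

    If node_id is not given, fall back to the SLURM_NODEID environment
    variable (an assumption about the scheduler), then to "0".
    """
    if node_id is None:
        node_id = os.environ.get("SLURM_NODEID", "0")
    out = Path(base_dir) / f"node_{node_id}"
    out.mkdir(parents=True, exist_ok=True)
    return out


# Intended use with CmdStanPy on each node (not executed here; the file
# names "model.stan" and "data.json" are placeholders):
#
#   from cmdstanpy import CmdStanModel
#   model = CmdStanModel(stan_file="model.stan")
#   fit = model.sample(data="data.json",
#                      output_dir=node_output_dir("stan_output"))
```

Each node then writes its CSV files under its own subdirectory, so file names cannot collide across nodes even if the basenames are identical.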
