I’ve created a prototype command line interface for Stan that’s meant to be easier to use than CmdStan. If anyone is a CmdStan user and has some time to check it out, please leave some feedback here on Discourse.
> examples/bernoulli/bernoulli sample adapt delta=0.9 algorithm=hmc engine=nuts max_depth=14 data file=examples/bernoulli/bernoulli.data.R
In the prototype, the order of the options doesn’t matter. In CmdStan, arguments are nested, so you need algorithm=hmc engine=nuts max_depth=14 to set the max treedepth.
Feedback
If you have any feedback, good or bad, please post it here or DM me.
Within a particular subcommand (e.g. sample vs optimize) the keywords can’t overlap. But that isn’t really a problem in our arguments.
I had considered doing something more substantial, but wanted to wait before doing anything with more effort. Something larger should really be done with a design in mind. This one, I just hacked together.
Maybe I should be a little clearer. I didn’t just throw it together with throw-away code. I think it’s clean and thought through well enough for what it is. It just wasn’t something that had an API design up front, software design, then implementation. All three happened organically. There are plenty of different ways to implement the same thing. There are some choices I made that other people would have made differently. That’s why this is a prototype… it’s something I put together (with minimal feedback).
I showed it to @mitzimorris and @RTrangucci a few days ago. But otherwise, it was just a solo effort.
Yes, the keyword “file” is ambiguous between data_file and output_file.
Otherwise, the keywords are not ambiguous.
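To make the ambiguity concrete, here is a minimal Python sketch that walks a nested argument tree and reports any leaf keyword reachable by more than one path. The tree below is a hypothetical, heavily trimmed stand-in built only from names mentioned in this thread. It is not CmdStan's real argument tree, and it ignores that values like `hmc` or `nuts` select sub-blocks.

```python
from collections import defaultdict

# Hypothetical, trimmed subset of a nested argument tree (assumption:
# illustrative only; the real CmdStan tree is larger and value-dependent).
ARG_TREE = {
    "sample": {
        "adapt": {"delta": None},
        "algorithm": {"engine": {"max_depth": None}},
    },
    "data": {"file": None},
    "output": {"file": None},
}


def leaf_paths(tree, prefix=()):
    """Yield the full path to every leaf keyword in the nested tree."""
    for name, child in tree.items():
        path = prefix + (name,)
        if child is None:
            yield path
        else:
            yield from leaf_paths(child, path)


def ambiguous_keywords(tree):
    """Return {keyword: [paths]} for leaf names reachable by several paths."""
    seen = defaultdict(list)
    for path in leaf_paths(tree):
        seen[path[-1]].append(" ".join(path))
    return {name: paths for name, paths in seen.items() if len(paths) > 1}


print(ambiguous_keywords(ARG_TREE))
# {'file': ['data file', 'output file']}
```

With this toy tree, only "file" collides, which is why a flat interface needs distinct names such as data_file and output_file.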
Here’s a nice comparison of the library used to handle the command line arguments with the boost::program_options library: “Comparing CLI11 and Boost PO”.
I would really like to see this in cmdstan, if possible, not as a separate repo. We will very likely have to bump major version for the incoming language changes in the not-so-distant future anyways. And there is now a substantial amount of deprecated stuff, etc. Maybe we can do this together with that? Another suggestion would be to add this to the cmdstan3 design doc (https://github.com/stan-dev/design-docs/pull/15)?
We could make a translator from the old commands to the new ones fairly easily.
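As a rough idea of what such a translator might look like, here is a Python sketch that pre-scans CmdStan-style tokens and flattens them into a method name plus a flat keyword dictionary. The `METHODS` and `GROUPS` sets are illustrative assumptions rather than CmdStan's full grammar, and turning the result into the prototype's actual option syntax is deliberately left out, since that syntax isn't shown here.

```python
# Tokens without "=" that introduce a nested block in the old syntax.
# Assumption: these short lists are illustrative, not CmdStan's full set.
METHODS = {"sample", "optimize"}
GROUPS = {"adapt", "data", "output"}


def translate(tokens):
    """Flatten CmdStan-style nested tokens into (method, {keyword: value})."""
    method = None
    group = None  # most recent group token; only consulted for "file"
    flat = {}
    for token in tokens:
        if "=" in token:
            key, value = token.split("=", 1)
            # "file" is the one ambiguous keyword; resolve it from the
            # enclosing data/output group.
            if key == "file":
                if group not in ("data", "output"):
                    raise ValueError("'file' outside a data/output group")
                key = f"{group}_file"
            if key in flat:
                raise ValueError(f"keyword given twice: {key}")
            flat[key] = value
        elif token in METHODS:
            method = token
        elif token in GROUPS:
            group = token
        else:
            raise ValueError(f"unrecognized token: {token}")
    return method, flat


old = ("sample adapt delta=0.9 algorithm=hmc engine=nuts max_depth=14 "
       "data file=examples/bernoulli/bernoulli.data.R").split()
method, flat = translate(old)
print(method, flat)
```

For the command shown at the top of the thread, this yields the method `sample` and a flat mapping in which `file` has become `data_file`. Everything else passes through unchanged because those keywords are already unique.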
Awesome! If there’s enough demand, we could do that.
If we’re going to bump versions, I think there are actually some substantial changes we can make so the arguments make more sense. Some things that I didn’t do in order to stay compatible with CmdStan:
rename num_samples to num_draws or even draws
change data_file to something like input_file so the short name is -i, to match -o for output (like other programs)
move fixed param out of sample. It’s not really sampling. It’s closer to generate quantities in some sense
add much more to diagnostics. @rok_cesnovar, benchmarking?
properly handle logging levels
different input and output formats
There’s a lot more that could be done.
This actually doesn’t address those design points. I thought about it and didn’t go in that direction even though I wanted to. To do it properly from C++, we need to load libraries at runtime, which isn’t cross-platform compatible. So… a bunch of headache there. We could do this using something like a Python or bash script… but that’s something different too and wouldn’t have fixed the command arguments.
@jonah and @mitzimorris have done a bunch of work on argument names. The link to a Google sheet should be somewhere in the cmdstan3 design doc. I think some of your suggestions are already in there.
I think we have been circling around this long enough that we should just decide on it and do it. Doing it for the next release cycle (October) is too soon, but the release after that is probably doable. But we need to set up a plan, prepare a list of things we need to deprecate, etc.
And yes, a bump should have a bunch more stuff. We do not want to bump major versions just so we can remove deprecated stuff.
Yes, all the logistics of making a non-breaking change.
I’m sort of leaning towards calling it something else so the logistics are a little easier. Cause now that it exists, I can already make things better like changing logging levels.
If we’re looking at a 6 month timeframe, it also makes sense to incubate it as a different project until it’s set. Meaning… CmdStan is mature and not going to change much. This prototype could change between now and when it gets a proper release. And that’s fine and expected. Hard to be collaborative on a single branch. (GitHub isn’t really designed around that.)
What if we work on a fork of Cmdstan on someone’s account and slowly build stuff there with CI and everything. I think that would be fine in terms of collaboration.
I am a bit worried that we will confuse users with another name and another interface. We already have a confusing structure of repos in my opinion and users have a hard time figuring out where to report errors or post requests.
But if you want this to be a new project then I am not going to oppose, given that you did all the work.
Re: logging levels
I may be wrong, but wouldn’t that mostly be changes to the services (stan repo)?
In terms of whether this should be CmdStan or something new: is there any reason to have both? My initial thought is that nobody really benefits from having both (neither users nor developers), so we should just go with the better one. If this new one is preferred then that means we either call it CmdStan or we call it something else and retire the name CmdStan.
Anyway, that’s my initial inclination, but I could certainly be overlooking reasons to maintain both. Does anyone think we’d be better off maintaining both separately?
If the API and other parts are going to change much, then something similar to pystan-next could be possible, but if most of the code is the same, I’m not sure there is a better way to do this than a new branch.
There are a few arguments which are candidates for deprecation. Are there other items?
Agreed, this should be easy. Are you thinking of something like a quick pre-scan of the args?
I just revisited that document and the discussion. As Daniel said, we’d need to figure out how to link in the compiled models to make the proposed syntax work. The ambition level of the design doc proposal falls between the amount of workflow management provided by the R and Python interfaces and the current CmdStan interface, and maybe that’s not really that useful. Note that under that proposal CmdStan still only runs a single chain. It assumes that the user is comfortable working in a *nix or DOS terminal window and is also OK writing simple shell scripts in order to run multiple chains and manage the output files accordingly (cf. 4 MCMC Sampling, CmdStan User’s Guide).
I’m all for this new way of handling arguments - a sub-argument “method” followed by any number of arguments in any order.
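For readers who want to see that shape in code, here is a small Python stand-in using argparse subcommands. The prototype itself is C++ and uses CLI11, so this is only an analogy. The program name `model`, the option names, and the defaults below are assumptions for illustration, not the prototype's actual interface.

```python
import argparse


def build_parser():
    """Build a parser with one subcommand per method (hypothetical names)."""
    parser = argparse.ArgumentParser(prog="model")
    methods = parser.add_subparsers(dest="method", required=True)

    # Options for "sample"; names and defaults are illustrative assumptions.
    sample = methods.add_parser("sample")
    sample.add_argument("--delta", type=float, default=0.8)
    sample.add_argument("--max_depth", type=int, default=10)
    sample.add_argument("--data_file")
    sample.add_argument("-o", "--output_file", default="output.csv")

    # A second method may reuse keywords; each subcommand has its own namespace.
    optimize = methods.add_parser("optimize")
    optimize.add_argument("--data_file")
    optimize.add_argument("-o", "--output_file", default="output.csv")
    return parser


parser = build_parser()
# The same options in two different orders parse to the same result.
a = parser.parse_args(["sample", "--max_depth", "14", "--delta", "0.9"])
b = parser.parse_args(["sample", "--delta", "0.9", "--max_depth", "14"])
print(a == b)
# True
```

Within one subparser, argparse raises an error if the same option name is registered twice, which mirrors the constraint that keywords can’t overlap inside a single subcommand.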