Prototype: Simplified CLI to Stan

Hi all,

I’ve created a prototype command line interface for Stan that’s meant to be easier to use than CmdStan. If you are a CmdStan user and have some time to check it out, please leave some feedback here on Discourse.

There are more details in this blogpost: https://www.generable.com/blog/2020/08/prototype-simplified-cli-to-stan/

For those who don’t want to click through, here’s some of the information.


How to get the Prototype

The prototype is currently on a branch of CmdStan, feature/cli11.

Example: setting delta to 0.9 and max_depth to 14

Prototype:

> examples/bernoulli/bernoulli sample --delta 0.9 --max_depth 14 --data_file examples/bernoulli/bernoulli.data.R

CmdStan v2.24.0:

> examples/bernoulli/bernoulli sample adapt delta=0.9 algorithm=hmc engine=nuts max_depth=14 data file=examples/bernoulli/bernoulli.data.R

In the prototype, the order of the options doesn’t matter. In CmdStan, arguments are nested, so you need the full path algorithm=hmc engine=nuts max_depth=14 to set the maximum tree depth.
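To make the contrast concrete, here is a minimal Python sketch of a flat "method + flags" interface using the standard library's argparse. It is not the prototype's code (the prototype is C++ and uses CLI11); only the flag names come from the example above, and the defaults are assumed to mirror CmdStan's (delta 0.8, max_depth 10).

```python
import argparse


def build_parser():
    """Flat 'method + flags' interface: a sketch, not the prototype's code."""
    parser = argparse.ArgumentParser(prog="bernoulli")
    methods = parser.add_subparsers(dest="method", required=True)

    sample = methods.add_parser("sample")
    # Hypothetical subset of flags; defaults assumed to mirror CmdStan's.
    # No nesting: each option is addressed directly by its own name.
    sample.add_argument("--delta", type=float, default=0.8)
    sample.add_argument("--max_depth", type=int, default=10)
    sample.add_argument("--data_file")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    a = parser.parse_args(["sample", "--delta", "0.9", "--max_depth", "14",
                           "--data_file", "examples/bernoulli/bernoulli.data.R"])
    # Same options in a different order, using the --flag=value form.
    b = parser.parse_args(["sample", "--data_file=examples/bernoulli/bernoulli.data.R",
                           "--max_depth", "14", "--delta=0.9"])
    print(a == b)  # True: option order does not matter
```

Because every option is looked up by name rather than by position in a tree, order stops mattering; the price is that option names must be unique within a subcommand.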

Feedback

If you have any feedback, good or bad, please post it here or dm me.


cc: @mitzimorris, @rok_cesnovar, @wds15, @Bob_Carpenter, @stevebronder.

I’m trying to tag CmdStan devs and users. Please let them know this exists.

Looks clean.

Would --delta=0.9 work?

Does this interface assume there is no overlap between keywords in the “nested” structure? (That’s a fine assumption in my opinion.)

Yup.

Not sure what this means… could you clarify?

Yes, I meant no overlap between the keywords/arguments.

Do we currently have any keywords/arguments that have the same name but do different things?

Cool. I definitely find CmdStan rather cumbersome to use, so anything that makes it simpler sounds intriguing! I’ll take a look more closely soon.


I think I understand.

Within a particular subcommand (e.g. sample vs optimize) the keywords can’t overlap. But that isn’t really a problem in our arguments.

I had considered doing something more substantial, but wanted to wait before putting in more effort. Something larger should really be done with a design in mind. This one, I just hacked together.


Maybe I should be a little clearer. I didn’t just throw it together with throw-away code. I think it’s clean and thought through well enough for what it is. It just wasn’t something that had an API design up front, software design, then implementation. All three happened organically. There are plenty of different ways to implement the same thing. There are some choices I made that other people would have made differently. That’s why this is a prototype… it’s something I put together (with minimal feedback).

I showed it to @mitzimorris and @RTrangucci a few days ago. But otherwise, it was just a solo effort.

yes, the keyword “file” is ambiguous between data_file and output_file.
otherwise, the keywords are not ambiguous.
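A flat interface only works if no two value-taking keywords share a name within a subcommand, and that can be checked mechanically. The Python sketch below walks a small, hand-written excerpt of CmdStan's nested arguments (not the full set, and simplified: in CmdStan, values such as `hmc` and `nuts` select sub-branches) and reports any keyword that appears on more than one path.

```python
from collections import defaultdict

# Hand-written, partial and simplified excerpt of CmdStan's nested arguments.
# Leaves (None) are keywords that take a value.
ARGUMENT_TREE = {
    "sample": {
        "num_samples": None,
        "num_warmup": None,
        "adapt": {"delta": None, "gamma": None},
        "algorithm": {"hmc": {"engine": {"nuts": {"max_depth": None}},
                              "stepsize": None}},
    },
    "data": {"file": None},
    "output": {"file": None, "refresh": None},
    "random": {"seed": None},
}


def find_collisions(tree):
    """Return {keyword: [full paths]} for leaf keywords that appear more than once."""
    paths = defaultdict(list)

    def walk(node, prefix):
        for name, child in node.items():
            path = prefix + (name,)
            if isinstance(child, dict):
                walk(child, path)
            else:
                paths[name].append(".".join(path))

    walk(tree, ())
    return {name: found for name, found in paths.items() if len(found) > 1}


print(find_collisions(ARGUMENT_TREE))
# {'file': ['data.file', 'output.file']}
```

Prefixing the group name, as the prototype's data_file does, removes the one collision this excerpt contains.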

here’s a nice comparison of CLI11, the library used to handle the command line arguments, with the boost::program_options library: Comparing CLI11 and Boost PO.


I like this a lot!

I would really like to see this in cmdstan, if possible, not as a separate repo. We will very likely have to bump the major version for the incoming language changes in the not-so-distant future anyway, and there is now a substantial amount of deprecated stuff, etc. Maybe we can do this together with that? Another suggestion would be to add this to the cmdstan3 design doc (https://github.com/stan-dev/design-docs/pull/15)?

We could make a translator from the old commands to the new ones fairly easily.
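As a rough idea of how small such a translator could be, here is a hypothetical Python sketch. It assumes every `key=value` pair maps to a `--key value` flag of the same name, as `delta` and `max_depth` do in the example at the top of the thread, except `file`, which is renamed after its enclosing group (`data_file`, `output_file`). Whether the prototype accepts every resulting flag (for example `--algorithm`) is an assumption.

```python
def translate_legacy_args(tokens):
    """Translate CmdStan-style nested arguments into flat prototype-style flags.

    Hypothetical helper. Assumes each key=value pair becomes --key value,
    except `file`, which is renamed after its enclosing group (data_file,
    output_file) because `file` is the one ambiguous keyword.
    """
    method = None  # e.g. "sample"
    group = None   # most recent bare group keyword, e.g. "adapt" or "data"
    flags = []
    for token in tokens:
        if "=" in token:
            key, value = token.split("=", 1)
            if key == "method":          # long form: method=sample
                method = value
                continue
            if key == "file":
                if group is None:
                    raise ValueError("'file' must follow a group such as 'data' or 'output'")
                key = f"{group}_file"
            flags += [f"--{key}", value]
        elif method is None:
            method = token               # first bare word is the method
        else:
            group = token                # later bare words open a group
    if method is None:
        raise ValueError("no method given (e.g. sample, optimize)")
    return [method] + flags


legacy = ("sample adapt delta=0.9 algorithm=hmc engine=nuts max_depth=14 "
          "data file=examples/bernoulli/bernoulli.data.R").split()
print(" ".join(translate_legacy_args(legacy)))
# sample --delta 0.9 --algorithm hmc --engine nuts --max_depth 14 --data_file examples/bernoulli/bernoulli.data.R
```

Bare group keywords such as `adapt` simply disappear, because flat names no longer need them; only `file` needs the group to stay unambiguous.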


Awesome! If there’s enough demand, we could do that.

If we’re going to bump versions, I think there are some substantial changes we could make so the arguments make more sense. Some things I didn’t do, in order to stay compatible with CmdStan:

  • rename num_samples to num_draws or even draws
  • change data_file to something like input_file so the short name can be -i, matching -o for output (like other programs)
  • move fixed_param out of sample. It’s not really sampling; it’s closer to generate_quantities in some sense
  • add much more to diagnostics. @rok_cesnovar, benchmarking?
  • properly handle logging levels
  • different input and output formats

There’s a lot more that could be done.

This actually doesn’t address the points in the cmdstan3 design doc. I thought about it and didn’t go that direction, even though I wanted to. To do it properly from C++, we would need to load libraries at runtime, and there isn’t a cross-platform way to do that. So… a bunch of headaches there. We could do it with something like a Python or Bash script, but that’s a different project too and wouldn’t have fixed the command-line arguments.


@jonah and @mitzimorris have done a bunch of work on argument names. The link to a Google Sheet should be somewhere in the cmdstan3 design doc. I think some of your suggestions are already in there.

I think we have been circling around this long enough that we should just decide on it and do it. The next release cycle (October) is too soon, but the release after that is probably doable. We need to set up a plan, prepare a list of things we need to deprecate, etc.

And yes, a bump should have a bunch more stuff. We do not want to bump major versions just so we can remove deprecated stuff.

Is there a --help?

Answer: of course there was


I could also add a --help-all, and I did at some point, but didn’t find it very useful. Too much going on.

Yes, all the logistics of making a non-breaking change.

I’m sort of leaning towards calling it something else so the logistics are a little easier. Now that it exists, I can already make things better, like changing logging levels.

If we’re looking at a 6-month timeframe, it also makes sense to incubate it as a separate project until it’s settled. That is, CmdStan is mature and not going to change much, while this prototype could change between now and when it gets a proper release, and that’s fine and expected. It’s hard to collaborate on a single branch (GitHub isn’t really designed around that).

What if we work on a fork of CmdStan on someone’s account and slowly build stuff there, with CI and everything? I think that would be fine in terms of collaboration.

I am a bit worried that we will confuse users with another name and another interface. We already have a confusing structure of repos in my opinion and users have a hard time figuring out where to report errors or post requests.

But if you want this to be a new project, I am not going to oppose it, given that you did all the work.

Re: logging levels

I may be wrong, but wouldn’t that mostly require changes to the services (stan repo)?

In terms of whether this should be CmdStan or something new: is there any reason to have both? My initial thought is that nobody really benefits from having both (neither users nor developers), so we should just go with the better one. If this new one is preferred then that means we either call it CmdStan or we call it something else and retire the name CmdStan.
Anyway, that’s my initial inclination, but I could certainly be overlooking reasons to maintain both. Does anyone think we’d be better off maintaining both separately?

If the API and other parts are going to change a lot, then something like pystan-next could be possible, but if most of the code is the same, I’m not sure there is a better way to do this than a new branch.


what all is deprecated? the spreadsheet on argument names is here: CmdStan sampler arguments - Google Sheets

there are a few arguments which are candidates for deprecation - are there other items?

agreed - this should be easy - are you thinking of something like a quick pre-scan of the args?
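A pre-scan could be as simple as checking whether the arguments look like legacy CmdStan syntax (bare words and key=value pairs, no leading dashes) and, if so, routing them through a translator. A hypothetical Python sketch of just the detection step; the heuristic is an assumption, not something either interface defines:

```python
import sys


def is_legacy_syntax(argv):
    """Guess whether argv uses CmdStan's nested key=value style.

    Heuristic: legacy calls contain key=value tokens and no dashed options;
    prototype-style calls use --flag or --flag=value. A bare method name such
    as ["sample"] is ambiguous and is left to the new parser.
    """
    has_dashed = any(arg.startswith("-") for arg in argv)
    has_key_value = any("=" in arg for arg in argv if not arg.startswith("-"))
    return has_key_value and not has_dashed


if __name__ == "__main__":
    style = "legacy" if is_legacy_syntax(sys.argv[1:]) else "new"
    print(f"parsing as {style} syntax")
```

Mixed input such as `sample delta=0.9 --max_depth 14` is classified as new style here, so the new parser would reject the stray `delta=0.9` rather than guess.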

I just revisited that document and the discussion. As Daniel said, we’d need to figure out how to link in the compiled models to make the proposed syntax work. The ambition level of the design doc proposal falls between the workflow management provided by the R and Python interfaces and that of the current CmdStan interface, and maybe that’s not really that useful. Note that under that proposal CmdStan still only runs a single chain: it assumes that the user is comfortable working in a *nix or DOS terminal window and is also OK writing simple shell scripts to run multiple chains and manage the output files accordingly (cf. Chapter 4, MCMC Sampling, CmdStan User’s Guide).
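For reference, the kind of multi-chain script described above might look like the following sketch, written in Python rather than shell. It builds one CmdStan 2.24-style command per chain, giving each chain its own `id` and output file, and can launch them in parallel with `subprocess`. The executable path, data file, seed, and output naming are assumptions, and the placement of the top-level `id=` after the other groups follows the pattern of `data file=` in the command at the top of the thread.

```python
import subprocess


def chain_commands(exe, data_file, num_chains=4, seed=1234):
    """Build one legacy CmdStan command line per chain (hypothetical helper).

    All chains share a seed; a distinct id per chain is meant to give each
    chain its own random stream, and a distinct output file avoids overwrites.
    """
    return [
        [exe, "sample",
         "data", f"file={data_file}",
         "output", f"file=output_{i}.csv",
         "random", f"seed={seed}",
         f"id={i}"]
        for i in range(1, num_chains + 1)
    ]


def run_chains(commands):
    """Start all chains at once, then wait for each; return their exit codes."""
    processes = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in processes]


if __name__ == "__main__":
    cmds = chain_commands("examples/bernoulli/bernoulli",
                          "examples/bernoulli/bernoulli.data.R")
    for cmd in cmds:
        print(" ".join(cmd))
    # run_chains(cmds)  # uncomment once the model has been compiled
```

Keeping command construction separate from execution makes it easy to inspect (or log) exactly what each chain will run.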

I’m all for this new way of handling arguments - a sub-argument “method” followed by any number of arguments in any order.


There are a bunch of functions to be deprecated, plus the <- assignment operator, the # comment character, bin/print, and so on. That’s what I was referring to.
