Interface roadmap - last draft before ratification vote

ahartikainen · September 24, 2019, 5:47pm

fit.iteration(i) with optional keyword arguments (order='random' / 'default', n=-1 / 100 / 20, etc.) would be much more flexible in Python than fit[i]. We would then also need some idea for slicing, fit[i-100:i], that is sensible and follows logic similar to fit[i].

bgoodri · September 24, 2019, 5:55pm

What would fit[i-100:i] do?

ahartikainen · September 24, 2019, 5:57pm

Not sure. If we go with fit[], should slicing raise an exception?

ariddell · September 24, 2019, 6:01pm

In cases like this, where a feature is desired by a small number of (important) persons, I think we should at least entertain the possibility of maintaining a fork.

bgoodri · September 24, 2019, 6:02pm

I would say that attempting to slice fit should raise an exception until we can think of a legitimate use for it.
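A minimal Python sketch of the behavior suggested here: integer indexing delegates to an iteration method, and slicing is rejected outright. The `Fit` class, its storage of draws as a list of dicts, and the error message are all assumptions made for illustration, not an existing PyStan or RStan API.

```python
class Fit:
    """Hypothetical stand-in for a fit object: one dict of parameter values per draw."""

    def __init__(self, draws):
        self._draws = list(draws)

    def iteration(self, i):
        """Return the draw at integer position i."""
        return self._draws[i]

    def __getitem__(self, key):
        # Reject slices explicitly until slicing has an agreed meaning.
        if isinstance(key, slice):
            raise TypeError("slicing a fit object is not supported; use an integer index")
        return self.iteration(key)


fit = Fit([{"mu": 0.1}, {"mu": 0.3}, {"mu": 0.2}])
print(fit[1])  # {'mu': 0.3}

try:
    fit[0:2]
except TypeError as err:
    print(err)  # slicing a fit object is not supported; use an integer index
```

Raising on slices keeps the door open: a meaning can be added later without breaking code that relied on some accidental behavior.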

bgoodri · September 24, 2019, 6:03pm

I wouldn’t go that far in this case. Implementing fit[i] or fit$iteration(i) is not that big a deal.

ariddell · September 24, 2019, 6:05pm

I like fit$iteration(i). Perhaps there’s some useful prior art in the way pandas handles things? There’s df.iloc[i] and df.loc[param_name], which do different things while avoiding overloading what one can do with simple slices.
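For readers unfamiliar with the pandas convention mentioned here, a small sketch: `.iloc` selects by integer position and `.loc` selects by label. The toy `summary` table and its values are invented for illustration.

```python
import pandas as pd

# Toy table indexed by parameter name (values are made up).
summary = pd.DataFrame(
    {"mean": [0.2, 1.1], "sd": [0.05, 0.1]},
    index=["mu", "sigma"],
)

first_row = summary.iloc[0]        # positional: whatever row comes first
sigma_row = summary.loc["sigma"]   # label-based: the row labelled "sigma"

print(first_row["mean"])  # 0.2
print(sigma_row["sd"])    # 0.1
```

Because the two accessors are separate attributes, plain `[]` indexing does not have to guess whether a key is a position or a name.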

Bob_Carpenter · September 25, 2019, 2:27pm

Only that linear algebra distinguishes them. Technically, a column vector is an M x 1 matrix, whereas a row vector is a 1 x N matrix. Shape matters for linear algebra operations.

The way I initially set up the Stan types followed MATLAB, not R. That let us define row_vector * col_vector to return a scalar and col_vector * row_vector to return a matrix. row_vector * matrix and matrix * col_vector are legal, but not the other way around.
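The shape rules described here can be illustrated with NumPy 2-D arrays standing in for Stan's types. This is only an analogy: NumPy returns a 1 x 1 array for row times column, whereas Stan returns a scalar.

```python
import numpy as np

row = np.array([[1.0, 2.0, 3.0]])        # 1 x 3, like a row vector
col = np.array([[4.0], [5.0], [6.0]])    # 3 x 1, like a column vector
m = np.eye(3)                            # 3 x 3 matrix

inner = row @ col    # 1 x 1 result (a scalar in Stan)
outer = col @ row    # 3 x 3 result
left = row @ m       # legal: (1 x 3)(3 x 3) -> 1 x 3
right = m @ col      # legal: (3 x 3)(3 x 1) -> 3 x 1

try:
    m @ row          # illegal: (3 x 3)(1 x 3) has mismatched inner dimensions
except ValueError as err:
    print("shape error:", err)
```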

I don’t think there are row vector or column vector types in R, so to make the distinction, they’d have to be matrices. But then they won’t work with our input, which allows an array for any of vector, row vector, or 1D array.

Bob_Carpenter · September 25, 2019, 2:36pm

+1 to that. I don’t feel that the interface components of this document go very far in satisfying its stated goals:

First, deciding on names for arguments doesn’t address any of this. The big thing there would be trying to figure out how to stage it all. I’ve asked before what the plan for dealing with existing interfaces is: will the existing arguments go away or exist side by side with the new ones?

For (1), it’s going to look very narrow as it was developed as just what @seantalts took to already be a “rough consensus”. That’s why I think the level of granularity is so uneven throughout the document (ranging from specific argument names, through dictates like “CmdStan is for users”, to huge vaguely-specified projects like building a Stan server). Lest you think that last comment was a criticism, I’d rather see the roadmap concentrate on vaguely-specified large projects than on micro-level details like argument names.

For (2), none of these interface issues seem like big-picture items that would attract funding. Maybe building a server would, but the problem is that it won’t add new user-facing functionality; it will just change the way Stan’s installed.

For (3), the only help I see being solicited on interfaces is for someone to take over the server project.

Continuing with (3), suppose someone comes along after ratification and submits a pull request that changes the names and defaults in the stan() and sampling() functions in R. Would that be accepted and merged or is there a timeline here? What if it left all the existing names in place with lots of doc on which subsets can be used together? I’m having a hard time seeing how this roadmap can be turned into actionable pull requests.
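One possible shape for the "existing names left in place" option raised here, sketched in Python: old names are accepted alongside new ones, translated, and flagged with a deprecation warning. The function body and the names `old_name` and `new_name` are placeholders, not actual RStan or PyStan arguments, and nothing in the roadmap commits to this approach.

```python
import warnings

# Placeholder mapping from an old argument name to its replacement.
_RENAMED = {"old_name": "new_name"}


def sampling(**kwargs):
    """Accept old and new argument names; return the arguments under new names."""
    for old, new in _RENAMED.items():
        if old in kwargs:
            if new in kwargs:
                raise TypeError(f"pass either {old!r} or {new!r}, not both")
            warnings.warn(
                f"{old!r} is deprecated; use {new!r}",
                DeprecationWarning,
                stacklevel=2,
            )
            kwargs[new] = kwargs.pop(old)
    return kwargs


print(sampling(new_name=5))  # {'new_name': 5}
```

A scheme like this makes the staging question concrete: the warning can later become an error, and the mapping can be dropped on whatever timeline is agreed.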

Going on, it’s hard to see how the functions and methods recommendations could be implemented as the granularity’s very high and from the rest of the roadmap, it’s clear people are very picky about naming conventions and functionality.

As to specifics, I objected at the interface meeting to having a model object to which we add data. It feels like a type error to me as a model defines a posterior density as a function of data, whereas once we add data, it’s no longer a model, but an instantiated posterior function. What do the functions like run_hmc and maximize and generate_quantities return? Is it a stanfit object in all three cases?

This goes along with my other objection to this API: I don’t like packing all these disparate pieces of information into mutable objects like this. Once we’ve run model$add_data, can we run it again? What does run_hmc mean? Is that all variants we currently have of NUTS and static HMC? What happens if we call run_hmc twice in succession on the same model object?
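One way to read the alternative being argued for, as a Python sketch: the model stays immutable, and attaching data produces a separate object rather than modifying the model. The class names `Model` and `Posterior` and the method `condition` are hypothetical and do not come from the roadmap.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class Model:
    """Hypothetical: a Stan program with no data attached."""
    program_code: str

    def condition(self, data: Mapping[str, object]) -> "Posterior":
        # Return a new object; the model itself is never modified.
        return Posterior(model=self, data=MappingProxyType(dict(data)))


@dataclass(frozen=True)
class Posterior:
    """Hypothetical: a model instantiated with one particular data set."""
    model: Model
    data: Mapping[str, object]


model = Model(program_code="data { int N; } parameters { real mu; }")
posterior_a = model.condition({"N": 3})
posterior_b = model.condition({"N": 5})  # same model, second data set, no conflict
```

Under this design the question "can we add data again?" has an obvious answer: each call yields an independent posterior object, and the model is unchanged.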

Would you mind taking the non-interface parts out of it then? That’s the headings: model class augmentations, faster compiler times, template parameter defaults, Stan language features, Stan in Stan, other compiler deliverables, Stan parallelism, miscellaneous ideas, new compiler integration (that may be related to interfaces), future roadmap topics, and order of operations.

jonah · September 25, 2019, 7:00pm

I think it’s becoming increasingly clear that we do need to make some basic prototypes available so everyone can actually see and try what is being proposed for the new interface designs.

How close is what you have to usable? Judging from this thread and others I think it’s not just this one feature that people are confused about and disagreeing about. It would be good to have a demo.

syclik · September 28, 2019, 11:06am

@Bob_Carpenter: thank you. What you said articulates what I’m thinking.

@seantalts, I took time thinking about the roadmap and seeing if I could draft it so I’d be happy. Unfortunately, I couldn’t do that well. I do have a concrete suggestion. If you’re open to starting a different doc:

1. Set a clear goal first. Since this is the first time this document is being created, is the goal to establish the doc? Status of the past year?
2. Ask the domains to each produce something to go into the doc. Maybe just ask for the big picture items: what is being done to improve the domain for users in the next 6 months? Doing it this way gives the people involved in each domain a chance to respond and to work out their contribution together.

I’m glad you got people together to meet. I’m sure there was a lot of progress made and I wish I had been there. But I don’t think that the people at that meeting spoke for the whole technical development of Stan to the point where new items that haven’t been discussed in a domain could be decided. So switching that paradigm and getting the domains to work that out themselves seems better. And it can be done online.

seantalts · October 2, 2019, 11:19pm

Yeah, it has a varying level of granularity - I think that’s okay and a necessary outcome of the style of consensus-driven design that we’re insisting on in this project. I can sort it to be in terms of big to small picture if that would help?

Yeah, let’s talk about this. @bgoodri and @ariddell, do you have preferences on when to do the switchover?

I don’t think I heard much from @bgoodri or @ariddell about this at the meeting. Do you have thoughts on changing this to add another class that represents a model with data? I am partial to that version as well.

I’d rather change the title: Roadmap Part 1! I’ll update to reflect that - I don’t want to lose track of the valuable information we have garnered here from the tech leads and the community.

I’d be happy to start another doc when this is done and attempt Roadmap Part 2 in something like that fashion. Would you mind starting another thread to talk about that? I’d like to get some other opinions on how that would best be accomplished in there but I don’t want to hijack this thread.

ariddell · October 4, 2019, 1:10pm

There’s a consensus that the names and defaults should be consistent. So if someone wanted to change something in RStan it would need to be changed in CmdStan and everything else as well at the same time.

PyStan will switch over with PyStan version 3. There’s an alpha version of PyStan 3 with the new names already.

ariddell · October 5, 2019, 5:38pm

I think tackling the roadmap piece by piece makes sense. There are some items on which there is total consensus. There are other items where I think “rough consensus” might be a stretch.

Also, I agree with @Bob_Carpenter about the need for the roadmap to not be terribly granular. It shouldn’t describe implementation details except in the most general terms.

seantalts · October 5, 2019, 5:49pm

Could one of you highlight specifics there for what you think is best pulled out separately? I’m going to try to wrap this up next week and send it out to the full electorate for a majority vote.

I’m also happy to move the specific names to an appendix if that satisfies the granularity desires from you and Bob.

ariddell · October 5, 2019, 6:30pm

Sure, how about a vote where we are asked to approve or disapprove or abstain on each of the following items (with appropriate explanatory text)?

Cross-interface Standardization
Unifying names
Removing some parameters from all interfaces
Functions and methods
Returning results
CmdStan
CmdStanPy and CmdStanR
ServerStan (Needs a tech lead! Could be you!)
Stan C++ Services
Model Class Augmentations: Template parameter defaults
Model Class Augmentations: New methods for ease of use
Model Class Augmentations: Parameter metadata
Model Class Augmentations: Stan API
Stan language features
Stan in Stan

Edit: add “or abstain”

seantalts · October 6, 2019, 12:10am

I think that’s kind of a lot, don’t you? I want to mostly get alignment on a single document but if there is a specific section or two you think is likely to be controversial (but for some reason hasn’t come up for discussion on discourse in the past two months) then we could put that separately. I just don’t see the point of listing it out piecewise and expecting everyone to be an expert in everything. Ideally this discussion is the part where we all agree on the document we submit to be ratified; if there’s something you or others don’t think is good for the project and wouldn’t pass a majority vote we can edit it.

seantalts · October 6, 2019, 12:15am

For further clarification, part of the reason there is to provide significant incentive to help edit it to a point where you and others are happy with the resulting compromise and then force that compromise for the sake of progress, rather than backpedaling into the paradigm of fiefdoms, no compromise, and no progress.

rok_cesnovar · October 6, 2019, 10:04am

If this is approved, I would like to throw my hat in the ring for ServerStan tech lead. If you feel I have enough of what it takes to do that, of course.

ariddell · October 6, 2019, 12:36pm

I don’t think it would hurt to vote on things in this granularity. It’s not much of an additional burden. And it would have the great benefit of including people who weren’t at the meeting. We really would learn where there is overwhelming consensus vs. rough vs. something else. That seems worth it to me.

I did take the path of least resistance here and re-use the heading structure you provided. Some slightly coarser clustering would be OK too (for me).
