Voting on inclusion of posteriordb

For more significant changes, like including new projects in Stan, I think the RFC approach would be really valuable. In my opinion it’s been exceedingly successful for the more recent math projects, not only in clarifying design but also in making the goals and scope explicit, which I think is the more significant problem here.

See, for example, my last comment: Promoting posteriordb into an official Stan project - #38 by betanalpha.

In my opinion the challenge here is that some of the key technical points cannot be divorced from whether posteriordb is an appropriate Stan package.

The goals directly related to Stan, such as performance testing and validity testing, require a very rigid database schema. This is based purely on the mathematics of Bayesian computation and the core functionality of Stan: users specify models and Stan tries to estimate expectation values as accurately as possible. Exactly how benchmarks are integrated into performance and validity testing, especially in the core library and across interfaces, is very much an open question that can be discussed independently of the vote. The same goes for the exact details of how fits are validated and how validation information is saved. But again, the relevant user-facing schema (model, function, validated expectation value) is fully specified by the nature of Bayesian computation. I would hope that this isn’t controversial.
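To make that triple concrete, here is a minimal sketch of what one entry could look like. It is only an illustration, not the posteriordb schema: the names `ReferenceExpectation` and `within_tolerance` are hypothetical, and the `standard_error` field is my own assumed stand-in for the validation information, whose actual form is one of the open questions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceExpectation:
    """One hypothetical benchmark entry: model, function, validated expectation value."""
    model: str             # identifier of the model (e.g. a Stan program plus data)
    function: str          # the function whose posterior expectation is estimated
    value: float           # the validated expectation value
    standard_error: float  # ASSUMPTION: uncertainty attached to the validated value


def within_tolerance(entry: ReferenceExpectation, estimate: float, z: float = 3.0) -> bool:
    """Return True if an estimate lies within z standard errors of the validated value."""
    return abs(estimate - entry.value) <= z * entry.standard_error


# Illustrative, made-up numbers; not taken from any real posterior.
entry = ReferenceExpectation(
    model="example_model",
    function="mu",
    value=4.0,
    standard_error=0.1,
)
print(within_tolerance(entry, 4.2))  # estimate 2 standard errors away
print(within_tolerance(entry, 5.0))  # estimate 10 standard errors away
```

A validity test would then amount to running Stan on the model, estimating the same function, and checking the estimate against the entry. How the tolerance should be set is a separate question that this sketch does not answer.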

My problem with the proposal, and hence the appropriateness of a vote, is not that the proposed scope of posteriordb exceeded this scope but rather that the proposed scope was not precisely defined, referring to unspecified goals of other projects and other speculative applications. If the full scope of the project were precisely defined then we could have a well-defined circumstance for a vote. Instead we’re voting on whether or not posteriordb seems like a relevant project, which to me is far too speculative. In fact, in his last post @avehtari noted changes to the proposed scope that were motivated not by internal discussion but rather by external discussion!

Perhaps most importantly, what exactly is the utility of voting before the scope has been precisely defined? All a premature vote does is give the project a commitment based on an assumption that the scope will eventually converge to something appropriate to the project. Because being a part of Stan will not bestow on the developers any additional resources to which they don’t already have access, there’s no cost to waiting for posteriordb to mature to the point where its scope is precise and can be evaluated. If anything, including the project prematurely only introduces the risk that the posteriordb developers will converge to a scope that conflicts with the goals of Stan, and then we have to deal with resolving those conflicts.