I’m planning on finishing work on “PyStan 3” in late May and early June. I’m writing this in order to let people know what I’m planning on doing (the plan, linked below, hasn’t changed) and to confirm there’s still agreement about the PyStan/RStan 3 API sketched out in User Interface Guidelines for Developers.
The following issue tracks the planned changes: PyStan 3 Must-Haves. Feel free to add suggestions there or in replies to this post.
Most of the changes are of interest to developers. The most important change, in my opinion, is the splitting of pystan into “frontend” (pystan) and “backend” (httpstan) packages. (The design is borrowed from Jupyter Notebook.) This will allow developers interested in improving the user experience to get their work done without worrying about what is happening in the world of C++. The new frontend package will be a pure-Python package (i.e., no C++/Cython code). This change will permit rapid development of new features (e.g., better experience in JupyterLab, further pandas integration, diagnostic tools, plotting) and make maintenance much easier.
For users, the biggest changes are (1) the new API and (2) a Python 3.6 requirement. Everything is spelled out in the PyStan 3 Must-Haves issue.
After thinking a bit more about this I think we should seriously consider this kind of split of rstan into a front end that handles the user interface and a back end that does everything C++ related and is called from the front end. I’m not sure we’d do http though (I haven’t thought about it much yet). My hunch is that it’s more realistic that we could do a front/back end split by moving all Rcpp related stuff into StanHeaders or a new package.
As with pystan, this would make it much easier to get outside contributions because the front end would be purely R and more limber. Currently, contributing to RStan is not very contributor-friendly.
@bgoodri (and others) this sort of split would also make sense given that we could each primarily oversee one of them and our skills pretty naturally divide along these lines anyway.
A few other benefits:
the front end would have no CRAN issues (although still just as many reverse dependencies).
the front end could be fully tested on Travis with no issue, whereas currently none of it is running on Travis because of timeouts from installation/compilation and other issues.
If you want to test the idea out you could use httpstan as the backend. The pystan frontend communicates with the httpstan backend over HTTP on localhost. So if R can make http requests, it can use the backend in essentially the same way that the pystan frontend does.
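To make the "frontend talks to backend over HTTP on localhost" idea concrete, here is a minimal sketch using only the Python standard library. It does not use httpstan or its real API: `ToyBackend`, the `/toy/mean` route, and the JSON payload shape are all made up for illustration. The only point is that anything able to send an HTTP request (Python, R, ...) can drive the same backend process.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ToyBackend(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a backend process (NOT httpstan's real API)."""

    def do_POST(self):
        # Read the JSON request body sent by the frontend.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        if self.path == "/toy/mean":  # made-up route, for illustration only
            values = payload["values"]
            body = json.dumps({"mean": sum(values) / len(values)}).encode()
            self.send_response(200)
        else:
            body = json.dumps({"error": "unknown route"}).encode()
            self.send_response(404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_backend():
    """Start the toy backend on a free localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), ToyBackend)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_backend(port, route, payload):
    """The 'frontend' side: POST JSON to the local backend, return parsed JSON."""
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}{route}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Ignore any system proxy settings; the backend is on localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    server = start_backend()
    port = server.server_address[1]
    print(call_backend(port, "/toy/mean", {"values": [1.0, 2.0, 3.0]}))
    server.shutdown()
    server.server_close()
```

An R frontend would replace only `call_backend` with its own HTTP client; the backend process would stay the same.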
Hmm… honestly, I don’t quite see the benefit of a frontend/backend split on its own. However, if the split can be made such that we can use the same backend for all frontends, then this would be an obvious win. So can httpstan be the backend for a Python/R/… frontend? If yes, then this is a big thing, of course.
You seem to plan to go all-in with threading in PyStan3. Just note:
we just got it to work and have made the non-threading stuff the default
there appears to be a ~10% performance hit (possibly even up to 15%-20%). It’s still too early to say exactly how much performance is lost, but it won’t be zero (although we may gain some performance by not having to load all data multiple times and so making better use of CPU caches).
the requirements in terms of minimum compiler / OS versions are definitely higher. Ubuntu 14.04 LTS, for example, would require odd fixes to make this work (and we have dropped these fixes).
On the other hand: if you go all in on threading with PyStan3, we will quickly learn if there are any more kinks out there.
I think I mentioned like 5 benefits ;) Primarily I think development and maintenance would be easier. Right now RStan releases are infrequent. It would be easier to do more frequent minor releases with improvements to the user interface if they could be submitted to CRAN on their own. I think that’s a pretty big deal.
One of the worst mistakes we made was splitting StanHeaders off into a separate package. At the time, we deluded ourselves into thinking that there would be a bunch of R packages that would just use the Stan Math library and not need rstan, but there has only been one such package in three years, which is not nearly worth the difficulty of making synchronized releases of rstan and StanHeaders and dealing with the fact that the binaries get built on different schedules. Nor has it been worth the cost of making a distinction between “Stan” that is without difference to R users.
Moreover, at the time, StanHeaders was at least header only but now it isn’t (due to CVODES), so we have to build the shared object and rstan has to put StanHeaders into Depends in order to link models (even if they don’t use CVODES) to the dynamic shared library in R’s memory. So, we are left with a StanHeaders package that is basically useless (or at least unused, apart from OpenMx) without rstan, and it was 99% self-inflicted.
Travis doesn’t fail for rstan because of timeouts; it fails because of inconsistencies between rstan and StanHeaders. The rstanarm package is a different story (it does fail on Travis due to timeouts) but that would not be affected at all by splitting up rstan further.
It is true that it is difficult for developers to contribute just to the R part of rstan because they need a C++ toolchain to compile rstan. I don’t think that is much of an expected loss. I can’t remember anyone saying they wanted to contribute something but couldn’t due to the C++. In order to contribute something useful to rstan, you basically have to have some experience using Stan, in which case you have the C++ toolchain, and if you don’t touch the C++ files, then you don’t have to recompile it very much.
A split such that packages such as rstanarm do not need anything from rstan might make sense, but I don’t see how that is going to work. As always with the question of splitting, it comes down to: What would utilize the front-end but not the back-end and what would utilize the back-end but not the front-end?
Good points. But if we can possibly do this by moving some of rstan into StanHeaders rather than having a third package in addition to the two, then I don’t see the downside. StanHeaders already exists after all.
For RStan 3 or whatever it ends up being, couldn’t we even have a lot more in StanHeaders? Couldn’t almost anything Rcpp related be in StanHeaders and then rstan can just be the way humans interface with StanHeaders? Or am I overlooking something that prevents this?
Sounds to me as if the analysis API which was brought up a while ago is maybe what we want? The httpstan thing is great for docker images, but to streamline the interfaces the analysis API sounds like a win.
… sorry for having hijacked PyStan3 to some extent…
Yeah, I think the ultimate goal here should be having a single server that all of the interfaces talk to that doesn’t need to be compiled with the same compiler R or Python were compiled with(!!!). I haven’t kept up with why HTTP would be a good choice of application protocol and hope we aren’t talking about trying to get people to install Docker as part of the Stan install… :P
Ideally, with a binary installer. It would limit some functionality in RStan and PyStan, but again, I don’t think much of that is widely used other than perhaps exposing Stan functions in R (which is certainly useful for debugging).
It probably couldn’t use Rcpp, but in my opinion Stan functions could still be compiled (if there is some method to wrap them), and the backend could provide an easy interface for interacting with them (data in, results out).