As part of refactoring PyStan in preparation for the “Stan 3” interface API, I’m splitting PyStan up into a “frontend” and a “backend”. The backend is just an extremely thin interface which allows calling stan::services functions with a model. The backend speaks HTTP and it is a separate package so I’m calling it httpstan.
What’s the overhead for sending things over HTTP versus calling in memory?
I looked at the httpstan doc (thanks!) and I wouldn’t encourage people to code Stan programs as strings and then remove all the newlines—this will destroy all of our line-based reporting of errors.
I should’ve added that I think having an HTTP Stan will be great. I’m just not convinced it’s the right thing to build under PyStan rather than on top of PyStan.
Oh, don’t worry about that. The newlines can be encoded; it’s just a little tricky to get them into JSON on the command line (with bash).
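For example, a quick sketch in Python (not taken from httpstan itself) showing that JSON escapes newlines rather than losing them, so line-based error reporting survives the round trip:

```python
import json

# A Stan program with its newlines intact; json.dumps escapes each one as \n
program = 'parameters {\n  real x;\n}\nmodel {\n  x ~ normal(0, 1);\n}\n'

# "program_code" is an illustrative field name, not necessarily httpstan's
encoded = json.dumps({"program_code": program})
decoded = json.loads(encoded)["program_code"]

assert decoded == program  # every newline survives the round trip
print(encoded)  # one physical line, safe to send over HTTP
```

The escaped form is a single physical line, which is what makes it awkward to construct by hand in bash but harmless for a client library.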
You know, I worried about the memory / speed issues and thought we would have to use protocol buffers, but now I think HTTP+JSON will be fast enough. One thing that changed my mind was seeing that xi, https://github.com/google/xi-editor, a new high-performance editor with a frontend/backend split, uses JSON to communicate between its processes. So encoding things to and from JSON can’t be too bad.
Text editors may not be the best comparison—Stan will dump out draws way faster than a person will type into an editor.
The performance hit is going to depend on how long each iteration takes to run and how many parameters there are per iteration. Simply converting numbers to ASCII strings and vice versa is pretty expensive (dozens of arithmetic operations per number). Writing to file has huge latency, but throughput can probably keep up with the per-iteration ASCII conversion, or at least that’s what we found in CmdStan.
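A rough sketch of that conversion cost in Python (the timing is illustrative and platform-dependent, not a CmdStan measurement):

```python
import time

# How expensive is converting doubles to text and back?
draws = [i * 0.3141592653589793 for i in range(100_000)]

t0 = time.perf_counter()
as_text = [repr(x) for x in draws]          # double -> ASCII
round_trip = [float(s) for s in as_text]    # ASCII -> double
t1 = time.perf_counter()

# In Python 3, repr() of a float round-trips exactly
assert round_trip == draws
print(f"{(t1 - t0) / len(draws) * 1e9:.0f} ns per value, round trip")
```

Even at tens of nanoseconds per value this is cheap relative to a slow iteration, but it adds up when a model has many parameters and iterations are fast.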
Where there’s a huge hit is reading data—CmdStan’s input is very slow compared to reading binaries when you get near 100MB or so of data. It can take CmdStan longer to read a file of data than to fit it with optimization.
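A minimal standard-library sketch of why text input is so much slower than binary: text has to be parsed number by number, while binary data can be copied straight into memory (sizes and timings here are illustrative, not CmdStan measurements):

```python
import array
import time

values = array.array('d', (i * 0.001 for i in range(500_000)))

text_blob = '\n'.join(repr(v) for v in values).encode()  # CSV-like text
binary_blob = values.tobytes()                           # raw doubles

t0 = time.perf_counter()
parsed = array.array('d', (float(s) for s in text_blob.split(b'\n')))
t_text = time.perf_counter() - t0

t0 = time.perf_counter()
loaded = array.array('d')
loaded.frombytes(binary_blob)  # essentially a memcpy
t_bin = time.perf_counter() - t0

assert parsed == values and loaded == values
print(f"text parse: {t_text:.3f}s, binary load: {t_bin:.3f}s")
```

The gap typically spans orders of magnitude, which matches the observation that reading a large data file can take longer than fitting it with optimization.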
Is there a way in PyStan to just stream data out without saving it in memory?
backend speaks HTTP and it is a separate package so I’m calling it httpstan
does this mean that in order to run pystan, there’s a local http server running? this can be an extremely problematic install, or even something that’s not allowed.
while having httpstan is a good thing, I don’t think it should be the backend for pystan.
Hey,
I think this is a great idea because it will let us create installers that can package their own C++ toolchain for users who don’t want to mess around with getting a system-wide one installed appropriately (and would let us not worry about supporting a variety of different C++ compilers eventually).
I have a couple of questions about the design:
Are you planning to send the data over the wire as well?
Is just straight-up adopting jupyter’s ZeroMQ design that much harder? It seems like it would be a lot more performant.
Will the user need to know about this backend server or will it be transparent to them? Or I suppose the first step is a stand-alone httpstan that someone could then build libraries like PyStan 3.0 around that import it and automatically start httpstan under the hood?
Does the API support streaming data and samples? This can be a bit tricky over HTTP but I think is important for future algorithms that don’t require everything loaded into memory (like SGD). Here’s a link with some more info about it in HTTP (ZeroMQ might actually make this easier): https://gist.github.com/CMCDragonkai/6bfade6431e9ffb7fe88
Samples are sent over HTTP too – is that what you mean?
I thought about ZeroMQ. I do think things have changed since jupyter made its design decisions. Streaming over HTTP and handling streaming HTTP in Python are far easier today thanks to new features in Python 3. HTTP/2, which httpstan could eventually use, is likely competitive with ZeroMQ in all the use cases we are focused on.
Exactly. The user does not need to know anything about the backend server. PyStan will start httpstan under the hood.
Yup. It streams samples right now over HTTP/1.1 using chunked transfer encoding. Things might be even better with HTTP/2.
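As an illustration (not httpstan’s actual wire format), here is one way a client could reassemble newline-delimited JSON draws from the fragments of a chunked response body; the `{"theta": ...}` payload shape is hypothetical:

```python
import json

def iter_draws(chunks):
    """Yield one decoded draw per newline-delimited JSON line.

    Chunk boundaries need not align with message boundaries, so
    partial lines are buffered until their terminating newline arrives.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line:
                yield json.loads(line)

# Simulated chunked body: the second message is split across two chunks
chunks = [b'{"theta": 0.1}\n{"the', b'ta": 0.2}\n{"theta": 0.3}\n']
draws = list(iter_draws(chunks))
assert [d["theta"] for d in draws] == [0.1, 0.2, 0.3]
```

Because each draw is consumed as it arrives, the client never needs the full set of samples in memory, which is the property that matters for streaming algorithms.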