"ServerStan" implementation language poll

We’ve talked a few times about implementing a “ServerStan” backend similar to @ariddell’s HTTPStan, likely attempting to use the same API and format if performance concerns allow. I’m curious - if you would contribute, what language would you prefer it be written in? Definitely open to other suggestions not on here, provided they compile to statically linked binaries; please comment and I can add them. [edit: @ariddell also points out that we need to be able to build Stan models as shared libraries at runtime, and it would be nice if we didn’t have to outsource that to make.]

  • C++
  • OCaml
  • Go
  • Rust
  • Other


For more context, here’s an old quote about it:


My $0.02 reasoning here: we want to keep the overall language complexity low, and we already have C++ and OCaml so those seem fine. I think Go is an extremely simple language that doesn’t add a lot of cognitive burden - I felt very comfortable in it after about a week (compared with C++ and OCaml, which have corners I still avoid).

There’s really important context missing here. The biggest challenge facing ServerStan is building the model-specific Stan binary or shared library. The reason httpstan uses Python is that Python’s distutils/setuptools library handles this without any difficulty. (By contrast, CmdStanPy / CmdStan uses GNU make.)

I don’t know if Rust, Go, or OCaml can improve on distutils/setuptools here. I’d like to make sure of this before voicing any opinion.
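For reference, the distutils/setuptools path being described is pretty compact. A hedged sketch (the module name and source file are made up, and this is not httpstan’s actual code) of declaring a model extension and asking the `build_ext` command where the shared library for the running interpreter would land:

```python
# Hedged sketch of the distutils/setuptools mechanism: declare the
# stanc-generated C++ as an Extension and let build_ext drive the
# platform's C++ compiler.  Names here are illustrative.
from setuptools import Extension
from setuptools.dist import Distribution

ext = Extension(
    "stan_model_abc123",                # hypothetical module name
    sources=["stan_model_abc123.cpp"],  # hypothetical stanc output
    language="c++",
)
dist = Distribution({"ext_modules": [ext]})
cmd = dist.get_command_obj("build_ext")
cmd.ensure_finalized()
# cmd.run() would invoke the compiler; get_ext_fullpath already knows
# the platform-specific path the shared library would be written to.
print(cmd.get_ext_fullpath("stan_model_abc123"))
```

The point is that the compiler flags, include paths, and ABI-tagged file names all come from the interpreter’s own configuration, which is the part a Rust/Go/OCaml server would have to reproduce.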


That’s a great point. I think it would be maybe okay but suboptimal to just outsource this to make. Do you think we just shouldn’t consider make an option at all?

On a related note, how does Python link against the newly built library while the interpreter is running? That part always seemed magical to me, but I assume we could figure that out in any of these languages (does that seem right to you?).
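For what it’s worth, the non-magical answer is dlopen: the interpreter can load any shared library while it’s running, and ctypes exposes that directly (compiled extension modules go through the same dynamic loader). A minimal sketch using the system math library as a stand-in for a freshly built model library:

```python
# ctypes.CDLL is a thin wrapper around dlopen(): the running
# interpreter loads a shared library and resolves symbols from it.
# A freshly compiled model library would be loaded the same way.
import ctypes
import ctypes.util

# Find the C math library in a platform-tolerant way.
for name in (ctypes.util.find_library("m"), "libm.so.6", "libm.dylib"):
    if name is None:
        continue
    try:
        libm = ctypes.CDLL(name)
        break
    except OSError:
        continue

# Declare the C signature so ctypes marshals arguments correctly.
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]
print(libm.sqrt(9.0))  # 3.0
```

Every language on the poll can call dlopen one way or another, so this part should indeed be portable across the candidates.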

re: make. I can’t agree to work on any new project that uses GNU make.
I’ve lost too many weeks of my life to it as it is. Debugging problems
in Makefiles is virtually impossible. I completely understand why very
few large projects use it today.

As for the Python C++ interface, you write C++ code with some hooks for
Python. There are C++ libraries which make this very easy, such as
boost.python and pybind11. pybind11 seems like what most people are
using for new projects.


I agree that probably new projects shouldn’t use make, though in this case I wasn’t talking about us creating any makefiles, but rather just using the CmdStan ones the way CmdStanPy does now.

Here are what look like the C++ or C integration tools for those languages:

  • OCaml: just C? ctypes; a super-alpha C++ binding generator
  • Rust: bindgen?
  • Go: just C via cgo? You just include a few special comments?

I’m finding few descriptions of how to actually cause C++ to be built at runtime; these languages seem to outsource that to their respective build tools.

For those who checked “Other” what language did you have in mind?

I just learned two things:

  1. You cannot edit a poll after it has been created in any way(??)
  2. The automatically chosen poll close time is +24h 😅

Suggestion to consider: write it in Julia using JuliaStan.jl and HTTP.jl. If you absolutely must have a prebuilt thing, use PackageCompiler.jl; for efficient binary data transfer, consider BSON.

How is JuliaStan binding against the Stan C++?

Julia might not be a bad choice.

whoops it’s just called Stan.jl not JuliaStan.jl

It looks like it runs cmdstan so it relies on the whole cmdstan makefile build process… So perhaps that doesn’t appeal.

I think it’s more that other things matter:

  • REST/some interface + robustness
  • C++ compilation + binding
  • precompilation --> binary (+default compiler?)

It seems because Stan relies on C++ you’re never going to get away from the need for a C++ compiler and some kind of build-system for the resulting stuff. I don’t think choice of language to build the server in makes much difference at all here.

Julia has the infrastructure to provide the REST / HTTP interface, and has infrastructure to distribute computation over clusters of nodes, which I assume would be valuable. Also it has existing tools to call CmdStan.

Another option that has some appeal is erlang, which is really designed to sit there and run forever distributing computation across multiple places. However it has no interface to Stan that I know of. Still, building an interface to CmdStan should be pretty easy.

I agree with this sentiment. The language the HTTP/gRPC/whatever server is written in
is almost inconsequential. It’s the process for compiling code generated
by stanc at runtime that matters.

Python is reasonably good at compiling at runtime, as httpstan demonstrates.

Proof of concepts strike me as more valuable than theoretical
deliberations about the language. There really is almost no code to
write. You’re just compiling stuff and then calling stan::services
functions through whatever foreign function interface your language
supports. You send the results back over TCP/UDP.

Edited: Clarifying edits, no need for things to be done via HTTP
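To make the “almost no code” point concrete, here’s a sketch of the TCP side, with a made-up `fake_sample` standing in for the FFI call into stan::services (newline-delimited JSON, one request per connection — the protocol here is invented for illustration):

```python
# Sketch of the "almost no code" claim: a TCP server whose handler
# stands in for a call into stan::services through an FFI.
# fake_sample is a made-up placeholder, not a real Stan entry point.
import json
import socket
import socketserver
import threading

def fake_sample(request):
    # A real server would call the compiled model here through its FFI.
    return {"model": request.get("model"), "draws": [0.12, -0.48, 0.91]}

class SampleHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One newline-delimited JSON request per connection.
        request = json.loads(self.rfile.readline())
        self.wfile.write(json.dumps(fake_sample(request)).encode() + b"\n")

server = socketserver.TCPServer(("127.0.0.1", 0), SampleHandler)  # 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Act as a client for one round trip.
with socket.create_connection(server.server_address) as conn:
    conn.sendall(json.dumps({"model": "eight_schools"}).encode() + b"\n")
    reply = json.loads(conn.makefile().readline())

server.shutdown()
server.server_close()
print(reply["model"], len(reply["draws"]))  # eight_schools 3
```

All the real work hides behind `fake_sample`; the transport layer genuinely is this small in any of the candidate languages.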

I’ve been working in dynamic languages so long that I don’t even know what people use instead of make these days… what’s the current thing people do with statically compiled languages?

I don’t read the forums frequently enough to make a 2d deadline!


I have some questions about scope:

  1. Will it implement log density and gradients efficiently?
  2. Will it handle multiple different users/models concurrently?
  3. Will it manage any kind of persistence?
  4. Are there performance targets w.r.t. CmdStan or PyStan or RStan?

If the answer to (1) is positive, it rules out CmdStan-based solutions. If the answer to (1) is negative, I don’t see the point of having a server vs. just calling CmdStan (or better yet, a suitable replacement) directly.

If the answer to (2) or (3) is positive, that’s a lot of state to manage. (3) is related to (2). From what I can understand of the current httpstan interface, it persists model IDs.

Python / HTTPStan

It looks like the calls to services just wait to return until the call has completed.

Is the protobuf format documented somewhere? The web service doc only says it returns a binary and doesn’t link to the format.

It looks like this persists model IDs and also IDs for output. How long do those persist?

How is scheduling handled for concurrent requests?


Every language has the infrastructure to provide HTTP or direct socket protocols. How does Julia manage multiple nodes on a cluster?

Stan itself has the ability to distribute computation of the log density and gradient over clusters of nodes for within-chain parallelism using MPI (multiple nodes or cores) or multi-threading (multiple cores).

@dlakelan: Are you offering to help build this or just providing suggestions? I’m asking not to discourage suggestions, but because a large part of this decision has to be based on having both the expertise and dedication on the dev team that builds it.

I’ve been working in dynamic languages so long that I don’t even know what people use instead of make these days

Large projects require more automation and dependency management than a single dynamic language like Julia can provide. This is especially true when you have to link in external resources like Fortran or C++ libraries, documentation generators, testing frameworks, web services, etc. It may be that Julia has its own set of make-like functionality for this, but even for the purely dynamic part of the language, something needs to figure out how to rebuild the doc when components change. Java built a lot of this into their compiler and their javadoc tooling. But even there, every project I was ever associated with used Ant to organize its builds.


I’m sort of surprised nobody’s mentioned Java. Isn’t that still popular for building this kind of thing? I haven’t used it to build a web site in over 10 years, so I’m not seriously suggesting it.


Just providing suggestions. At first I was thinking the high performance and native distributed nature of Julia combined with the existing Stan interface would be an obvious win… But now that I realize that Julia basically calls CmdStan it doesn’t seem to be very important what language is chosen.

as for Julia and its distributed stuff: https://docs.julialang.org/en/v1/stdlib/Distributed/

also, I disagree with:

But it would take quite a bit of reading to understand the Julia ecosystem and why it is fully capable, and that reading might not be worth your time.

EDIT: additional context:

In my imagination the point of a ServerStan is to let multiple people submit jobs to a cluster and each one gets some time to get their computing done… which means rather than serialize the jobs and have Stan use all the cores, it might make sense to parallelize the jobs to different machines and have Stan just use the cores on that machine, but maybe I’m missing the point of the whole thing.

Is there work on building a native matrix library in Julia? That’d go a long way to selling its performance.

But my point wasn’t about Julia’s capabilities per se. If you work on a large project, you’ll have to interface with someone’s existing databases, someone’s existing continuous integration system, their existing web services, and their existing Fortran or Python interface to some scientific code that’s been cooked up over several years. Then if you need that to run cross-platform, it’s even messier on the build front. If you need to maintain backward compatibility with old versions, it’s even harder. At this point, you need something to manage all the moving parts in the application. C++ builds are easy the way we’re doing them—that’s not the bottleneck. It’s making sure that all the modules from GitHub are up to date, that all the doc gets built, that we can install the right libraries for Windows and Mac and Linux.

I may be, too. I don’t know if there’s a spec somewhere other than the existing HTTPStan.

But definitely parallelization over multiple boxes (not just cores) would be important if you wanted to set up a server for a lab or for the general public or something.

That’s been around for years. Julia is designed primarily as a high performance computing language that eliminates the “two language problem”. So all the performant code is written in Julia.


For example the Flux ML library: https://fluxml.ai/Flux.jl/stable/

And it supports GPU computation: https://juliagpu.org/

By far the most featureful and performant ODE library available in the world (as far as I know) is https://diffeq.sciml.ai/stable/

The fastest CSV parser in the world is written in Julia, and auto-scales to multi-threading

Wrapping C/C++/Fortran code is rather trivial (and in the 1.5 version I think the syntax got even easier)


Creating function pointers so that C code can call your Julia code is also easy.

Julia’s package manager has the single most advanced system for tracking dependencies that I’ve seen in any language bar none, with the Project.toml files and the ability to set up completely independent environments in which to run your code (avoiding dependency hell and letting you use incompatible versions of libraries in different projects without conflict at all).

the “build” in Julia basically doesn’t exist… it’s a dynamic compiled language, so when you call a function, that function gets compiled the first time you call it with those arguments. There’s no big “compilation step” where everything gets compiled to an executable (though with PackageCompiler.jl you can pre-compile things so that they’re loaded right at startup already).

If you were starting on the Stan project today, there would be absolutely no reason to use C++ at all.

In fact, by utilizing the macro programming facilities in Julia, a Stan-like package exists: Turing.jl


Which uses a domain specific language written as a julia macro to compile to julia native code and then run NUTS, utilizing the variety of autodiff libraries already written in Julia.

It’s quite impressive.


My understanding was that it still uses BLAS and LAPACK on the back end. Here’s what the doc you linked says:

In Julia (as in much of scientific computation), dense linear-algebra operations are based on the LAPACK library, which in turn is built on top of basic linear-algebra building-blocks known as the BLAS.

The links are to the Fortran packages.

CSV Reader Benchmarks: Julia Reads CSVs 10-20x Faster than Python and R

Beating R or Python’s libs hardly qualifies for fastest in the world :-)

There’s no big “compilation step” where everything gets compiled to an executable (though with PackageCompiler.jl you can pre-compile things so that they’re loaded right at startup already).

I think we may be talking at cross-purposes here. I understand how all this works at the compiler level.

What I’m talking about is managing the result of linking external libraries into one big system. That’s what needs to get managed. Usually the build in any one language in these things is pretty straightforward.

by utilizing the macro programming facilities in Julia, a Stan-like package exists: Turing.jl

Yup. One of Ghahramani’s many PPLs :-)

The way Julia integrated autodiff is neat—it reminds me of how R deals with “object oriented” coding.

Some of that’s just wrapping Fortran and C++ on the back end according to the doc. How many of them are written natively? And which ones are “most performant”?

There are certainly a lot of solvers and functionality for things like differential algebraic equations in Julia’s diff eq package. I don’t know other packages trying to provide one-size-fits-all API solutions to diff eqs other than deSolve in R, but then I haven’t looked.

It looks like a lot of them have built-in sensitivity analysis and the ones coded in Julia that don’t have built-in sensitivity let you autodiff the algorithm (which is what we initially did in Stan because it’s easy; but it doesn’t let you control error in derivatives and is hugely memory intensive).

If you were starting on the Stan project today, there would be absolutely no reason to use C++ at all.

We probably wouldn’t start on the Stan project today at all. We’d probably just use PyMC3. Or maybe even the Julia lib if someone got around to learning the language. If we did have to start today, my first inclination would be to use JAX or maybe PyTorch.


Yes, I think there was no reason to reinvent that wheel. Can Julia create performant raw code? Absolutely. Just one example of comparison: https://julialang.org/benchmarks/

Here’s my trivially simple matrix-vector multiply example using the tips you find in the manual about speed:

Does it make sense to reinvent BLAS? no, so they didn’t. Can you get within a factor of 2 of BLAS using straight up non-optimized code like a first year undergrad would write, plus a couple performance tips from the manual? Yes.

The real advantage to working in Julia though is that it creates a system whereby things can be specialized so that the right high performance code can be called without complications (through the strong dynamic typing). So for example you can do matrix factorizations and solves and it will know that it’s using a factorized object, and do the solve the right (fast) way for example.

Actually both R and Python have hand-tuned C code doing this (as does everything performance-sensitive in R or Python), and a lot of attention has gone into it because many, many people use it, sometimes with very large files (like gigabytes of census data etc.), so beating those using straightforward Julia is in fact very impressive. Beating them by a factor of 10 by automatically adding many threads on a many-core machine is pretty darn impressive.

In Julia, libraries are handled by the Pkg infrastructure, which has a whole dependency-management system built into it, with the “separate environments” I mentioned. There are methods for packages to include pre-built “artifacts” (C libraries etc.) as well. So that’s very good, outstanding even.

Generally you wouldn’t choose Julia to provide just one “library” or something. If you want to statically compile things and then link them into C or C++ or OCaml, then you’d probably not choose Julia. On the other hand, you might well choose Julia to run the show and then link in C/C++/R etc. as needed. If you want to run a C/C++ code using Julia to create dynamic functions on the fly, for example, you’d probably run Julia, which would load the C/C++ and then call the C/C++ main(), I guess.
