Chainsail, the web service for sampling multimodal probability distributions, is now open-source!

In August '22, I presented Chainsail, an internal project my colleagues and I over at Tweag have been working on for quite a while. It is a cloud-based web service that implements Parallel Tempering / Replica Exchange to help with sampling multimodal probability distributions. You can learn more about Chainsail in our announcement blog post.

Back then, we invited users to participate in a beta test, but it was a closed-source project. Today we’re happy to announce that Chainsail is now open source!

The Chainsail development team encourages all kinds of contributions to the Chainsail code base and is looking forward to seeing parts of the Chainsail code reused in other projects. A blog post on the occasion of the open-source release outlines the service architecture, points to relevant parts of the source code, and proposes a couple of future extensions that we at Tweag would enjoy working on together with interested community members.

The Chainsail source code is available in the following GitHub repository: https://github.com/tweag/chainsail

Don’t hesitate to hit us with any questions or comments about the project, either in this thread or via a GitHub issue!

This is great. Any experience comparing the efficiency of many short chains vs. a smaller number of long chains? Does the MPI implementation support hybrid parallelization with thread-based within-chain parallel runs?

Glad to hear you find it interesting!

No, we haven’t done much benchmarking so far. Replica Exchange has been around for a while and, other than the automated temperature schedule tuning, Chainsail doesn’t really add new algorithmic advances. It does look to me, though, as if Replica Exchange isn’t widely used or known in probabilistic programming, so existing benchmarks might be unknown to that crowd or might investigate use cases that are irrelevant to it. So I’d be happy to run a benchmark Stan model if you have one at hand! Someone recently suggested we try an egg box distribution, which (I guess) would be a mixture of a “medium” number (say, 20 or 30) of 2D Gaussians. That’s something we could easily do.
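To make the egg box idea concrete, here is a minimal sketch of such a target: an equally weighted mixture of 25 isotropic 2D Gaussians on a 5 x 5 grid. The grid size, spacing, and width are arbitrary choices of mine, and the function names are purely illustrative; this is plain NumPy, not Chainsail's actual interface.

```python
import numpy as np


def make_grid_means(n_per_side=5, spacing=4.0):
    """Means of the mixture components on a centered square grid (assumed layout)."""
    coords = (np.arange(n_per_side) - (n_per_side - 1) / 2) * spacing
    xx, yy = np.meshgrid(coords, coords)
    return np.column_stack([xx.ravel(), yy.ravel()])  # shape (n_per_side**2, 2)


def log_prob(x, means, sigma=0.5):
    """Log-density of an equally weighted mixture of isotropic 2D Gaussians."""
    x = np.asarray(x, dtype=float)
    sq_dists = np.sum((means - x) ** 2, axis=1)
    log_components = -0.5 * sq_dists / sigma**2 - np.log(2 * np.pi * sigma**2)
    # log-sum-exp for numerical stability far away from all modes
    m = np.max(log_components)
    return m + np.log(np.sum(np.exp(log_components - m))) - np.log(len(means))


def log_prob_gradient(x, means, sigma=0.5):
    """Gradient of log_prob with respect to x."""
    x = np.asarray(x, dtype=float)
    log_w = -0.5 * np.sum((means - x) ** 2, axis=1) / sigma**2
    w = np.exp(log_w - np.max(log_w))
    w /= np.sum(w)  # posterior responsibility of each component for x
    return np.sum(w[:, None] * (means - x), axis=0) / sigma**2


means = make_grid_means()
print(log_prob([0.0, 0.0], means), log_prob([2.0, 2.0], means))
```

With these settings, neighbouring modes are eight standard deviations apart, so a single chain at the target temperature would be expected to hop between them only rarely, which is the situation Replica Exchange is meant for.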

Coincidentally, that was more of a bug than a feature when I used the MPI Replica Exchange implementation at the core of Chainsail on HPC clusters a long time ago: numpy would multithread automatically, and the resource scheduler / cluster admins were not too happy about that… So in principle, yes, you could get something like this to work. Whether that’s a good idea with the current Kubernetes-based, Chainsail-internal compute clusters, I cannot tell. But if you run the core “controller” component on a single, beefy machine, you could definitely use multithreading without any issues. So far, though, in terms of Stan support, all Chainsail does is reach out to an httpstan server, so I guess it currently doesn’t support within-chain parallel runs out of the box.
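For anyone running into the same numpy multithreading surprise on a shared cluster, a common workaround is to cap the BLAS / OpenMP thread count through environment variables before numpy is first imported. The sketch below is a general-purpose illustration, not something taken from the Chainsail code base; the helper name is made up, and which of the variables is actually honored depends on the BLAS backend your numpy build is linked against.

```python
import os


def limit_numeric_threads(n_threads=1):
    """Cap the thread count of common BLAS / OpenMP backends (hypothetical helper).

    Has to be called before numpy is imported for the first time in the
    process, since the backends read these variables when they are loaded.
    """
    for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[var] = str(n_threads)


limit_numeric_threads(1)

import numpy as np  # noqa: E402  (deliberately imported after setting the limits)

# Linear algebra calls like this one are where numpy may use several threads.
a = np.random.default_rng(0).normal(size=(200, 200))
product = a @ a.T
```

Conversely, for hybrid parallelization you would set `n_threads` to the number of cores you want each replica's process to use.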
