Chainsail, a web service for sampling multimodal distributions: opinions and beta testers wanted!

Hi Stan community,

Colleagues and I over at Tweag created a web service called Chainsail that can drastically improve sampling of multimodal distributions, which occur often in models with unidentifiable parameters or when you have ambiguous data. Chainsail has flexible support for models and probability distributions defined in PyMC, Stan, or hand-written Python.
The secret sauce in Chainsail is an autotuning Replica Exchange algorithm that uses cloud computing to scale dynamically beyond the computing resources available on single machines. If you want to learn more about Replica Exchange, here’s a shameless plug: I wrote a blog post about it.
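
To give a feel for the method without the long read: each replica targets a flattened version p(x)^beta of the distribution, and neighboring replicas periodically attempt to exchange states. Here is a minimal, illustrative sketch of the swap move (not Chainsail's actual implementation):

```python
import math
import random

def swap_accept_prob(log_prob, x_i, x_j, beta_i, beta_j):
    """Metropolis acceptance probability for exchanging the states of two
    replicas that target p(x)^beta_i and p(x)^beta_j respectively."""
    log_ratio = (beta_i - beta_j) * (log_prob(x_j) - log_prob(x_i))
    if log_ratio >= 0.0:
        return 1.0
    return math.exp(log_ratio)

def replica_exchange_step(log_prob, states, betas):
    """Attempt swaps between neighbors on the inverse-temperature ladder.
    Flat replicas (small beta) cross between modes easily; accepted swaps
    propagate those jumps down to the beta = 1 replica."""
    for i in range(len(states) - 1):
        if random.random() < swap_accept_prob(log_prob, states[i],
                                              states[i + 1],
                                              betas[i], betas[i + 1]):
            states[i], states[i + 1] = states[i + 1], states[i]
    return states
```

The autotuning part of Chainsail then concerns choosing the ladder of betas so that neighboring replicas overlap enough for swaps to be accepted reasonably often.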

We’re currently looking for probabilistic programming practitioners who might be interested in beta-testing it. If you have multimodal distributions to sample, shoot us an email (address below) to get your email address authorized. Don’t hesitate to tell us a bit about your sampling problem or to ask for a demo - we would be happy to chat and show you around. Currently, Chainsail is deployed on Tweag’s premises and is available to beta testers for free (within reasonable computing-time limits).

Future Chainsail development depends mostly on beta tester feedback, but faster Stan support, a better HMC implementation (possibly using BlackJAX), and more choices of tempering scheme (for example, applying a temperature only to the likelihood) would be among the next things to work on.
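
For illustration, the likelihood-only tempering mentioned above amounts to scaling just the likelihood term in the log density. A minimal sketch (my own notation, not Chainsail's actual API):

```python
def tempered_log_prob(log_prior, log_likelihood, beta):
    """Log density of the tempered posterior
    p_beta(theta) ∝ prior(theta) * likelihood(theta)^beta.
    At beta = 1 this is the full posterior; at beta = 0, just the prior,
    so hot replicas explore prior-like, much flatter distributions."""
    def log_prob(theta):
        return log_prior(theta) + beta * log_likelihood(theta)
    return log_prob

# example: standard normal prior, normal likelihood around an observation y = 3
log_prior = lambda t: -0.5 * t * t
log_lik = lambda t: -0.5 * (3.0 - t) ** 2
hot = tempered_log_prob(log_prior, log_lik, 0.2)   # flattened, easy to explore
cold = tempered_log_prob(log_prior, log_lik, 1.0)  # the actual posterior
```

This differs from tempering the whole posterior in that hot replicas still respect the prior, which keeps them from wandering into regions the prior excludes.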
Chainsail is currently closed-source, but it is highly likely that we will eventually make at least parts of the service, if not all of it, open-source.
If you’d like to learn more about Chainsail, here are a couple of additional resources:

  • we shot a ~15-minute walkthrough video that demonstrates how to use Chainsail,
  • we wrote an announcement blog post that presents Chainsail and the kind of problems it solves,
  • we have a repository that provides more detailed documentation and example probability distributions. It also hosts the chainsail-helpers Python package that provides probability distribution interfaces for PyMC and Stan and a helper script to process the downloaded sampling results: GitHub - tweag/chainsail-resources: Examples, documentation and other additional resources related to Chainsail,
  • and, in a couple of days, we will publish another blog post in which we use Chainsail to analyze a soft k-means model / fitting a Gaussian mixture with unequal weights (in one dimension for easy visualization) and show how using Replica Exchange via Chainsail makes a difference.

Chainsail is in an early stage and currently has a couple of major limitations:

  • the big limitation right off: sampling a model defined in Stan works, but is very, very, very slow,
  • it uses only a very basic, untuned HMC implementation (no NUTS, fixed and preset number of integration steps, simple heuristic to tune the timestep),
  • only the most important parameters can currently be set by the user,
  • it’s in its infancy, so expect glitches and rough edges :-)
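
To make the second limitation concrete, here is roughly what such a basic HMC transition looks like - a minimal 1D sketch with a fixed, preset number of leapfrog steps and no NUTS-style adaptation (illustrative only, not Chainsail's actual implementation):

```python
import math
import random

def hmc_step(log_prob, grad_log_prob, x, step_size, n_steps):
    """One HMC transition for a 1D target with a fixed number of
    leapfrog steps; step_size and n_steps are preset, not adapted."""
    p = random.gauss(0.0, 1.0)            # resample momentum
    x_new, p_new = x, p
    # leapfrog integration of the Hamiltonian dynamics
    p_new += 0.5 * step_size * grad_log_prob(x_new)
    for step in range(n_steps):
        x_new += step_size * p_new
        if step < n_steps - 1:
            p_new += step_size * grad_log_prob(x_new)
    p_new += 0.5 * step_size * grad_log_prob(x_new)
    # Metropolis correction for the discretization error
    h_old = log_prob(x) - 0.5 * p * p
    h_new = log_prob(x_new) - 0.5 * p_new * p_new
    if math.log(random.random()) < h_new - h_old:
        return x_new
    return x
```

NUTS-style samplers instead adapt both the step size and the trajectory length per iteration, which is one reason a tuned Stan chain outperforms this kind of fixed-schedule HMC on a per-mode basis.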

If you have any questions about Replica Exchange or what the Chainsail service can and cannot do, or if you’d like to test it, please don’t hesitate to let me know in this thread or email us. Looking forward to hearing your questions, opinions and ideas!


Thanks for posting, @simeonc. I moved the post to the correct category of “Publicity”.

Stan’s open-source license is flexible enough that you can even include it in proprietary code bases like this one :-).

Is there a demo somewhere? I only see links to videos, blog posts, and a GitHub repo that’s obviously not the code because it’s not open source.

Hi @Bob_Carpenter,

Thanks for moving the post to the correct category and sorry for having missed that!

It’s nice to be reassured that the license for Stan is permissive enough for our use case. In case you’re wondering how exactly we use Stan in this service: we run httpstan in a Docker container and use its log_prob and log_prob_grad endpoints to evaluate the log-probability of a model and its gradient, which we use for HMC and calculating the Replica Exchange acceptance probabilities. An example demonstrating the interface to Stan can be found here, and that’s really all you need to (or currently can) provide to run Chainsail.
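
For readers curious what that interface amounts to: the contract is essentially an object exposing a log-probability and its gradient, which is all that HMC and the Replica Exchange acceptance step need, and which httpstan's log_prob and log_prob_grad endpoints supply for a Stan model. The sketch below is illustrative only (class and method names are mine, not the exact chainsail-helpers API), using an unequal-weight Gaussian mixture as a typical multimodal target:

```python
import math

class MixturePDF:
    """Unequal-weight mixture of two unit-variance 1D Gaussians."""

    def __init__(self, means=(-2.0, 3.0), weights=(0.3, 0.7)):
        self.means = means
        self.weights = weights

    def log_prob(self, x):
        # log of the (unnormalized) mixture density
        return math.log(sum(w * math.exp(-0.5 * (x - m) ** 2)
                            for w, m in zip(self.weights, self.means)))

    def log_prob_gradient(self, x):
        dens = [w * math.exp(-0.5 * (x - m) ** 2)
                for w, m in zip(self.weights, self.means)]
        # d/dx log f = f'/f, with f'(x) = sum_k dens_k * (m_k - x)
        return sum(d * (m - x) for d, m in zip(dens, self.means)) / sum(dens)
```

With a Stan model, the two methods would simply forward to the httpstan endpoints instead of computing the density in Python.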

The link to the demo fell victim to the “max 5 links” rule, but here it is:

If you want to give it a try, I’ll have to authorize your (Gmail) email address in the user database, so don’t hesitate to either post it here or send me an email to
And if you (and possibly other interested Stan users) would like to hop on a call for a live demo or just a chat about the algorithms or the engineering at work (no worries, I’m not going to try to sell you anything ;-)), I’d be more than happy to!

As promised above, we have now published another blog post in which we perform a detailed analysis of (a variation of) the soft k-means model, aka fitting a Gaussian mixture, both using a single Markov chain and using multiple connected chains via Chainsail. Hope you’ll find it interesting!

There are no comments, so I’ll comment here. The post says:

Correctly tuned, it is far less prone to the over- / undersampling of modes.

I’m not sure how to quantify this more broadly. The problem with soft k-means is that finding the MLE is NP-hard in high dimensions.

Note that single-chain sampling is also what most probabilistic programming libraries provide by default, although in a more advanced manner: they usually offer the option to mix samples from multiple isolated chains and provide some sort of initial-state estimation, which in practice makes these algorithms a bit more robust than our naive implementation here.

Not sure which systems you’re talking about here. Stan doesn’t do this. We just run parallel chains naively.

In terms of mixing multiple chains, you might be interested in the stacking-based approach of @yuling et al. to this kind of problem: Using Stacking to Average Bayesian Predictive Distributions (with Discussion)

You might also be interested in this more sophisticated approach to multimodality and adaptation using ensemble sampling that “teleports” their “walkers” (much like parallel tempering, but with an approach to avoiding mode collapse):

  • Michael Lindsey, Jonathan Weare, and Anna Zhang. Ensemble Markov chain Monte Carlo with teleporting walkers. Preprint, arXiv:2106.02686.

We have two alternatives for multimodal sampling:

  1. Adaptive path sampling in metastable posterior distributions, which in my opinion is also “sophisticated” and certainly has “sophisticated” roots in simulated tempering and annealing.

  2. Stacking for Non-mixing Bayesian Computations: The Curse and Blessing of Multimodal Posteriors, JMLR.

One nice extra: both methods are open-sourced and ready to use in Stan.


Yes, quantifying this is of course difficult. I don’t think Replica Exchange (or any other method) is a silver bullet, but I still think it can improve sampling in many situations, even though there’s still no way of knowing whether all modes have been discovered.

they usually give the possibility to mix samples from multiple isolated chains

Not sure which systems you’re talking about here. Stan doesn’t do this. We just run parallel chains naively.

What I meant when writing this is (I think) exactly what you wrote - naively running multiple chains in parallel, but later using all obtained samples for downstream analysis, in the hope that different chains discover different modes. Please correct me if I got this wrong!

In terms of mixing multiple chains, you might be interested in the the stacking-based approach

Stacking for Non-mixing Bayesian Computations: The Curse and Blessing of Multimodal Posteriors, JMLR.

Thanks for this reference! I will have a look - at first glance, I understand that the stacking approach doesn’t result in exact posterior samples, but that it excels when the goal is prediction. Sounds very useful indeed!

  • Michael Lindsey, Jonathan Weare, and Anna Zhang. Ensemble Markov chain Monte Carlo with teleporting walkers. Preprint, arXiv:2106.02686.

Excellent, I haven’t seen this either! This sounds super cool. I have to admit that I haven’t followed the literature that closely since I left academic research.

In general, we could very well envision extending our web service to other algorithms that run multiple, potentially many (meaning more than the number of processors on a standard PC) parallel Markov chains - a good part of Chainsail would be reusable, most notably the easily scalable cluster of nodes and the user interface, and, in the case of the teleporting walkers, the communication between processes. It seems to me that in such an extension, both this method and stacking could profit from the parallel computing provided by a cloud-based web service.

Thanks a lot! Actually, I still owe you and Andrew a response on an email thread regarding this paper and others from a while back, but finishing up the web service and vacation got in the way. I will amend the blog post to mention that method (and stacking) and the fact that it is implemented in Stan - I’m sorry for this oversight!

Finally, I don’t see our web service as competition to Stan - quite the opposite: ideally, at some point, Stan and the web service would interface seamlessly, with Stan providing the model specification language and the local sampling, and Chainsail providing potentially massively parallel computing and algorithms that rely on it and are not currently implemented in Stan, such as Replica Exchange or the teleporting-walker method linked above.

By the way, one core component of Chainsail has been open-source for a long time, namely the MPI-based Replica Exchange implementation. If you’re interested, you can find it on my GitHub.

We know they won’t all be discovered in high dimensions because there are many more modes than draws.

Stan will correctly report non-convergence if you run more than one chain. I would not recommend just throwing all the chains together and hoping for the best. That’s why we report non-convergence in these situations. We don’t spend a lot of time thinking about models where Stan can’t recover simulated parameter values, like high-dimensional Gaussian mixtures or neural networks.

I’m not sure what you mean by “exact posterior samples”. What I’m trying to say here is that there is no system that can effectively sample from high-dimensional mixture posteriors in the sense of recovering the parameters used to simulate data.

It’ll certainly make things go faster.

I agree. Stan’s competition is other open-source Bayesian inference tools and modeling languages like PyMC, Pyro, and Turing.jl.

Thanks for the link.