Parallelism in Stan

wds15 · March 6, 2017, 12:39pm

Hi!

As discussed recently in our meetings, I started a wiki on parallelism in Stan here:

The motivating example is hierarchical ODEs, but I am sure that other areas of Stan can benefit from the proposed approach, so I see the page as a general point of reference for the discussion around parallelism. I think we should get our heads around these questions:

Is this a good design principle which we want to introduce to the Stan language?
Any improvements for the design?

Comments are very welcome!

Let me know if sections are not clear and need some more explanations from my end.

Best,
Sebastian

drezap · March 25, 2026, 3:06pm

@wds15 Are you still around? The wiki doesn’t point to anything anymore, but if there’s a consensus, I’d do some grunt work and build it. I’ve done some parallelism before, in C. Seems like a good time.

I’m digging up something from a long time ago, not sure what updates to NUTS have been done yet.

Sounds like years ago the devs just talked about it and no one said yes or no.

This is an open-source project, so I’d be down to hit it.

I think we’re talking about this issue: https://github.com/stan-dev/stan/issues/2818

Was there a yes or no consensus on this? Is this outdated?

I’m looking for something to do.

I’m happy to build it as long as we have some good reviewers that understand parallelization (you).

And here’s another resource: Parallel dynamic HMC merits

Thoughts?

drezap · March 25, 2026, 10:07pm

Alright, anyone? There was something on @andrewgelman’s blog about another way to parallelize HMC, I don’t remember exactly. I’d be down for that. @Bob_Carpenter Thoughts?

I’m in the mood for some multi-threading in C++. I think @stevebronder has the biggest NUTS on the dev team right now, I’m down to implement this? @wds15 has disappeared.

I need a yes or no before I proceed.

I have successfully merged parallelized C code before, although admittedly, I’m not the best communicator.

wds15 · March 25, 2026, 10:39pm

Still around! Just busy…happy to look at stuff.

A cool project would be a variadic map function with parallelism and/or a reduce_sum with an MPI backend (you could reformat the variadic arguments into a map_rect-style implementation and just write an adapter).
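As a rough illustration of the first idea, here is a minimal sketch of a variadic parallel map in plain C++. The name `parallel_map` and the `num_chunks` parameter are made up for this example and are not existing Stan functions; the sketch uses `std::async` from the standard library rather than TBB or MPI, and it ignores autodiff entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper, not part of Stan: apply f(x, args...) to every element
// of xs. The elements are split into contiguous chunks and each chunk runs as
// its own asynchronous task. The shared arguments are passed by const
// reference, so f must be safe to call concurrently.
// Note: the result type must be default-constructible and must not be bool,
// because concurrent writes to std::vector<bool> are a data race.
template <typename F, typename T, typename... Args>
auto parallel_map(F f, const std::vector<T>& xs, std::size_t num_chunks,
                  const Args&... args)
    -> std::vector<decltype(f(xs[0], args...))> {
  using R = decltype(f(xs[0], args...));
  assert(num_chunks > 0);

  const std::size_t n = xs.size();
  // Never use more chunks than elements (and at least one chunk).
  num_chunks = std::min(num_chunks, std::max<std::size_t>(n, 1));
  const std::size_t chunk = (n + num_chunks - 1) / num_chunks;

  std::vector<R> out(n);
  std::vector<std::future<void>> tasks;
  for (std::size_t begin = 0; begin < n; begin += chunk) {
    const std::size_t end = std::min(n, begin + chunk);
    // Each task writes to a disjoint slice of out, so no locking is needed.
    tasks.push_back(std::async(std::launch::async, [&, begin, end] {
      for (std::size_t i = begin; i < end; ++i) {
        out[i] = f(xs[i], args...);
      }
    }));
  }
  // Wait for all chunks; get() also rethrows any exception from a task.
  for (auto& t : tasks) {
    t.get();
  }
  return out;
}
```

A call such as `parallel_map(f, xs, 2, scale, shift)` evaluates `f(xs[i], scale, shift)` for every `i` in two chunks. Summing the result would give a reduce_sum-style total, and an adapter to a map_rect-style interface would mainly have to pack the variadic arguments into whatever fixed argument layout that interface expects.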

The parallel NUTS thing I suggested a while ago lets users double the CPU use while giving you a speedup of 40% or so. I’d say it’s worth it, given that it works for almost any model. The price to pay is greater code complexity, as I recall (or you neatly refactor things).

drezap · March 26, 2026, 4:17pm

Great, thanks so much. I just needed a green light to see if this is something we wanted. Adding more features can increase maintenance costs, and since this is a small open-source project, that strains our ability to maintain it. Thanks, everyone.

wds15 · March 27, 2026, 8:23am

Just a warning: While I proposed the parallel NUTS thing… I was not able to convince key stakeholders to pick it up. The assessment at the time was that a 30-40% speedup for doubling CPU resource use is not worth it, in particular in view of the added complexity of the code. I personally have a different view, as I use computers with lots of CPUs all the time, and being 30-40% faster for ANY model without changing the model at all… screams for me to implement it. There is a test implementation based on the Intel TBB graph parallelism stuff, which was fun to play with.
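Put as rough arithmetic, and reading "30-40% faster" as a speedup factor S of 1.3 to 1.4 on p = 2 cores (that reading is an assumption, the numbers were never pinned down more precisely than this):

```latex
S \in [1.3,\,1.4], \quad p = 2
\;\Longrightarrow\;
E = \frac{S}{p} \in [0.65,\,0.70], \qquad
\frac{\text{total CPU time}}{\text{serial CPU time}} = \frac{p}{S} \approx 1.43 \text{ to } 1.54, \qquad
\frac{\text{wall time}}{\text{serial wall time}} = \frac{1}{S} \approx 0.71 \text{ to } 0.77
```

So the trade is roughly 43-54% more total CPU time for 23-29% less wall time. Whether that is worth it depends on whether the extra cores would otherwise sit idle.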

drezap · March 27, 2026, 9:11am

Alright, stakeholders meaning benefactors? I’m not getting paid, I would be doing this for practice/fun. It could be put on a different branch if someone wants to pull and use it for a specific purpose. I’m independent, I don’t have funding, really. But thanks for the reply. So you’re saying you’re in favor? It doesn’t have to be a merged main branch, just a separate feature if it’s helpful to people with less computational resources. Can you point me to the prototype repo? Thanks.

wds15 · March 27, 2026, 8:34pm

This should be the one: stan-dev/stan at feature/speculative-nuts. It did run with the Stan stuff from back then. TBB / oneTBB… should not matter.

drezap · March 28, 2026, 2:53am

Yeah, people say integrating C/C++ isn’t a big deal, but when you’re actually doing it, it’s like walking on hot coals. I’ll go with oneTBB I guess; it’s more modern and will probably be better maintained in the future.

drezap · March 31, 2026, 11:40am

So I’m just seeing tbb::concurrent_vector; does this abstract away the mapping and reducing? I had only done this once, and the multithreading library was internally developed, so I don’t have access to the documentation anymore.

I’m looking here: const bool run_serial = stan::math::internal::get_num_threads() == 1;

line 248 of stan/src/stan/mcmc/hmc/nuts/base_nuts.hpp on feature/speculative-nuts (stan-dev/stan)

Is this SMP or MPP? I haven’t used TBB. Usually you have to write a mapping function and a reducing function; I’m not sure, I don’t remember.

And how far did you push it? How many threads and on what models?

It looks like on TBB a lot of the programming is abstracted away from you, am I wrong?

I had some help, sure, but it has been 3 years since I tried to multithread anything.
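For reference, the map-then-reduce pattern I have in mind looks roughly like the sketch below. It is written with the standard library only (`std::async`), not with TBB, and `map_reduce_sum` is a made-up name rather than anything from the Stan branch. TBB is a shared-memory threading library, and it offers algorithms such as `tbb::parallel_reduce` that handle the chunking and combining done by hand here; `tbb::concurrent_vector` is only a thread-safe container. The `run_serial` flag mirrors the idea of the check quoted above, under the assumption that one thread simply means "take the plain loop".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the sum of f(x) over all x in xs.
// With one thread it runs a plain loop; otherwise it splits xs into
// contiguous chunks, maps and partially reduces each chunk in its own task,
// and then combines the partial sums.
// Note: the parallel path adds terms in a different order than the serial
// path, so floating-point results can differ in the last bits.
template <typename F>
double map_reduce_sum(F f, const std::vector<double>& xs,
                      std::size_t num_threads) {
  assert(num_threads >= 1);

  // Map and reduce one slice [begin, end) into a partial sum.
  auto partial = [&f, &xs](std::size_t begin, std::size_t end) {
    double acc = 0.0;
    for (std::size_t i = begin; i < end; ++i) {
      acc += f(xs[i]);
    }
    return acc;
  };

  const std::size_t n = xs.size();
  const bool run_serial = (num_threads == 1) || (n < 2);
  if (run_serial) {
    return partial(0, n);
  }

  const std::size_t chunk = (n + num_threads - 1) / num_threads;
  std::vector<std::future<double>> tasks;
  for (std::size_t begin = 0; begin < n; begin += chunk) {
    const std::size_t end = std::min(n, begin + chunk);
    tasks.push_back(std::async(std::launch::async, partial, begin, end));
  }

  // Reduce step: combine the partial sums in chunk order.
  double total = 0.0;
  for (auto& t : tasks) {
    total += t.get();
  }
  return total;
}
```

All tasks here share one address space, which is the SMP style; an MPP or MPI version would have to ship the slices to separate processes and send the partial sums back.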

drezap · March 31, 2026, 3:25pm

I’m talking to myself, but I think I’m going to crank up the number of threads at that line of code and open a PR, not for the purpose of merging, but because we have a database of models for evaluating efficiency and I want to see how it scales. Again, I’m not that familiar with TBB, so I have some learning to do. Maybe I’m wasting Columbia’s computational resources…

And then we can also multithread expensive algorithms like Cholesky decomposition, independently, not just NUTS, right? Are we already doing this or no?
