Sampling in parallel using threads

ariddell · June 20, 2018, 5:32pm

I would like to draw samples in parallel from a Stan model on a multi-core computer using system threads. This is important for supporting parallel sampling on Windows in PyStan. (macOS and Linux can use fork to create independent processes.)

In pseudocode, I’m doing the following in four separate system threads:

stan_model* model = new stan_model(var_context);
int return_code = hmc_nuts_diag_e(*model, init_var_context, random_seed, ...);

With STAN_THREADS defined, this works (instead of crashing, as it did before stan-math PR #509), but it produces samples at the same rate as if I had drawn them serially. The independent cores are not being used to produce draws in parallel.

Is this expected behavior? Should I be able to use system threads to draw samples in parallel faster than I would be able to draw the same number of samples serially?

wds15 · June 20, 2018, 5:45pm

If you define STAN_THREADS, then the AD stack is treated as thread local. This allows you to:

- run Stan models inside threads (so different chains can run in parallel using threads)
- take advantage of parallelization within a chain if your Stan model uses the new map_rect

From your description, it sounds like switching on STAN_THREADS causes no performance degradation, which is great, since I would have expected some. The thread-local storage adds overhead to everything, but how much is model-, platform-, and compiler-specific.

I hope this helps and makes sense.

ariddell · June 23, 2018, 4:31pm

What you’ve described is what I anticipated happening. In practice, I’m not able to get different chains to run in parallel using threads. Is there a test which verifies this functionality?

I’ve tried making some quick modifications to cmdstan to do parallel sampling in threads ~~but I’m ending up with segfaults~~ (it works). For example, I’ve rewritten cmdstan/main.cpp to be this:

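#include <cmdstan/command.hpp>
#include <stan/services/error_codes.hpp>
#include <boost/exception/diagnostic_information.hpp>
#include <boost/exception_ptr.hpp>

#include <iostream>       // std::cout, std::endl
#include <thread>         // std::thread; compile with -pthread and define STAN_THREADS

int main(int argc, const char* argv[]) {
  try {
    // Run one copy of the command in a second thread...
    std::thread second(cmdstan::command<stan_model>, argc, argv);
    // ...and another copy concurrently in the main thread. Both copies get
    // identical arguments, so they are given the same output file.
    cmdstan::command<stan_model>(argc, argv);
    second.join();  // wait until the second thread finishes
    std::cout << "thread finished.\n";
    return 0;
  } catch (const std::exception& e) {
    // Only exceptions thrown on the main thread reach this handler.
    std::cout << e.what() << std::endl;
    return stan::services::error_codes::SOFTWARE;
  }
}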

Edit: it all works. False alarm.

ariddell · June 23, 2018, 4:58pm

I’m sorry. I forgot to define STAN_THREADS. Let me keep trying for a moment.

ariddell · June 23, 2018, 5:05pm

I spoke too soon. It works fine when the threads are created in C++. The source of the problem must be that I’m creating the threads in Python – or somewhere else.

maedoc · June 23, 2018, 6:23pm

What did you mean by system threads?

ariddell · June 23, 2018, 6:27pm

This: https://en.wikipedia.org/wiki/Thread_(computing)

On Linux and macOS, threads are POSIX threads, I think. On Windows, it's something different.

maedoc · June 23, 2018, 6:36pm

Python threads aren’t threads in that sense, hence my question. From your description it sounds like the GIL isn’t being released, while wrapping a C++11 thread and releasing the GIL would work.

ariddell · June 23, 2018, 9:12pm

Python threads are threads in this sense. https://docs.python.org/3.6/library/threading.html

I got it working. With the GIL released, things work as expected.

mjack · November 16, 2018, 4:55pm

Could you share the complete code that uses multithreading, for reference?

ahartikainen · November 16, 2018, 5:17pm

This file?

github.com/stan-dev/httpstan/blob/master/httpstan/main.py

"""Configure httpstan server.

Configure the server and schedule startup and shutdown tasks.
"""
import asyncio
import logging
import threading
from typing import Optional

import aiohttp.web
import uvloop

import httpstan.routes

logger = logging.getLogger("httpstan")


def make_app() -> aiohttp.web.Application:
    """Assemble aiohttp Application.

This file has been truncated.
