Sampling in parallel using threads

ariddell · June 20, 2018, 5:32pm

I would like to draw samples in parallel from a Stan model on a multi-core computer using system threads. This is important for supporting parallel sampling on Windows in PyStan. (macOS and Linux can use fork to create independent processes.)
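For comparison, the fork-based approach can be sketched in a few lines of POSIX C++. This is a minimal sketch, not PyStan's code. `run_chain` is a hypothetical stand-in for building the model and sampling, and `global_counter` plays the role of global state such as a non-thread-local AD stack. Each child process gets its own copy of the parent's memory, so chains never share that state. `fork` does not exist on Windows, which is why threads matter there.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical global state; each forked child works on its own copy of it.
static long global_counter = 0;

// Hypothetical stand-in for "construct model, run sampler"; returns an exit code.
int run_chain(int chain_id) {
  for (int i = 0; i < 1000; ++i) global_counter += chain_id;  // mutate "global" state
  return 0;
}

// Fork one child process per chain and wait for all of them.
// Returns the number of children that exited with status 0.
int sample_with_fork(int n_chains) {
  std::vector<pid_t> children;
  for (int c = 0; c < n_chains; ++c) {
    pid_t pid = fork();
    if (pid == 0) {
      // Child: run one chain, then exit without returning into the parent's loop.
      std::_Exit(run_chain(c));
    } else if (pid > 0) {
      children.push_back(pid);  // parent: remember the child
    }
    // pid < 0 means fork failed; that chain is simply not counted.
  }
  int succeeded = 0;
  for (pid_t pid : children) {
    int status = 0;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) &&
        WEXITSTATUS(status) == 0) {
      ++succeeded;
    }
  }
  return succeeded;
}
```

Because the children modify only their own copies, `global_counter` in the parent is still 0 after `sample_with_fork(4)` returns.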

In pseudocode, I’m doing the following in four separate system threads:

stan_model* model = new stan_model(var_context);
return_code = hmc_nuts_diag_e(*model, init_var_context, random_seed, ...);
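A self-contained C++ sketch of that pattern, under stated assumptions: `run_chain` and `sample_in_threads` are hypothetical names, and `run_chain` stands in for building a `stan_model` and calling `hmc_nuts_diag_e`. Here it just draws uniform random numbers from its own seeded `std::mt19937`. Each thread owns all of its state (its own RNG and its own output vector), so the threads share nothing and can run on separate cores. `std::thread` maps to the platform's native threads (POSIX threads on Linux and macOS).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <thread>
#include <vector>

// Hypothetical stand-in for: model = new stan_model(...); hmc_nuts_diag_e(*model, ...).
// "Samples" n_draws values using a chain-specific seed and writes them to `out`.
void run_chain(unsigned int seed, std::size_t n_draws, std::vector<double>& out) {
  std::mt19937 rng(seed);  // per-chain RNG: no shared state between threads
  std::uniform_real_distribution<double> unif(0.0, 1.0);
  out.clear();
  for (std::size_t i = 0; i < n_draws; ++i) out.push_back(unif(rng));
}

// Run n_chains chains concurrently, one std::thread per chain.
std::vector<std::vector<double>> sample_in_threads(unsigned int base_seed, int n_chains,
                                                   std::size_t n_draws) {
  std::vector<std::vector<double>> draws(n_chains);  // one output buffer per chain
  std::vector<std::thread> threads;
  for (int c = 0; c < n_chains; ++c) {
    // std::ref so the thread writes into draws[c] rather than into a copy
    threads.emplace_back(run_chain, base_seed + c, n_draws, std::ref(draws[c]));
  }
  for (auto& t : threads) t.join();  // wait for every chain to finish
  return draws;
}
```

Because each chain's output depends only on its seed, `sample_in_threads(1234, 4, 1000)` returns the same draws on every run, regardless of how the operating system schedules the threads.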

With STAN_THREADS defined, this works (instead of crashing, as it did before stan-math PR #509), but it produces samples at the same rate as if I had drawn them serially. The independent cores are not being used to produce draws in parallel.

Is this expected behavior? Should I be able to use system threads to draw samples in parallel faster than I would be able to draw the same number of samples serially?

wds15 · June 20, 2018, 5:45pm

If you define STAN_THREADS, the AD stack is made thread-local. This allows you to:

run Stan models inside threads (so different chains can run in parallel using threads), and
take advantage of parallelization within a chain if your Stan model uses the new map_rect function.
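To make "thread-local" concrete, here is a minimal sketch built on a toy `ad_stack` struct and a `record` function, both hypothetical; Stan Math's real AD stack and the way it handles STAN_THREADS are more involved. With the macro defined, the global is declared `thread_local`, so every thread that runs a model gets its own instance and chains in different threads never touch each other's tape. Without it, all threads would share one instance and concurrent use would be a data race.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Toy stand-in for an AD tape; NOT Stan Math's actual data structure.
struct ad_stack {
  std::vector<double> nodes;
};

// Normally set by the build (e.g. -DSTAN_THREADS); forced on for this sketch.
#ifndef STAN_THREADS
#define STAN_THREADS
#endif

#ifdef STAN_THREADS
thread_local ad_stack global_stack;  // one instance per thread
#else
ad_stack global_stack;               // one instance shared by all threads (data race)
#endif

// Record n "operations" on the calling thread's stack and report its final size.
void record(int n, std::size_t& size_seen) {
  for (int i = 0; i < n; ++i) global_stack.nodes.push_back(static_cast<double>(i));
  size_seen = global_stack.nodes.size();
}

// Two threads record different numbers of operations at the same time.
// Returns true if each saw only its own entries and the main thread's stack is untouched.
bool stacks_are_independent() {
  std::size_t a = 0, b = 0;
  std::thread t1(record, 100, std::ref(a));
  std::thread t2(record, 250, std::ref(b));
  t1.join();
  t2.join();
  return a == 100 && b == 250 && global_stack.nodes.empty();
}
```

The trade-off is that accessing a `thread_local` variable may be somewhat slower than accessing an ordinary global, which is the kind of overhead discussed in the next paragraph.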

From your description, it sounds like there is no performance degradation when switching on STAN_THREADS, which is great; I would have expected some slowdown. The thread-local storage adds some overhead to everything, but how much will be model-, platform-, and compiler-specific.

I hope this helps and makes sense.

ariddell · June 23, 2018, 4:31pm

What you’ve described is what I anticipated happening. In practice, I’m not able to get different chains to run in parallel using threads. Is there a test which verifies this functionality?

I’ve tried making some quick modifications to CmdStan to do parallel sampling in threads (I was getting segfaults at first, but it works now). For example, I’ve rewritten cmdstan/main.cpp as follows:

#include <cmdstan/command.hpp>
#include <stan/services/error_codes.hpp>
#include <boost/exception/diagnostic_information.hpp>
#include <boost/exception_ptr.hpp>

#include <iostream>       // std::cout
#include <thread>         // std::thread; remember to compile with -pthread

// Build with STAN_THREADS defined so each thread gets its own AD stack.
int main(int argc, const char* argv[]) {
  try {
    // Run one chain in a second thread and another in the main thread.
    // Both get the same argc/argv, so unless the arguments differ they
    // may write to the same output file.
    std::thread second(cmdstan::command<stan_model>, argc, argv);
    cmdstan::command<stan_model>(argc, argv);
    second.join();  // block until the second thread finishes
    std::cout << "thread finished.\n";
    return 0;
  } catch (const std::exception& e) {
    std::cout << e.what() << std::endl;
    return stan::services::error_codes::SOFTWARE;
  }
}

Edit: it all works. False alarm.
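One caveat with the main.cpp above: the `catch` block only sees exceptions thrown on the main thread. If `cmdstan::command` throws inside `second`, the exception escapes the thread function and `std::terminate` is called. A minimal sketch of one alternative, assuming a hypothetical `run_chain` in place of `cmdstan::command<stan_model>`: launching each chain with `std::async(std::launch::async, ...)` makes `future::get()` rethrow any exception in the calling thread, where an ordinary try/catch can handle it.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for cmdstan::command<stan_model>(argc, argv):
// returns 0 on success, throws on failure.
int run_chain(int chain_id, bool fail) {
  if (fail) throw std::runtime_error("chain " + std::to_string(chain_id) + " failed");
  return 0;
}

// Launch every chain on its own thread and collect the return codes.
// future::get() rethrows, in this thread, any exception a chain threw.
// Returns the number of chains that failed (by exception or nonzero return).
int run_all(const std::vector<bool>& fail_flags) {
  std::vector<std::future<int>> futures;
  for (std::size_t c = 0; c < fail_flags.size(); ++c) {
    futures.push_back(std::async(std::launch::async, run_chain,
                                 static_cast<int>(c), fail_flags[c]));
  }
  int failures = 0;
  for (auto& f : futures) {
    try {
      if (f.get() != 0) ++failures;
    } catch (const std::exception&) {
      ++failures;  // a real driver would report the error message here
    }
  }
  return failures;
}
```

With this structure a failing chain is reported instead of taking down the whole process, and the remaining chains still run to completion.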

ariddell · June 23, 2018, 4:58pm

I’m sorry. I forgot to define STAN_THREADS. Let me keep trying for a moment.

ariddell · June 23, 2018, 5:05pm

I spoke too soon. It works fine when the threads are created in C++. The source of the problem must be that I’m creating the threads in Python – or somewhere else.

maedoc · June 23, 2018, 6:23pm

What did you mean by system threads?

ariddell · June 23, 2018, 6:27pm

This: https://en.wikipedia.org/wiki/Thread_(computing)

On Linux and macOS, threads are POSIX threads, I think. On Windows, it’s something different.

maedoc · June 23, 2018, 6:36pm

Python threads aren’t threads in that sense, hence my question. From your description it sounds like the GIL isn’t being released, while wrapping a C++11 thread and releasing the GIL would work.

ariddell · June 23, 2018, 9:12pm

Python threads are threads in this sense. https://docs.python.org/3.6/library/threading.html

I got it working. With the GIL released, things work as expected.
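The effect of the GIL can be mimicked in plain C++ with a single global mutex. In this sketch, `gil` is a hypothetical stand-in for Python's interpreter lock (not CPython's actual implementation), and `sample_chain` stands in for a long-running call into Stan. If each thread keeps holding the lock while it samples, the chains run one at a time, matching the serial throughput described at the start of the thread. Releasing the lock around the call, which is what `Py_BEGIN_ALLOW_THREADS`/`Py_END_ALLOW_THREADS` do in a C extension (or `with nogil:` in Cython), lets the chains overlap.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

std::mutex gil;              // hypothetical stand-in for the GIL
std::atomic<int> active{0};  // chains currently inside sample_chain
std::atomic<int> peak{0};    // maximum number of chains seen at the same time

// Stand-in for a long call into the sampler; records how many run concurrently.
void sample_chain() {
  int now = ++active;
  int prev = peak.load();
  while (prev < now && !peak.compare_exchange_weak(prev, now)) {
    // on failure prev is refreshed; retry until peak >= now
  }
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
  --active;
}

// A "Python thread": it holds the lock while running, and optionally
// releases it for the duration of the C++ call.
void python_thread(bool release_gil) {
  std::unique_lock<std::mutex> lock(gil);
  if (release_gil) lock.unlock();  // like Py_BEGIN_ALLOW_THREADS
  sample_chain();
  if (release_gil) lock.lock();    // like Py_END_ALLOW_THREADS
}

// Run n_threads threads and return the peak number of chains sampling at once.
int peak_concurrency(bool release_gil, int n_threads) {
  active = 0;
  peak = 0;
  std::vector<std::thread> threads;
  for (int i = 0; i < n_threads; ++i) threads.emplace_back(python_thread, release_gil);
  for (auto& t : threads) t.join();
  return peak.load();
}
```

With `release_gil == false` the peak is always 1. With `true` it is typically equal to the number of threads, although the exact value depends on scheduling.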

mjack · November 16, 2018, 4:55pm

Could you upload the part of the code that uses multithreading, for reference?

ahartikainen · November 16, 2018, 5:17pm

This file?

github.com/stan-dev/httpstan/blob/master/httpstan/main.py

"""Configure httpstan server.

Configure the server and schedule startup and shutdown tasks.
"""
import asyncio
import logging
import threading
from typing import Optional

import aiohttp.web
import uvloop

import httpstan.routes

logger = logging.getLogger("httpstan")


def make_app() -> aiohttp.web.Application:
    """Assemble aiohttp Application.

(This file has been truncated.)

Related topics

Multiprocessing and/or multithreading problem - CmdStanPy · Modeling (cmdstanpy, paralellization) · 12 replies · 280 views · January 2, 2025
Running chains on multiple cores · Developers · 2 replies · 949 views · January 30, 2023
Cmdstanpy: multithreading issues (threads_per_chain) · CmdStan (cmdstanpy) · 2 replies · 595 views · December 13, 2023
Multithreading with pystan3 · General · 17 replies · 1417 views · September 22, 2024
Threading in rstan 2.18 · General · 30 replies · 4448 views · March 26, 2020