I have a dataset for which I would essentially like to run the same Stan model on each column. Rather than using one core per chain, I'd like to use one core per column. I was able to get this working with PyStan 2.19, but I'm not sure how to do so (if it's even possible) with the beta version. I recognize that things are still in development, so apologies if this is not supported at the moment.
I've simplified the script to focus on the issue at hand, but please feel free to ask for more information/data/code.
import dask
from dask.distributed import Client
import numpy as np
import pandas as pd
import stan


def main():
    """
    Format of the data table:
    ============================================
               OTU9  OTU15   OTU20  OTU41  OTU47
    11835.11  103.0   89.0  1271.0   39.0   64.0
    11835.12  895.0  616.0    66.0   29.0   47.0
    11835.13    0.0    0.0    14.0  314.0  140.0
    11835.14   27.0   30.0     2.0  103.0   50.0
    11835.15    0.0    0.0    36.0   68.0   55.0
    """
    """
    Format of the Stan code:
    ============================================
    data {
        int<lower=0> N;
        int y[N];
        ( ... )
    }
    parameters { ... }
    model {
        y ~ ( ... )
    }
    """
    dat = { ... }

    @dask.delayed
    def fit_single_column(values):
        dat["y"] = values.astype(int)  # update dat each iteration
        sm = stan.build(stancode, data=dat, random_seed=42)
        fit = sm.sample(num_chains=1, num_samples=100)
        return fit

    fits = []
    for col in tbl.columns:
        values = tbl[col].values.astype(int)
        fits.append(fit_single_column(values))
    fits = dask.compute(*fits)


if __name__ == "__main__":
    client = Client(n_workers=4)  # run 4 columns at a time
    main()
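One thing worth noting in the script above: `fit_single_column` mutates the shared `dat` dict from the enclosing scope. Whether or not that is related to the httpstan error, building a fresh data dict inside each task removes any cross-task interaction. Below is a minimal, runnable sketch of that pattern; the thread pool, the stub in place of `stan.build`/`sample`, and the toy column data are all placeholders of mine, not from the actual script:

```python
from concurrent.futures import ThreadPoolExecutor


def fit_single_column(stancode, values):
    # Build a fresh data dict per task instead of mutating a shared one.
    dat = {"N": len(values), "y": [int(v) for v in values]}
    # Stub: a real task would instead do something like
    #   sm = stan.build(stancode, data=dat, random_seed=42)
    #   return sm.sample(num_chains=1, num_samples=100)
    return dat


# Toy stand-ins for `stancode` and the real table's columns.
stancode = "data { ... }"
columns = {"OTU9": [103.0, 895.0, 0.0], "OTU15": [89.0, 616.0, 0.0]}

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {name: pool.submit(fit_single_column, stancode, vals)
               for name, vals in columns.items()}
    fits = {name: fut.result() for name, fut in futures.items()}
```

With real Stan fits you would swap the stub back to `stan.build(...)`/`sample(...)` and use processes (or Dask workers, as in the script above) rather than threads, since sampling is CPU-bound.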
Output here: output.txt (22.9 KB)
It looks like it's an issue with caching in httpstan, but I'm having trouble diagnosing it further.
Runtime details
- macOS Big Sur
- 2.3 GHz Dual-Core Intel Core i5
- 8 GB RAM
- PyStan version 3.0.0b7