Possible to run multiple models in parallel using Dask?

I have a dataset where I'd essentially like to fit the same Stan model to each column. Rather than using one core per chain, I'd like to use one core per column. I got this working with PyStan 2.19, but I'm not sure how to do it (or whether it's even possible) with the PyStan 3 beta. I recognize things are still in development, so apologies if this isn't supported yet.

I’ve simplified the script code to focus on the issue at hand but please feel free to ask for more information/data/code.

import dask
from dask.distributed import Client
import numpy as np
import pandas as pd
import stan

def main():
    """
    Format of the data table:
    ============================================
               OTU9  OTU15   OTU20  OTU41  OTU47
    11835.11  103.0   89.0  1271.0   39.0   64.0
    11835.12  895.0  616.0    66.0   29.0   47.0
    11835.13    0.0    0.0    14.0  314.0  140.0
    11835.14   27.0   30.0     2.0  103.0   50.0
    11835.15    0.0    0.0    36.0   68.0   55.0
    """

    """
    Format of the Stan code:
    ============================================
    data {
        int<lower=0> N;
        int y[N];
        ( ... )
    }
    parameters { ... }
    model {
        y ~ ( ... )
    }
    """
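    dat = { ... }
    # stancode and tbl are also defined here (omitted for brevity)

    @dask.delayed
    def fit_single_column(values):
        # Build a fresh data dict per task instead of mutating the shared `dat`;
        # .tolist() gives plain Python ints, which serialize cleanly to JSON
        data = {**dat, "y": values.tolist()}
        sm = stan.build(stancode, data=data, random_seed=42)
        fit = sm.sample(num_chains=1, num_samples=100)
        return fit

    fits = []
    for col in tbl.columns:
        values = tbl[col].values.astype(int)  # cast once, here
        fits.append(fit_single_column(values))

    # Run all delayed fits; the Client created in __main__ schedules them
    fits = dask.compute(*fits)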

if __name__ == "__main__":
    client = Client(n_workers=4, threads_per_worker=1)  # 4 single-threaded workers: 4 columns at a time
    main()
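The main structural point in the script is one independent task per column, each with its own data dict. Below is a minimal, Stan-free sketch of that fan-out using only the standard library. `make_column_data`, `fake_fit`, and `fit_all_columns` are hypothetical names; `fake_fit` just returns the column mean where the real code would call `stan.build(...).sample(...)`, and `ThreadPoolExecutor` stands in for the Dask client.

```python
from concurrent.futures import ThreadPoolExecutor


def make_column_data(base, column):
    """Return a fresh data dict for one column; `base` is never mutated."""
    y = [int(v) for v in column]
    return {**base, "N": len(y), "y": y}


def fake_fit(data):
    """Hypothetical stand-in for stan.build(...).sample(...): mean of y."""
    return sum(data["y"]) / data["N"]


def fit_all_columns(base, table, max_workers=4):
    """Fit every column independently, at most `max_workers` at a time."""
    names = list(table)
    datasets = [make_column_data(base, table[name]) for name in names]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fake_fit, datasets))
    return dict(zip(names, results))


# Two columns from the table in the question, as plain lists.
table = {
    "OTU9": [103.0, 895.0, 0.0, 27.0, 0.0],
    "OTU15": [89.0, 616.0, 0.0, 30.0, 0.0],
}
base = {}
print(fit_all_columns(base, table))  # {'OTU9': 205.0, 'OTU15': 147.0}
```

Because every task gets its own dict, the result does not depend on which worker runs first, and `base` is left unchanged, which matters once the workers share memory (threads).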

Output here: output.txt (22.9 KB)

It looks like an issue with httpstan's model caching, but I'm having trouble diagnosing it further.

Runtime details

  • macOS Big Sur
  • 2.3 GHz Dual-Core Intel Core i5
  • 8 GB RAM
  • PyStan version 3.0.0b7

Would threads work?

client = Client(n_workers=4, processes=False)
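With `processes=False`, the Dask workers run as threads inside the client's own process rather than as separate processes. A rough standard-library analogue of that switch is choosing between `ThreadPoolExecutor` and `ProcessPoolExecutor`; the sketch below (with a hypothetical `fake_fit` that just sums a column) shows the toggle. Whether threads actually speed up Stan sampling depends on whether the sampler releases the GIL, which this sketch does not test.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fake_fit(column):
    """Hypothetical stand-in for one per-column Stan fit: the column sum."""
    return sum(int(v) for v in column)


def run_columns(columns, n_workers=4, processes=True):
    """Run fake_fit on each column with n_workers workers.

    processes=True  -> separate worker processes (like Dask's default Client)
    processes=False -> threads in this process (like Client(processes=False))
    """
    pool_cls = ProcessPoolExecutor if processes else ThreadPoolExecutor
    with pool_cls(max_workers=n_workers) as pool:
        return list(pool.map(fake_fit, columns))


if __name__ == "__main__":
    columns = [[103.0, 895.0, 0.0], [89.0, 616.0, 0.0]]
    print(run_columns(columns, processes=False))  # [998, 705], using threads
```

Calling `run_columns(columns, processes=True)` switches to worker processes. That path needs `fake_fit` to be a top-level (picklable) function and the call to sit under an `if __name__ == "__main__":` guard, for the same reason the original script creates its `Client` under that guard.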
