Running Stan models in parallel in CmdStanPy

I’m wondering how to use parallelism when running Stan models in CmdStanPy.
I’m doing simulation-based calibration (SBC) for my model, so I need to fit it many times.
When I tried using the multiprocessing module, I got slower results than with a simple for loop.

The function I’m calling is:

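import numpy as np

def compute_ranks(i):
    # sbc_model (a compiled CmdStanModel) and df are globals; i only indexes the run
    result_sbc = sbc_model.sample(data={'N_batch': 4, 'N': 200, 'batch': df.batch.values})
    # keep every 8th of the 4000 draws (default 4 chains x 1000) -> 500 draws
    ranks = np.sum(result_sbc.stan_variable('lt_sim')[np.arange(0, 4000 - 7, 8)], axis=0)
    return ranks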
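For reference, the rank step on its own is plain NumPy and can be checked without running Stan. The sketch below is a hypothetical helper, `sbc_rank`, and it assumes `lt_sim` is a 0/1 indicator per draw and parameter (1 when the posterior draw is below the simulated value, as in the usual SBC setup). The index formula reproduces `np.arange(0, 4000 - 7, 8)` and always keeps exactly `n_draws // thin` draws, so ranks fall in `0..n_draws // thin`.

```python
import numpy as np


def sbc_rank(lt_sim: np.ndarray, thin: int = 8) -> np.ndarray:
    """Sum a per-draw 0/1 indicator over every `thin`-th posterior draw.

    lt_sim: array of shape (n_draws, n_params). Assumed: 1 where the posterior
    draw is below the simulated ("true") value, 0 otherwise.
    """
    n_draws = lt_sim.shape[0]
    # Same indices as np.arange(0, 4000 - 7, 8) when n_draws == 4000 and thin == 8;
    # yields exactly n_draws // thin indices.
    idx = np.arange(0, n_draws - (thin - 1), thin)
    return lt_sim[idx].sum(axis=0)
```

Feeding it an all-ones array of shape `(4000, k)` should give 500 for every parameter, which is a quick sanity check on the thinning.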

I’m not including the model code, as I’m not sure it is relevant here.

Operating System: macOS Big Sur 11.3 / Mac mini M1, 16 GB RAM
Interface Version: cmdstanpy 0.9.76
Compiler/Toolkit: Xcode

I’ll openly admit my Python multiprocessing experience is shaky, but my guess is that this may be slowed down by sharing memory with your DataFrame.

Just to spitball: what happens if you pass 'batch': df.batch.to_numpy(copy=True) instead?
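To make the suggestion concrete: `df.batch.values` may hand back an array backed by the DataFrame's own memory, while `to_numpy(copy=True)` always allocates an independent array. The DataFrame below is a hypothetical stand-in for the poster's `df` (only the `batch` column matters); whether the copy actually helps with multiprocessing is the open question here, not something this sketch shows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the poster's df; only "batch" is passed to Stan.
df = pd.DataFrame({"batch": [1, 2, 3, 4] * 50, "y": np.zeros(200)})

view = df.batch.values                  # may share memory with df
copied = df.batch.to_numpy(copy=True)   # always a fresh, independent array

# The copy has the same contents but never shares a buffer with the original.
print(np.shares_memory(copied, view))   # False

data = {"N_batch": 4, "N": 200, "batch": copied}
```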

No improvement.
EDIT: OK, it does not even work; multiprocessing gets stuck immediately.

Try running it in a thread pool. (Also, how many CPUs per chain do you have, and what does your RAM usage look like?)
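A minimal thread-pool sketch, under a few assumptions. Each `sample()` call launches CmdStan as external processes, so Python threads mostly sit waiting on those processes, and nothing (model object, DataFrame) has to be pickled into a child process. Each fit is assumed to run 4 chains at once (CmdStanPy's default chain count), so the hypothetical helper `n_workers` caps concurrent fits at `cores // 4` to avoid starting more chains than cores (e.g. 2 fits at a time on an 8-core machine). `run_sbc` is also a hypothetical name, and I have not verified that concurrent `sample()` calls on one model object are safe; treat that as an assumption.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

import numpy as np

# Assumption: each sample() call runs 4 chains (CmdStanPy's default count),
# and those chains run as separate CmdStan processes at the same time.
CHAINS_PER_FIT = 4


def n_workers(chains_per_fit: int = CHAINS_PER_FIT) -> int:
    """How many fits can run at once without starting more chains than CPU cores."""
    cores = os.cpu_count() or 1
    return max(1, cores // chains_per_fit)


def run_sbc(compute_ranks: Callable[[int], np.ndarray], n_sims: int,
            workers: Optional[int] = None) -> np.ndarray:
    """Call compute_ranks(i) for i = 0..n_sims-1 in a thread pool; stack results in order."""
    workers = workers or n_workers()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, one per simulation index.
        results = list(pool.map(compute_ranks, range(n_sims)))
    return np.stack(results)
```

With the function from the question this would be `ranks = run_sbc(compute_ranks, n_sims=100)`, giving one row of ranks per simulation. If fits still slow each other down, lowering the number of chains each fit runs in parallel is the other knob to try.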