PyStan throws error when running chains in parallel (n_jobs > 1)

I recently installed PyStan from conda-forge in a fresh conda environment on macOS Mojave 10.14.5. I’m having trouble running multiple chains in parallel (i.e. with n_jobs > 1). Stan cannot seem to find the compiled model when n_jobs > 1, but it has no such problem when n_jobs = 1. The error is as follows:

Process SpawnPoolWorker-4:
Traceback (most recent call last):
  File "/Users/pbhambhani/misc/ljmu/summer_project/pystan_env/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/pbhambhani/misc/ljmu/summer_project/pystan_env/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/pbhambhani/misc/ljmu/summer_project/pystan_env/lib/python3.8/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/Users/pbhambhani/misc/ljmu/summer_project/pystan_env/lib/python3.8/multiprocessing/queues.py", line 358, in get
    return _ForkingPickler.loads(res)
ModuleNotFoundError: No module named 'stanfit4anon_model_482f7e78a7c2d6f09648bba041f6f372_1939374203243207487'

Other information:
Python version: 3.8.5
PyStan version: 2.19.1.1
Compiler: clang 9.0.1 - I think this is installed by PyStan, and is different from the default clang version on my system. The latter reads Apple LLVM version 10.0.1 (clang-1001.0.46.4).

I noticed a couple of warnings related to linking. I am not a C++ expert, so I’m not sure whether they’re related to the issue I’m facing, but I’m posting them here just in case.

clang-9: warning: -Wl,-export_dynamic: 'linker' input unused [-Wunused-command-line-argument]

ld: warning: -pie being ignored. It is only used when linking a main executable

Finally, FWIW, I also have ScalaStan installed on my machine for a different project, and it has no trouble running chains in parallel. That one seems to use CmdStan 2.19.1.

Any help is appreciated. Thanks!

Are you running your Python code as a script?

Try running your code inside an if __name__ == "__main__": block:

import pystan

if __name__ == "__main__":
    sm = pystan.StanModel(...)
    fit = sm.sampling()

I think this is due to the behavior of multiprocessing (on macOS it now uses spawn instead of fork as the default start method).

See https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
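If you want to check what your session is actually doing, here is a minimal sketch using only the standard library. It prints the start method that has been fixed so far (if any) and the ones available on your platform. The variable names are just for illustration.

```python
import multiprocessing
import sys

# Ask without side effects: with allow_none=True this returns None when no
# start method has been fixed yet, instead of fixing the platform default.
current = multiprocessing.get_start_method(allow_none=True)

# Start methods this platform supports (e.g. "fork" is not offered on Windows).
available = multiprocessing.get_all_start_methods()

print("Python:", sys.version.split()[0], "on", sys.platform)
print("start method fixed so far:", current)
print("start methods available here:", available)
```

If the first value is already "spawn" before you have called anything yourself, some earlier code in the session has probably set up multiprocessing.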

Thanks for the reply. I have been running it in a Jupyter notebook so far.

I tried running it in the __name__ == "__main__" block, and the error still persists :(

From the multiprocessing docs, it seems spawn became the default on macOS in Python 3.8. I could try an earlier version of Python 3 to see if that fixes the error.

Could you add this in the first cell, before any other imports?

import multiprocessing
multiprocessing.set_start_method("fork")

That works indeed! Thank you.

I was going to ask if this should be logged as a PyStan issue, but looks like someone already raised this back in April. https://github.com/stan-dev/pystan/issues/693

Hi,

It also worked for me at first, but then it stopped working. If I run it now:

import multiprocessing
multiprocessing.set_start_method("fork")

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-27-b0fa033a5f53> in <module>
      1 import multiprocessing
----> 2 multiprocessing.set_start_method("fork")

~/anaconda3/lib/python3.8/multiprocessing/context.py in set_start_method(self, method, force)
    241     def set_start_method(self, method, force=False):
    242         if self._actual_context is not None and not force:
--> 243             raise RuntimeError('context has already been set')
    244         if method is None and force:
    245             self._actual_context = None

RuntimeError: context has already been set

and the terminal window says the same as before:

Process SpawnPoolWorker-212:
Process SpawnPoolWorker-211:
Process SpawnPoolWorker-213:
Process SpawnPoolWorker-214:
Traceback (most recent call last):
  File "/Users/jan/anaconda3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/Users/jan/anaconda3/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/jan/anaconda3/lib/python3.8/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/Users/jan/anaconda3/lib/python3.8/multiprocessing/queues.py", line 358, in get
    return _ForkingPickler.loads(res)
ModuleNotFoundError: No module named 'stanfit4anon_model_649148968e47447c8e6aa386bb89c275_3157769158311472631'
Traceback (most recent call last):
.
.
.

Any idea what is happening and how to fix it?
Thanks

Which Python version do you have?

Edit: do you run any code before calling that line?

Python 3.8.5, and yes. Actually, when I restarted the kernel just now, it worked. I can try to reproduce how it happened (I ran some PyMC3 models; could that have an effect?).

Yes, I think PyMC3 does some set-up of its own too.
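If restarting the kernel is inconvenient, a guarded version of the call may help. This is only a sketch: ensure_start_method is a hypothetical helper name, not part of PyStan or the standard library. It relies on the force argument of set_start_method, which is visible in the traceback above. Forcing a new start method in the middle of a session could affect other libraries that already created pools, so restarting the kernel and setting the method first is probably the safer route.

```python
import multiprocessing


def ensure_start_method(method="fork"):
    """Hypothetical helper: make `method` the multiprocessing start method,
    even if another library has already fixed a different one."""
    # allow_none=True only inspects the state; it does not fix a default.
    current = multiprocessing.get_start_method(allow_none=True)
    if current == method:
        return current  # nothing to do

    if method not in multiprocessing.get_all_start_methods():
        raise ValueError(f"start method {method!r} is not available on this platform")

    # force=True replaces an already-fixed context instead of raising
    # RuntimeError('context has already been set').
    multiprocessing.set_start_method(method, force=True)
    return multiprocessing.get_start_method()
```

Calling ensure_start_method("fork") in the first cell, before compiling or sampling, should then not raise even if the cell is re-run.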
