The Python Global Interpreter Lock, or GIL, is a mutex (a lock) that allows only one thread at a time to hold control of the Python interpreter. All the GIL does is ensure that only one thread executes Python bytecode at any given moment; control still switches between threads. What the GIL prevents, then, is using more than one CPU core (or separate CPUs) to run threads in parallel.
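The point above can be seen with a small sketch: two threads running a CPU-bound function both finish with correct results, but because the GIL only lets one of them execute bytecode at a time, the total wall time is roughly what two back-to-back serial calls would take.

```python
import threading

# CPU-bound task: sum of squares up to n. Under the GIL, running two of
# these in separate threads gives no speedup -- the interpreter executes
# their bytecode one thread at a time, merely switching between them.
def sum_squares(n):
    return sum(i * i for i in range(n))

results = {}

def worker(name, n):
    results[name] = sum_squares(n)

threads = [
    threading.Thread(target=worker, args=(f"t{i}", 100_000)) for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both threads produce correct results; they interleaved, they did not
# run in parallel on separate cores.
print(results)
```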
Python threading is great for creating a responsive GUI, or for handling many short web requests where I/O, rather than Python code, is the bottleneck. It is not suitable for parallelizing computationally intensive Python code: because of the GIL, Python threads merely interleave, executing serially rather than in parallel, so they only help when overlapping I/O operations. For actual parallelism, use the multiprocessing module to fork multiple processes that execute in parallel, or delegate the heavy work to a compiled external library that releases the GIL.
Just curious: why didn’t PyStan choose to do threading in Python? Having models be pickle-able is great, and it’s just one further step to spawn subprocesses and pass them the model…
OK. Perhaps OT, but will some variant of PyStan continue that interfaces with in-process compiled Stan code? It's nice not having to serialize and deserialize the really big data sets.
But even copying is fairly fast compared to, say, np.savetxt(fname, big_array) followed by np.loadtxt(fname). If httpstan standardizes a binary data-exchange format, then it'd be similarly fast (at least on a local machine).
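A quick sketch of the contrast being drawn: np.savetxt formats every float as decimal text and np.loadtxt parses it back, while the binary np.save/np.load pair writes the raw IEEE-754 bytes, which is why a binary exchange format closes most of the gap.

```python
import os
import tempfile

import numpy as np

big_array = np.random.default_rng(0).standard_normal((1000, 100))

with tempfile.TemporaryDirectory() as d:
    txt_path = os.path.join(d, "a.txt")
    npy_path = os.path.join(d, "a.npy")

    # Text round-trip: each value is formatted and re-parsed as decimal text.
    np.savetxt(txt_path, big_array)
    from_txt = np.loadtxt(txt_path)

    # Binary round-trip: raw bytes, no formatting or parsing cost.
    np.save(npy_path, big_array)
    from_bin = np.load(npy_path)

print(np.allclose(from_txt, big_array), np.array_equal(from_bin, big_array))
```

The binary round-trip is bit-exact; the text round-trip is only as faithful as the chosen decimal format.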