I am fitting a model with 4 parallel chains in PyStan from the PyCharm console. After two chains finish sampling, the entire process hangs and the two remaining chains never start. I am on a Linux CentOS 7 cluster and I allocate 50GB to the process. Each of the first two chains takes 20GB.
Any advice would be appreciated.
It seems to me that the third chain has a large R-hat and does not converge.
In my experience, MCMC takes a long time if a chain does not converge. If you can find a seed for which the sampling (using only a single chain) does not converge, then the problem is caused by the model or the data.
Thanks, although I am not sure this is the issue with my case. When I allocate enough memory (100GB), all chains are running at the same time. I am surprised the multiprocessing cannot move from the first two chains to the last two when I have less memory available. So maybe there is still a way around it.
Oh, that is true. Multiprocessing doesn’t work in that case. That is a pickling error.
You need to run the chains serially (n_jobs=1).
Maybe run your model with one chain from a script, save the output with ArviZ, and run that script n times. Then combine the chains once all have finished (arviz.concat).
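The one-chain-per-run idea above could look roughly like the sketch below. It assumes PyStan 2 and ArviZ are installed; the helper names, the `chain_<i>.nc` file naming, and the per-chain seed offset are made up for illustration and are not from this thread.

```python
import os


def chain_path(out_dir, chain_index):
    """Hypothetical naming scheme for the per-chain output files."""
    return os.path.join(out_dir, f"chain_{chain_index}.nc")


def run_one_chain(stan_file, data, chain_index, out_dir, base_seed=1234):
    """Sample a single chain in this process and save it as NetCDF."""
    # Imported here so the helpers can be defined without the libraries loaded.
    import arviz as az
    import pystan  # PyStan 2

    model = pystan.StanModel(file=stan_file)
    fit = model.sampling(
        data=data,
        chains=1,
        n_jobs=1,  # sample in this process instead of worker processes
        seed=base_seed + chain_index,  # assumption: a different seed per run
    )
    az.from_pystan(posterior=fit).to_netcdf(chain_path(out_dir, chain_index))


def combine_chains(out_dir, n_chains):
    """Load the saved chains and stack them along the chain dimension."""
    import arviz as az

    parts = [az.from_netcdf(chain_path(out_dir, i)) for i in range(n_chains)]
    return az.concat(*parts, dim="chain")
```

Each call to `run_one_chain` would go in its own script invocation (for example one cluster job per chain index), and `combine_chains` runs once afterwards. Note that this sketch recompiles the model on every run, which you may want to avoid by caching the compiled model.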
Yes, thank you! I just started a try with n_jobs=1. Is this because the model is too big?
Also, I read that this issue was solved in PyStan 3 - will it be released anytime soon?
I see, thanks. I hope n_jobs=1 will work. I am still not sure what’s the solution to my original issue though… (or is it a branch of the same problem?)
PyStan 2 uses pickle to move the final output draws around. There are some size limits on how much you can pickle at one time. I suspect you are exceeding these limits.
I think the easiest way to solve this problem might be to thin your results so your final chain is smaller. Can you keep 1 out of 100 draws?
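For a rough sense of how much thinning shrinks what has to be pickled, here is a back-of-the-envelope sketch. It assumes every kept value is stored as an 8-byte float and counts post-warmup draws only; the function name and the iteration and parameter counts are hypothetical, not taken from the model in this thread.

```python
import math


def estimate_draw_bytes(n_iter, warmup, thin, n_params, bytes_per_value=8):
    """Rough size of one chain's kept post-warmup draws (float64 assumed)."""
    kept = math.ceil((n_iter - warmup) / thin)  # every thin-th draw is kept
    return kept * n_params * bytes_per_value


# Hypothetical model: 2000 iterations, 1000 warmup, one million saved quantities.
for thin in (1, 5, 100):
    size = estimate_draw_bytes(n_iter=2000, warmup=1000, thin=thin, n_params=1_000_000)
    print(f"thin={thin:>3}: about {size / 1e9:.2f} GB per chain")
```

The estimate ignores any overhead from the container and from pickling itself, so treat it as a lower bound on the real size.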
PyStan 3, unlike PyStan 2, does not use pickle to store the sampler output. If the pickle size limit is the source of the problem, PyStan 3 will not encounter it.
Also, PyStan 3 will work fine on Linux with version 3.7 of Python.
Update - setting thin=5 did the job. Also, running the PyStan commands from the command line makes everything faster. Any ideas why PyCharm slows things down? @ahartikainen, I don’t know if you remember, but I had an issue with opening multiple *.pkl files in the same code, so running it from the command line works well.
I believe the failed pickling is due to the size of the return value.
PyStan 2 pickles all the draws and returns them to the parent process.
PyStan 3 doesn’t pickle the draws (ever).