Help with reduce_sum

I’ve worked through the long discussions on Discourse - Help with naming threading argument - specifically Aki’s comment Help with naming threading argument - #10 by avehtari which presume a machine with 8 available CPUs. I’m not sure what cores meand - maybe there’s a difference between R and Python? CmdStanPy uses a the concurrent.futures module which handles the calls to the subprocess module.

The sample method used to have args chains and cores - we’ve renamed cores to parallel_chains. The old logic was the if parallel_chains is greater than number of CPUs, set parallel_chains to number of CPUs. The question is whether or not to take threads_per_chain into account as well.

From the discussions here and the CmdStanR GitHub issue, I’m a little unclear what CmdStanR does - was the final decision that if user specifies both parallel_chains and threads_per_chain then that many chains are run in parallel, no matter what the number of available CPUs?

This is somewhat dangerous, because code takes on a life of its own - if someone codes up an analysis and runs it on their 32 core workstation with chains=6, parallel_chains=6, threads_per_chain=5, all is fine. If someone else with a laptop an Intel dual-core processor which really only has 2 CPUs (Intel’s hyperthreading reports that is has 4 CPUs), and runs the script as written, what is the best thing to do?

The PR I’ve submitted tries to adjust down the number of parallel_chains. I’ll use the reduce_sum case study and do some timing experiments on the various machines I have access to.

3 Likes