There is an edge case in the job chunking code used by threading (not MPI) that can produce incorrect results when threading is turned on. It seems relatively rare, but it is taking me a while to come up with a procedure for determining whether this bug affected you, so I wanted to post now that it exists. We have a hotfix from @wds15 going through testing and will release 2.18.1 with the fix in the coming days. I will post an update to this thread once I figure out the exact minimal conditions under which this bug occurs.
Here is a Python script (improved over my old version by @ahartikainen) to run with the size of the collection you are running map_rect over (called shards here) and the number of threads you configured (
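```python
def showsbug(shards, threads):
    # A single thread never goes through the affected chunking path.
    if threads < 2:
        return False
    # Affected when the remainder of shards over the effective thread count is threads - 1.
    return shards % min(shards, threads) == threads - 1
```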
This returns True if your configuration is affected by the threading bug, in which case you should re-run. Re-running with a different number of threads or a different collection size will likely avoid it, but check the new configuration with this function first. I am pretty embarrassed that I couldn't quickly find a closed-form solution to this question, but there may not be one given all of the special logic in the chunking code, and this seems good enough.
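To help pick a configuration for the re-run, here is a small sketch built on the function above. It assumes shards is at least 1 (with 0 shards the original check would divide by zero). `showsbug_closed_form` only restates the arithmetic that `showsbug` performs (with at least two threads, the check can succeed only when shards >= threads and shards + 1 is a multiple of threads); it does not claim to characterize the underlying bug any more exactly than the original function does. `safe_thread_counts` is a hypothetical helper, not part of Stan.

```python
from typing import List


def showsbug(shards: int, threads: int) -> bool:
    # Same check as in the post, repeated so this sketch runs on its own.
    if threads < 2:
        return False
    return shards % min(shards, threads) == threads - 1


def showsbug_closed_form(shards: int, threads: int) -> bool:
    # Restatement of the check above (assumes shards >= 1):
    # - if shards < threads, min() picks shards, shards % shards == 0,
    #   and 0 can never equal threads - 1 when threads >= 2;
    # - otherwise shards % threads == threads - 1, i.e. shards + 1 is a
    #   multiple of threads.
    return threads >= 2 and shards >= threads and (shards + 1) % threads == 0


def safe_thread_counts(shards: int, max_threads: int) -> List[int]:
    # Hypothetical helper: thread counts from 1 to max_threads that the
    # check does not flag for this number of shards.
    return [t for t in range(1, max_threads + 1) if not showsbug(shards, t)]


if __name__ == "__main__":
    # 11 shards: 2, 3, 4 and 6 threads are flagged because each divides 12.
    print(safe_thread_counts(11, 8))  # → [1, 5, 7, 8]
```

The closed form makes the pattern easier to see: a configuration is flagged exactly when the thread count divides shards + 1 (and does not exceed shards), so moving to a thread count that does not divide shards + 1 avoids the flagged case.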