Understanding reduce_sum efficiency

Thanks, this is helpful.

So the block ordering is “random” in the sense that there is no particular reason for the sequence the blocks end up in, but they nonetheless end up in some overall sequence. For dataset-construction purposes, then, you can just pick an order that mixes group sizes and probably end up with something reasonably close to optimal, if I am reading you correctly. As for exactly which group order is truly optimal within that general approach, that’s something the user has to test and discover for themselves. All true?
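To make concrete what I mean by “an order that is a mixture of group sizes”: something like the sketch below, where group indices are interleaved large/small so that no stretch of the data is all big groups or all small ones. The function name and interleaving scheme here are just my own illustration, not anything from `reduce_sum` itself.

```python
def mixed_order(group_sizes):
    """Order group indices so large and small groups alternate.

    Indices are sorted by group size, then taken alternately from
    the large end and the small end of that sorted list, so adjacent
    groups in the result tend to differ a lot in size.
    """
    by_size = sorted(range(len(group_sizes)), key=lambda i: group_sizes[i])
    order = []
    lo, hi = 0, len(by_size) - 1
    while lo <= hi:
        order.append(by_size[hi])      # largest remaining group
        if lo < hi:
            order.append(by_size[lo])  # smallest remaining group
        lo += 1
        hi -= 1
    return order

sizes = [500, 20, 20, 480, 30, 510, 25]
print(mixed_order(sizes))  # → [5, 1, 0, 2, 3, 6, 4]
```

Rebuilding the dataset with the groups in that index order would then be one way to hand the scheduler a size-balanced sequence, which is how I understand the suggestion.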