As a very late response to @Funko_Unko's question here: at the ongoing ACoP12 conference we have a poster showing a benchmark of cross-chain warmup and the efficiency of running many (>4) chains. acop_2021_poster.pdf (138.0 KB)
tl;dr
Combining cross-chain warmup with a large number (>4) of chains could be an efficient strategy for scaling ESS/time. Even though warmup quality may suffer (though not significantly) when the number of chains is large, the reduced post-warmup sampling time makes up for this loss.
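
To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It is not the benchmark from the poster: the function name, the `efficiency` penalty, and all numbers are made up for illustration, and it assumes every chain runs on its own core and that warmup takes the same wall time regardless of the number of chains.

```python
def wall_time_to_target_ess(n_chains, target_ess, warmup_time,
                            ess_per_sec_per_chain, efficiency=1.0):
    """Wall time (seconds) for n_chains parallel chains to jointly reach target_ess.

    efficiency < 1 is a hypothetical penalty standing in for slightly worse
    warmup quality (e.g. a less well-tuned step size or metric).
    """
    pooled_rate = n_chains * ess_per_sec_per_chain * efficiency
    sampling_time = target_ess / pooled_rate
    return warmup_time + sampling_time


# Illustrative numbers only, not taken from the poster.
t_4 = wall_time_to_target_ess(4, target_ess=4000, warmup_time=20.0,
                              ess_per_sec_per_chain=10.0, efficiency=1.0)
t_16 = wall_time_to_target_ess(16, target_ess=4000, warmup_time=20.0,
                               ess_per_sec_per_chain=10.0, efficiency=0.9)

print(f"4 chains:  {t_4:.1f} s")
print(f"16 chains: {t_16:.1f} s")
```

In this toy model the 16-chain run still reaches the target ESS sooner despite a 10% per-chain efficiency penalty, because the post-warmup sampling time shrinks roughly in proportion to the number of chains.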