From Bob’s thread on parallel performance: it sounds like the goal here is to decrease the time to convergence. Could we do something like:
1. For K chains, adapt for N iterations and sample for M iterations, saving the mass matrix
2. Calculate ESS
3. Rerun with the supplied mass matrix for an additional N adaptation iterations and sample for another M iterations
4. Go to 2 until some arbitrary number of adaptation steps have happened
5. Take the ESS and divide it by the number of samples M to get a normalized ESS
6. Divide that by the total number of warmup iterations to get a normalized ESS per adaptation iteration
$$\mathrm{NESS} = \frac{\mathrm{ESS}}{\mathrm{Samples}}, \qquad \mathrm{Hurdle} = \frac{\mathrm{NESS}}{\mathrm{Adaptation}}$$
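A rough sketch of the loop and metric in Python (everything here is a hypothetical stand-in — `run_chain`, `ess`, and `hurdle_metric` are not a real sampler API, and the ESS estimate is a crude lag-1 version, not what Stan actually computes):

```python
import numpy as np

def run_chain(mass_matrix, n_adapt, m_sample, rng):
    """Stand-in for an HMC run: returns draws and an updated mass matrix.
    A real implementation would call the sampler here."""
    dim = mass_matrix.shape[0]
    draws = rng.standard_normal((m_sample, dim))  # fake "posterior" draws
    return draws, mass_matrix  # real code would return the adapted matrix

def ess(draws):
    """Crude ESS estimate from lag-1 autocorrelation of one dimension.
    Real code would use a proper multi-lag estimator (e.g. ArviZ's)."""
    x = draws[:, 0] - draws[:, 0].mean()
    n = len(x)
    rho1 = np.dot(x[:-1], x[1:]) / np.dot(x, x)
    return n * (1 - rho1) / (1 + rho1)

def hurdle_metric(n_adapt, m_sample, rounds, dim=5, seed=1):
    rng = np.random.default_rng(seed)
    mass = np.eye(dim)
    total_adapt = 0
    for _ in range(rounds):                      # steps 1-4: adapt, sample, repeat
        draws, mass = run_chain(mass, n_adapt, m_sample, rng)
        total_adapt += n_adapt
    ness = ess(draws) / m_sample                 # step 5: NESS = ESS / Samples
    return ness / max(total_adapt, 1)            # step 6; max() guards the 0-adaptation case
```

The `max(total_adapt, 1)` is just the "slap a 1 in the denominator" guard so the metric stays defined with zero adaptation steps.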
Steps 5 and 6 I’m just sort of throwing out there, but they’re nice because they at least set some form of a lower and upper bound on the performance metric (the max is 1, and a value near 0 can mean either poor ESS or the number of adaptation steps going to infinity). Though I’m not sure it’s that interpretable, which is a shame. It also can’t handle zero adaptation steps (though we could just slap a 1 in the denominator).
Would wall time be a metric here? It could be nice, since I’m not sure how much the overhead vs. any data we can share between chains without making copies would affect overall speed.