Cross-chain warmup adaptation using MPI

I am wondering how mature this all is and what the plans are for integrating it into the Stan repositories (Stan algorithms, MPI subsystem, MPI & threading)? This may sound impatient, but the results look very nice, so the community will benefit a lot from it.

New warmup strategies will likely need a bit of alignment here and there, which will be quite a process to go through, but it appears the merits of this work are absolutely worth going that mile.

I can certainly help/comment on the threading/MPI bits.

(Bigger changes like this are hard to carry through; I speak from my own recent experience.)

IMO the best way to facilitate this effort now is to play with it on one's own models. Once we have enough confidence in the algorithm we can move on to implementation details.

Is there already a plan as to what the bar is here?

The final goal is to provide users with principled guidance on how to use it. Along the way we'll need to identify, for example, default values for the target ESS & Rhat, as well as the optimal way of aggregating stepsize & metric. Despite some success, the algorithm can also fail on simple models. Take the eight-schools model, for instance: the proposed algorithm can end up with a suboptimal metric/stepsize and significantly more divergences. The following summary is based on a rather large target_ESS (=800) and a comparison against regular Stan runs with the same num_warmup (=600).
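For readers new to the thread, the stopping rule being tuned here (end warmup once cross-chain Rhat falls below a tolerance and the pooled cross-chain ESS exceeds target_ESS) can be sketched roughly as below. This is a simplified illustration, not the actual implementation: the Rhat/ESS estimators are textbook versions rather than the exact ones used in the proposal, and `rhat_tol` is an illustrative threshold, not an established default.

```python
import numpy as np

def cross_chain_rhat(draws):
    """Split-Rhat across chains for one quantity (shape: n_chains x n_iter).
    Textbook Gelman-Rubin version, not the exact estimator in the proposal."""
    n_chains, n_iter = draws.shape
    half = n_iter // 2
    # split each chain in half to also detect within-chain non-stationarity
    split = draws[:, :2 * half].reshape(2 * n_chains, half)
    chain_means = split.mean(axis=1)
    w = split.var(axis=1, ddof=1).mean()   # mean within-chain variance
    b = half * chain_means.var(ddof=1)     # between-chain variance
    var_plus = (half - 1) / half * w + b / half
    return float(np.sqrt(var_plus / w))

def pooled_ess(draws):
    """Crude pooled ESS: per-chain initial-positive-sequence estimate, summed."""
    total = 0.0
    for chain in draws:
        x = chain - chain.mean()
        n = len(x)
        acov = np.correlate(x, x, mode='full')[n - 1:] / n
        rho = acov / acov[0]
        s = 0.0
        for t in range(1, n):
            if rho[t] < 0:  # truncate at the first negative autocorrelation
                break
            s += rho[t]
        total += n / (1.0 + 2.0 * s)
    return total

def warmup_converged(draws, target_ess=800, rhat_tol=1.05):
    # hypothetical stopping check: both criteria must hold to end warmup
    return cross_chain_rhat(draws) < rhat_tol and pooled_ess(draws) > target_ess
```

In the real algorithm these statistics are aggregated across MPI ranks at each adaptation window rather than computed on a single in-memory array.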

cross_chain_summary.pdf (74.8 KB)


The performance on sblrc-blr from posteriordb looks promising. This is not cherry-picking a nice-looking run but a consistent outcome. Among the models I've tested, this one shows the most significant ESS improvement.


Shorter warmup and more efficient sampling for both metrics. That is cool.

I’m attaching a poster by me, @billg, @bbbales2, and @avehtari for ACoP 11 last week. In that poster we show

  • cross-chain warmup’s performance on a bunch of models from posteriordb.
  • how cross-chain warmup can be combined with within-chain parallelization in a multi-level parallel framework.
    WED-093_Yi_Zhang.pdf (399.0 KB)
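As a toy illustration of that two-level layout, the sketch below uses Python thread pools as stand-ins for the MPI ranks (one per chain) and within-chain threads that Stan/Torsten actually use; `partial_logp` is a made-up placeholder for the work that `reduce_sum`/`map_rect` shard in real models.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_logp(shard):
    # stand-in for the log-density contribution of one data shard
    return sum(-0.5 * x * x for x in shard)

def chain_logp(data, threads_per_chain):
    # within-chain level: shard the data and reduce in parallel,
    # analogous to Stan's reduce_sum / map_rect
    shards = [data[i::threads_per_chain] for i in range(threads_per_chain)]
    with ThreadPoolExecutor(max_workers=threads_per_chain) as pool:
        return sum(pool.map(partial_logp, shards))

def run_chains(data, n_chains=4, threads_per_chain=2):
    # cross-chain level: one worker per chain (MPI ranks in the real code);
    # each chain evaluates the same log density on its own worker
    with ThreadPoolExecutor(max_workers=n_chains) as pool:
        return list(pool.map(lambda _: chain_logp(data, threads_per_chain),
                             range(n_chains)))
```

The point of the hierarchy is that the two levels compose: adding chains and adding within-chain threads scale independently, which is what the poster's multi-level framework exploits.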

Corresponding repo can be found at https://github.com/metrumresearchgroup/acop_2020_torsten_parallelization.

Based on the benchmarks in the study, I plan to add cross-chain warmup as an experimental feature in the next Torsten release.


Nice poster!

Is there any update on cross-chain ESS?

I’m not sure I follow the question. Some benchmarks I’ve run show that the ESS is consistent with that from standard runs. Depending on the algorithm’s tuning parameters, it could be higher or lower.

Apologies, on mobile the discussion was cut off early in the thread. I am on desktop now and see all the updates. Thanks!

Can I dig this up to ask @yizhang and @bbbales2 what kind of speedup (wall time until sampling starts) and efficiency gains (total number of leapfrog steps during warmup) one can expect from this? Mainly for ODE models, but also for other models. I’d say only difficult models are interesting, i.e., models where warmup can take quite some time.

Edit: I guess it would be easiest to just try it myself. However, I apparently need a more recent version of stanc/math than is included in the linked repo. I guess @stevebronder and @wds15 are actively working on the above algorithm? What’s the best way to get your working copy, and is it the same algorithm as proposed/evaluated in this thread?
