Cross-chain warmup adaptation using MPI

Running just one chain makes as much sense as it ever does: you have much less information available for diagnostics. So we don’t recommend it, and testing with one chain has very low priority.

Our ESS computation uses Rhat, which works best when running more than one chain. Wall clock time is a more complex quantity than iteration counts and n_leapfrog.
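To make concrete what using Rhat across chains looks like, here is a minimal split-Rhat sketch in pure Python. This is a simplified version of the classic diagnostic, not the rank-normalized variant Stan actually recommends, and all names here are illustrative:

```python
import math
import statistics

def split_rhat(chains):
    """Split-Rhat for one parameter.

    chains: list of per-chain draw lists. Each chain is split in half,
    so within-chain trends also inflate Rhat. Simplified sketch: the
    rank-normalization and folding refinements are omitted.
    """
    half = min(len(c) for c in chains) // 2
    # Split every chain into two halves -> 2 * n_chains sub-chains.
    splits = []
    for c in chains:
        splits.append(c[:half])
        splits.append(c[half:2 * half])
    means = [statistics.fmean(s) for s in splits]
    w = statistics.fmean(statistics.variance(s) for s in splits)  # within
    b = half * statistics.variance(means)                         # between
    var_plus = (half - 1) / half * w + b / half
    return math.sqrt(var_plus / w)
```

With well-mixed chains this returns a value close to 1; chains stuck at different means push it well above 1, which is exactly the signal the adaptive warmup would monitor.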

No, we are not yet comparing anything per wall clock time, because wall clock time is one of the most complex things to measure reliably on a computer. We want to keep the complexity down as much as we can, and thus we compare

  • change in number of target evaluations (n_leapfrog) in warmup
  • change in number of iterations in warmup
  • change in number of target evaluations in actual sampling
  • change in number of iterations in actual sampling
  • change in ESS / number of target evaluations in actual sampling
  • change in ESS / iterations in actual sampling

These quantities do not depend on technical implementation issues, which can have a very large effect on wall clock time. Once we understand the behavior of the above quantities, we can fix the algorithm details and then start to optimize for wall clock time.
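As an illustration only (the function and field names here are made up, not an existing API), the comparison quantities listed above could be collected per run and compared between a baseline and a proposed warmup like this:

```python
def run_summary(warmup_leapfrog, warmup_iters,
                sampling_leapfrog, sampling_iters, ess):
    """Bundle the comparison quantities for one run on one posterior."""
    return {
        "warmup_leapfrog": warmup_leapfrog,
        "warmup_iterations": warmup_iters,
        "sampling_leapfrog": sampling_leapfrog,
        "sampling_iterations": sampling_iters,
        "ess_per_leapfrog": ess / sampling_leapfrog,
        "ess_per_iteration": ess / sampling_iters,
    }

def relative_change(proposed, baseline):
    """Ratio proposed/baseline for each quantity (1.0 = no change)."""
    return {k: proposed[k] / baseline[k] for k in baseline}

# Hypothetical numbers: the proposed warmup spends fewer leapfrog
# steps and iterations in warmup, with comparable sampling efficiency.
baseline = run_summary(50_000, 1000, 30_000, 1000, 800.0)
proposed = run_summary(20_000, 400, 30_000, 1000, 880.0)
print(relative_change(proposed, baseline))
```

The point of ratios rather than raw times is exactly the one made above: they are invariant to how fast any particular machine or MPI setup happens to be.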

Can we start another thread for MPI vs. other parallelization approaches, and discuss adaptive warmup here? For testing whether the proposed warmup is useful, it doesn’t matter how the different chains are run.

I don’t care about wall clock time at this point; it’s a complete distraction.

I feel I’m repeating myself, but

  • change in number of target evaluations (n_leapfrog) in warmup
  • change in number of iterations in warmup
  • change in number of target evaluations in actual sampling
  • change in number of iterations in actual sampling
  • change in ESS / number of target evaluations in actual sampling
  • change in ESS / iterations in actual sampling

It’s great if you are happy with the current warmup, and it will be an option in the future, too. If you are happy with the current warmup, there is not much interesting here for you.

Sorry I have not been clear enough: I think we don’t need multi-core capabilities, and I don’t care at this point what the technical implementation is, as long as we can test it. I assume the multi-whatever implementation (someone else can decide later what that implementation is) makes it easier for the chains to communicate, and a single-core implementation could be even more complex, but I don’t care about that in the context of these tests. We now have something which can be used for testing the algorithm without worrying about wall clock time or library dependencies, and we can worry about those later.

I was hoping that in this thread we would discuss more about 1) the algorithm, for example how to minimize the computation and communication in the repeated Rhat and ESS computations, and 2) results from people running experiments with posteriors they know are difficult.
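On point 1, one possible scheme (a sketch of my own, not a settled design from this thread) is for each chain to maintain running moments via Welford’s algorithm, so that a cross-chain Rhat check only communicates three numbers per chain instead of full draw histories:

```python
import math

class RunningMoments:
    """Per-chain running mean/variance (Welford's online algorithm).

    A chain only needs to share (n, mean, m2) -- three numbers -- at
    each convergence check, keeping per-check computation and MPI
    communication O(n_chains) regardless of how many draws exist.
    """
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def var(self):
        return self.m2 / (self.n - 1)  # sample variance

def rhat_from_moments(stats):
    """Plain (non-split) Rhat from per-chain summaries only; the split
    and rank-normalized variants need more per-chain state than this."""
    m = len(stats)
    n = stats[0].n
    means = [s.mean for s in stats]
    grand = sum(means) / m
    b = n * sum((mu - grand) ** 2 for mu in means) / (m - 1)  # between
    w = sum(s.var for s in stats) / m                          # within
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)
```

Because the split and rank-normalized diagnostics need more than these three summaries per chain, this is only a starting point for discussing what state the chains would actually have to exchange.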
