Cross-chain warmup adaptation using MPI

Running just one chain makes as much sense as it ever does: you have much less information available for diagnostics. So we don’t recommend it, and testing with one chain has very low priority.

Our ESS computation uses Rhat, which works best when running more than one chain. Wall clock time is a more complex quantity than iteration counts and n_leapfrog.
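To make concrete what using Rhat across chains looks like, here is a minimal split-Rhat sketch in pure Python. This is a simplified version of the classic diagnostic, not the rank-normalized variant Stan actually recommends, and all names here are illustrative:

```python
import math
import statistics

def split_rhat(chains):
    """Split-Rhat for one parameter.

    chains: list of per-chain draw lists. Each chain is split in half,
    so within-chain trends also inflate Rhat. Simplified sketch: the
    rank-normalization and folding refinements are omitted.
    """
    half = min(len(c) for c in chains) // 2
    # Split every chain into two halves -> 2 * n_chains sub-chains.
    splits = []
    for c in chains:
        splits.append(c[:half])
        splits.append(c[half:2 * half])
    means = [statistics.fmean(s) for s in splits]
    w = statistics.fmean(statistics.variance(s) for s in splits)  # within
    b = half * statistics.variance(means)                         # between
    var_plus = (half - 1) / half * w + b / half
    return math.sqrt(var_plus / w)
```

With well-mixed chains this returns a value close to 1; chains stuck at different means push it well above 1, which is exactly the signal the adaptive warmup would monitor.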

No, we are not yet comparing anything per wall clock time, because wall clock time is one of the most complex things to measure reliably on a computer. We want to keep the complexity down as much as we can, and thus we compare

  • change in number of target evaluations (n_leapfrog) in warmup
  • change in number of iterations in warmup
  • change in number of target evaluations in actual sampling
  • change in number of iterations in actual sampling
  • change in ESS / number of target evaluations in actual sampling
  • change in ESS / iterations in actual sampling

These quantities do not depend on technical implementation issues, which can have a very large effect on wall clock time. Once we understand the behavior of the above quantities, we can fix the algorithm details and then start to optimize for wall clock time.
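As an illustration only (the function and field names here are made up, not an existing API), the comparison quantities listed above could be collected per run and compared between a baseline and a proposed warmup like this:

```python
def run_summary(warmup_leapfrog, warmup_iters,
                sampling_leapfrog, sampling_iters, ess):
    """Bundle the comparison quantities for one run on one posterior."""
    return {
        "warmup_leapfrog": warmup_leapfrog,
        "warmup_iterations": warmup_iters,
        "sampling_leapfrog": sampling_leapfrog,
        "sampling_iterations": sampling_iters,
        "ess_per_leapfrog": ess / sampling_leapfrog,
        "ess_per_iteration": ess / sampling_iters,
    }

def relative_change(proposed, baseline):
    """Ratio proposed/baseline for each quantity (1.0 = no change)."""
    return {k: proposed[k] / baseline[k] for k in baseline}

# Hypothetical numbers: the proposed warmup spends fewer leapfrog
# steps and iterations in warmup, with comparable sampling efficiency.
baseline = run_summary(50_000, 1000, 30_000, 1000, 800.0)
proposed = run_summary(20_000, 400, 30_000, 1000, 880.0)
print(relative_change(proposed, baseline))
```

The point of ratios rather than raw times is exactly the one made above: they are invariant to how fast any particular machine or MPI setup happens to be.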

Can we start another thread for MPI vs. other parallelization approaches, and discuss adaptive warmup here? For testing whether the proposed warmup is useful, it doesn’t matter how the different chains are run.

I don’t care about wall clock time at this point; it’s a complete distraction.

I feel I’m repeating myself, but

  • change in number of target evaluations (n_leapfrog) in warmup
  • change in number of iterations in warmup
  • change in number of target evaluations in actual sampling
  • change in number of iterations in actual sampling
  • change in ESS / number of target evaluations in actual sampling
  • change in ESS / iterations in actual sampling

It’s great if you are happy with the current warmup, and it will be an option in the future, too. If you are happy with the current warmup, there is not much interesting here for you.

Sorry I have not been clear enough: I think we don’t need multi-core capabilities, and I don’t care at this point what the technical implementation is, as long as we can test it. I assume the multi-whatever implementation (someone else can decide later what that implementation is) makes it easier for the chains to communicate, and a single-core implementation could be even more complex, but I don’t care about that in the context of these tests. We now have something which can be used for testing the algorithm without worrying about wall clock time or library dependencies, and we can worry about those later.

I was hoping that in this thread we would discuss more about 1) the algorithm, for example how to minimize the computation and communication in the repeated Rhat and ESS computations, and 2) results from people running experiments with posteriors they know are difficult.
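On point 1, one possible scheme (a sketch of my own, not a settled design from this thread) is for each chain to maintain running moments via Welford’s algorithm, so that a cross-chain Rhat check only communicates three numbers per chain instead of full draw histories:

```python
import math

class RunningMoments:
    """Per-chain running mean/variance (Welford's online algorithm).

    A chain only needs to share (n, mean, m2) -- three numbers -- at
    each convergence check, keeping per-check computation and MPI
    communication O(n_chains) regardless of how many draws exist.
    """
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def var(self):
        return self.m2 / (self.n - 1)  # sample variance

def rhat_from_moments(stats):
    """Plain (non-split) Rhat from per-chain summaries only; the split
    and rank-normalized variants need more per-chain state than this."""
    m = len(stats)
    n = stats[0].n
    means = [s.mean for s in stats]
    grand = sum(means) / m
    b = n * sum((mu - grand) ** 2 for mu in means) / (m - 1)  # between
    w = sum(s.var for s in stats) / m                          # within
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)
```

Because the split and rank-normalized diagnostics need more than these three summaries per chain, this is only a starting point for discussing what state the chains would actually have to exchange.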
