Pooled warmup

@stevebronder asked me to share my current prototype on parallel warmup … maybe this info here is useful for others as well (@bbbales2, @Bob_Carpenter, @yizhang, @betanalpha)

I just pushed my pooled-warmup prototype to the cmdstan and stan repos.

Right now I am abusing the STAN_NUM_THREADS variable to specify the number of chains to run. Nothing runs in parallel yet, though. I figured that’s not necessary at this stage, since I would first like to learn how we should pool the adaptation info.

At the moment I am just pooling the covariances which each chain learns in each window. The stan services framework isn’t really set up to share information between chains, so I had to use a bit of a trick to get the window information.
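To make "pooling the covariances" concrete, here is a minimal sketch of one way it could be done. It is not taken from the prototype: it assumes each chain reports its window covariance together with the number of draws behind it, and it combines them with the textbook pooled (within-chain) covariance, weighting each chain by its degrees of freedom. The name `pool_covariances` and its arguments are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def pool_covariances(chain_covs: Sequence[Matrix],
                     chain_sizes: Sequence[int]) -> Matrix:
    """Pool per-chain window covariances into a single estimate.

    Assumption (not from the prototype): each chain c supplies its sample
    covariance S_c and the number of draws n_c in the window. The pooled
    estimate is sum_c (n_c - 1) * S_c / sum_c (n_c - 1).
    """
    if not chain_covs or len(chain_covs) != len(chain_sizes):
        raise ValueError("need one covariance and one size per chain")

    dof = sum(n - 1 for n in chain_sizes)
    if dof <= 0:
        raise ValueError("need more than one draw per chain in total")

    dim = len(chain_covs[0])
    pooled = [[0.0] * dim for _ in range(dim)]
    for cov, n in zip(chain_covs, chain_sizes):
        weight = (n - 1) / dof  # chains with more draws count for more
        for i in range(dim):
            for j in range(dim):
                pooled[i][j] += weight * cov[i][j]
    return pooled


# Two chains with the same window length: the result is the plain average.
pooled = pool_covariances(
    [[[1.0, 0.0], [0.0, 1.0]], [[3.0, 0.0], [0.0, 3.0]]],
    [11, 11],
)
```

Note that this within-chain estimate ignores differences between the chain means. Whether the between-chain spread should be added is exactly the kind of question a benchmark would have to answer.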

I got side-tracked by an rstan emergency, which is why I was a bit silent on this front.

To me the next steps would be: figure out a benchmark, possibly along the lines I outlined on discourse. Then we tweak the warmup pooling to the point where it is beneficial, meaning the total numerical effort must be lower when we pool the info. Otherwise we won’t see any speedups, since going parallel inevitably adds some friction.
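The benefit criterion can be written down as a small check. This is only a sketch under assumptions: numerical effort is measured as gradient evaluations per chain, and the friction of going parallel is modelled as a single multiplicative overhead. The function name, the overhead model, and the numbers in the example are all made up for illustration.

```python
from typing import Sequence


def pooling_pays_off(grad_evals_independent: Sequence[int],
                     grad_evals_pooled: Sequence[int],
                     overhead_fraction: float) -> bool:
    """Return True if pooled warmup is cheaper overall.

    Assumptions (hypothetical): effort is the total number of gradient
    evaluations over all chains, and sharing information between chains
    costs an extra `overhead_fraction` of the pooled effort.
    """
    baseline = sum(grad_evals_independent)
    pooled = sum(grad_evals_pooled) * (1.0 + overhead_fraction)
    return pooled < baseline


# Illustrative numbers only: a 20% saving survives 10% overhead ...
big_saving = pooling_pays_off([1000, 1000], [800, 800], 0.1)
# ... but a 5% saving does not.
small_saving = pooling_pays_off([1000, 1000], [950, 950], 0.1)
```

A benchmark would replace the made-up counts with measured ones, for example gradient evaluations per effective sample over a set of test models.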

Then we can go crazy on specs / design docs and finally we implement it.

Let me know what you think.



What are the branches?

As I was saying before, I think it will be more productive to work out the changes to the services first before worrying about specific adaptation strategies (or parallelization technologies, for that matter).

I have never worked on the stan services before. It took me a while to figure out the logic. Without a prototype I would not be able to come up with a sensible design.

You mean for things like specifying number of chains? Right now, each service call is an independent chain. I think the main change we’ll need from the services is a way to deal with multiple chains of output. The input and config should be straightforward assuming it’s shared among the chains.

Correct – see also the progression I suggested in the other thread,

In addition to setting up new adaptation strategies, 1 and 2 alone would allow the interfaces to simplify quite a bit and provide a more coherent user experience.

Sure…but 1 and 2 are already implemented by all of our interfaces. It’s obviously a win to do it consistently, but it is no win in terms of new functionality.

Don’t we need to do (1) and (2) at the C++ level with TBB in order to also embed (3) and (4)?

We do need them, yes…these are just a lot of work without too much net benefit…but we need them, of course.