Pooled warmup

@Stevo15025 asked me to share my current prototype on parallel warmup … maybe this info here is useful for others as well (@bbbales2, @Bob_Carpenter, @yizhang, @betanalpha)

I just pushed my pooled warmup prototype to the cmdstan and stan repos.

Right now I am abusing the STAN_NUM_THREADS variable to specify the number of chains to run. Nothing runs in parallel yet, though. I figured that is not necessary at this stage, as I would first like to learn how we should pool the adaptation info.

At the moment I am just pooling the covariances that each chain learns in each window. The stan services framework isn’t really set up to share information between chains, so I had to use a bit of a trick to get the window information.
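To make the pooling idea concrete, here is a minimal sketch in plain Python. It is not the prototype's code: the function name `pool_covariances`, the optional weighting by draw counts, and the choice of a simple weighted average are all assumptions for illustration.

```python
def pool_covariances(chain_covs, chain_counts=None):
    """Combine per-chain covariance estimates from one adaptation window.

    Hypothetical helper: takes one square covariance matrix (list of lists)
    per chain and returns their weighted average. The weights default to 1,
    i.e. every chain counts equally.
    """
    if not chain_covs:
        raise ValueError("need at least one chain covariance")
    if chain_counts is None:
        chain_counts = [1] * len(chain_covs)
    if len(chain_counts) != len(chain_covs):
        raise ValueError("one weight per chain is required")

    total = sum(chain_counts)
    dim = len(chain_covs[0])
    pooled = [[0.0] * dim for _ in range(dim)]

    # Accumulate the weighted contribution of each chain, entry by entry.
    for cov, weight in zip(chain_covs, chain_counts):
        for i in range(dim):
            for j in range(dim):
                pooled[i][j] += weight * cov[i][j] / total
    return pooled


# Two chains, two parameters, equal weights.
chain_a = [[1.0, 0.0], [0.0, 1.0]]
chain_b = [[3.0, 1.0], [1.0, 5.0]]
pooled = pool_covariances([chain_a, chain_b])
```

Note that a plain average like this ignores differences between the chain means, so it can understate the spread when the chains have not yet found the same region. Whether to add a between-chain term is exactly the kind of question the pooling scheme still has to settle.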

I got side-tracked by an rstan emergency, which is why I have been a bit quiet on this front.

To me the next steps would be: figure out a benchmark, possibly along the lines I outlined on discourse. Then we tweak the warmup pooling to the point where it is beneficial, meaning the total numerical effort must be lower when we pool the info. Otherwise we won’t see any speedups, since going parallel inevitably adds some friction.
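The benchmark criterion could be sketched roughly as follows. This is only an illustration: using leapfrog steps as the proxy for numerical effort, the `overhead` term standing in for the parallel friction, and both function names are assumptions, not something the prototype defines.

```python
def total_effort(leapfrog_counts_per_chain):
    """Sum the leapfrog steps over all iterations of all chains.

    Assumption: leapfrog steps (gradient evaluations) are a fair proxy
    for the numerical effort of a run.
    """
    return sum(sum(counts) for counts in leapfrog_counts_per_chain)


def pooling_pays_off(effort_independent, effort_pooled, overhead=0.0):
    """Return True if pooled warmup is cheaper even after adding overhead.

    `overhead` is a hypothetical cost, in the same units as the effort,
    for the communication needed to share adaptation info between chains.
    """
    return effort_pooled + overhead < effort_independent


# Made-up leapfrog counts for two chains with two iterations each.
independent = total_effort([[7, 15], [3, 7]])
pooled = total_effort([[7, 7], [3, 7]])
worth_it = pooling_pays_off(independent, pooled, overhead=2)
```

The numbers above are invented purely to exercise the functions; a real comparison would also need to check that the pooled run reaches the same sampling quality.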

Then we can go crazy on specs / design docs and finally we implement it.

Let me know what you think.