I just pushed my pooling warmup prototype to the cmdstan and stan repos.
Right now I am abusing the STAN_NUM_THREADS variable to specify the number of chains being run. Nothing runs in parallel yet, though. I figured that isn’t necessary yet, as I would first like to learn how we should pool the adaptation info.
At the moment I am just pooling the covariances which each chain learns in each window. The stan services framework isn’t really set up to share information between chains, so I had to use a bit of a trick to get the window information.
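To make the pooling step concrete, here is a minimal sketch in Python. It is not the prototype's code, and the pooling rule is an assumption for illustration: each chain's window covariance is weighted by its degrees of freedom (draws minus one), and differences between the chain means are ignored. The name `pool_covariances` and its arguments are hypothetical.

```python
def pool_covariances(covs, counts):
    """Pool per-chain covariance estimates from one adaptation window.

    covs:   list of d x d covariance matrices (lists of lists), one per chain
    counts: number of draws each chain used to estimate its covariance

    Assumption: a degrees-of-freedom weighted average of the within-chain
    covariances; between-chain differences in the means are ignored.
    """
    if not covs or len(covs) != len(counts):
        raise ValueError("need one draw count per covariance matrix")

    dim = len(covs[0])
    weights = [n - 1 for n in counts]  # degrees of freedom per chain
    total = sum(weights)
    if total <= 0:
        raise ValueError("need at least two draws in some chain")

    pooled = [[0.0] * dim for _ in range(dim)]
    for cov, w in zip(covs, weights):
        for i in range(dim):
            for j in range(dim):
                pooled[i][j] += w * cov[i][j] / total
    return pooled


# Two chains, each with 11 draws in the window, so equal weights.
chain_covs = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[3.0, 1.0], [1.0, 3.0]],
]
pooled = pool_covariances(chain_covs, [11, 11])
```

Whether a plain weighted average like this is the right rule, or whether the chain means should enter as well, is exactly the kind of thing the benchmark should tell us.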
I got side-tracked by an rstan emergency, which is why I was a bit silent on this front.
To me the next steps would be: first, figure out a benchmark, possibly along the lines I outlined on discourse. Then we tweak the warmup pooling to the point where it is beneficial, meaning the total numerical effort must be lower when we pool the info. Otherwise we won’t see any speedups, since going parallel inevitably adds some friction.
Then we can go crazy on specs / design docs and finally we implement it.
Let me know what you think.