I just pushed my warmup-pooling prototype to the cmdstan and stan repos.
Right now I am abusing the STAN_NUM_THREADS variable to specify the number of chains that are run. Nothing runs in parallel yet, though. I figured that's not necessary yet, as I would first like to learn how we should pool the adaptation info.
At the moment I am just pooling the covariances that each chain learns in each window. The Stan services framework isn’t really set up to share information between chains, so I had to use a few tricks to get at the window information.
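To make the pooling step concrete, here is a minimal Python sketch; it is not the prototype's code. It assumes each chain keeps Welford-style running statistics (draw count, mean, and the sum of outer products of deviations) for its draws in the current window, and pools them by merging those statistics with the parallel-update formula of Chan et al. The result is the covariance of all chains' window draws, as if they had come from a single chain. The names `WindowStats`, `merge`, and `pooled_covariance` are hypothetical.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import List, Sequence


@dataclass
class WindowStats:
    """Hypothetical Welford accumulator for one chain's draws in one adaptation window."""
    dim: int
    n: int = 0
    mean: List[float] = field(default_factory=list)
    m2: List[List[float]] = field(default_factory=list)  # sum of outer products of deviations

    def __post_init__(self) -> None:
        if not self.mean:
            self.mean = [0.0] * self.dim
        if not self.m2:
            self.m2 = [[0.0] * self.dim for _ in range(self.dim)]

    def add(self, x: Sequence[float]) -> None:
        # Welford update: M2 += (x - old_mean)(x - new_mean)^T
        self.n += 1
        d_old = [xi - mi for xi, mi in zip(x, self.mean)]
        self.mean = [mi + d / self.n for mi, d in zip(self.mean, d_old)]
        d_new = [xi - mi for xi, mi in zip(x, self.mean)]
        for i in range(self.dim):
            for j in range(self.dim):
                self.m2[i][j] += d_old[i] * d_new[j]

    def covariance(self) -> List[List[float]]:
        # Unbiased sample covariance of the draws seen so far.
        if self.n < 2:
            raise ValueError("need at least two draws")
        return [[v / (self.n - 1) for v in row] for row in self.m2]


def merge(a: WindowStats, b: WindowStats) -> WindowStats:
    """Combine two accumulators with the Chan et al. parallel update."""
    n = a.n + b.n
    out = WindowStats(a.dim, n)
    if n == 0:
        return out
    delta = [mb - ma for ma, mb in zip(a.mean, b.mean)]
    out.mean = [ma + d * b.n / n for ma, d in zip(a.mean, delta)]
    w = a.n * b.n / n
    # Between-chain mean differences enter through the delta * delta^T term.
    out.m2 = [[a.m2[i][j] + b.m2[i][j] + delta[i] * delta[j] * w
               for j in range(a.dim)] for i in range(a.dim)]
    return out


def pooled_covariance(chains: Sequence[WindowStats]) -> List[List[float]]:
    """Covariance of all chains' window draws, as if they came from one chain."""
    return reduce(merge, chains).covariance()
```

Merging the moments this way includes the spread between the chains' means, so chains that still sit in different regions will inflate the pooled estimate; simply averaging the per-chain covariances would ignore those between-chain differences. Which behaviour is preferable during warmup is something a benchmark could help decide.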
I got side-tracked by an rstan emergency, which is why I was a bit silent on this front.
To me the next steps would be: figure out a benchmark, possibly along the lines I outlined on Discourse. Then we tweak the warmup pooling to the point where it is beneficial, meaning the total numerical effort must be lower when we pool the info. Otherwise we won’t see any speedups, since going parallel inevitably adds some friction.
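One way to write the "beneficial" criterion down, assuming gradient evaluations are used as the measure of numerical effort (an assumption; the benchmark may settle on a different cost measure). Here $C$ is the number of chains, $G^{(c)}$ the gradient evaluations chain $c$ spends in warmup, and $\Delta_{\mathrm{sync}}$ a hypothetical term for the friction of sharing adaptation info, expressed in gradient-evaluation equivalents:

```latex
\sum_{c=1}^{C} G^{(c)}_{\mathrm{pooled}} + \Delta_{\mathrm{sync}}
  \;<\; \sum_{c=1}^{C} G^{(c)}_{\mathrm{indep}}
```

Both sides would have to be measured for warmup runs that end with comparable adaptation quality, otherwise a cheaper but worse warmup would look like a win.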
Then we can go crazy on specs / design docs and finally we implement it.
As I was saying before, I think it will be more productive to work out the changes to the services before worrying about specific adaptation strategies (or parallelization technologies, for that matter).
You mean for things like specifying the number of chains? Right now, each service call is an independent chain. I think the main change we’ll need from the services is a way to deal with multiple chains of output. The input and config should be straightforward, assuming they’re shared among the chains.
Correct – see also the progression I suggested in the other thread,
In addition to enabling new adaptation strategies, 1 and 2 alone would allow the interfaces to be simplified quite a bit and to provide a more coherent user experience.
Sure… but 1 and 2 are already implemented by all of our interfaces. It’s obviously a win to do it consistently, but it’s no win in terms of new functionality.