ChEES-HMC: why has it not been mentioned?

Much higher performance is obviously a great thing, and this algorithm (currently available in the TFP package) can reportedly run orders of magnitude more efficiently on a GPU.

Given the choice, I prefer the helpfulness of Stan, but if I can run 100 chains on a GPU with ChEES, the prospect of large model runtimes dropping from hours to minutes really turns my head.

Has anyone brought this up here before?

Hi,
The Stan development team is well aware of ChEES and the other methods developed by Matt Hoffman. It's not easy to add ChEES or other GPU-targeted inference algorithms to Stan so that they run on a GPU, but luckily there are other backends.


Does this also apply to CPUs? Core counts on CPUs keep growing; might that be an option?

To get the best benefit from GPUs, the parallel computations need to execute the same operations. In the case of HMC, this means every chain needs to perform the same number of operations in the dynamics simulation, which is what ChEES-HMC is designed for. CPUs don't have that limitation, so you can simply run many NUTS-HMC chains in parallel. There are other differences between the algorithms, and I expect that if someone made a design doc and PR for ChEES-HMC, it could get into Stan. Since the scaling with dimensionality is the same and the speed differences on CPUs would not be big, this is not on the priority list of the current active developers, as far as I know. I'm not one of those who could add a new algorithm to Stan, but I think delayed rejection HMC (its adaptive step size improves performance in the case of funnels) or microcanonical HMC (better scaling with respect to dimensionality) would be more interesting to get into Stan than ChEES-HMC. All of this is just my personal opinion.
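To illustrate the lockstep point, here is a minimal NumPy sketch. This is a toy, not TFP's actual ChEES implementation (no trajectory-length adaptation, no jitter, and a standard-normal target chosen only so the code is self-contained): because every chain takes exactly the same number of leapfrog steps, each update is one batched array operation over all chains, which is the structural property that keeps a GPU fully occupied and that NUTS, with its per-chain dynamic trajectory lengths, lacks.

```python
import numpy as np

# Toy target: standard normal in `dim` dimensions.
def logp(x):
    return -0.5 * np.sum(x ** 2, axis=-1)

def grad_logp(x):
    return -x

def hmc_lockstep(x, step_size, num_leapfrog, num_samples, rng):
    """Fixed-trajectory-length HMC over a batch of chains in lockstep.

    x has shape (num_chains, dim). Every chain takes exactly
    `num_leapfrog` leapfrog steps per iteration, so each line below is a
    single vectorized operation across all chains at once.
    """
    samples = np.empty((num_samples,) + x.shape)
    for s in range(num_samples):
        p0 = rng.standard_normal(x.shape)             # fresh momenta, all chains
        xq, p = x.copy(), p0.copy()
        p = p + 0.5 * step_size * grad_logp(xq)       # initial half step
        for _ in range(num_leapfrog - 1):
            xq = xq + step_size * p
            p = p + step_size * grad_logp(xq)
        xq = xq + step_size * p
        p = p + 0.5 * step_size * grad_logp(xq)       # final half step
        # Vectorized Metropolis accept/reject across all chains.
        h_old = -logp(x) + 0.5 * np.sum(p0 ** 2, axis=-1)
        h_new = -logp(xq) + 0.5 * np.sum(p ** 2, axis=-1)
        accept = np.log(rng.uniform(size=h_old.shape)) < h_old - h_new
        x = np.where(accept[:, None], xq, x)
        samples[s] = x
    return samples

rng = np.random.default_rng(0)
draws = hmc_lockstep(rng.standard_normal((64, 4)), 0.25, 8, 500, rng)
```

On a CPU the batched updates bring little benefit, which matches the point above: there you can just run independent NUTS chains in parallel instead.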


Thanks Aki,

I will look at the other algorithms you suggested, particularly delayed rejection HMC for all my funnel problems. My interest is in the impressive speedups available from an HMC implementation that takes full advantage of GPU capabilities. I noted that ChEES seemed much better than NUTS in terms of wall time per effective sample, given that it can be executed wholly on the GPU. This seemed such a significant advantage over Stan and NUTS that I was determined to find out more.

My contention was that for brms-based models, which typically use a hierarchical structure and non-centered parameterizations, a GPU-based backend would be superior via stan2tfp and ChEES-HMC.

I wondered where to find out more and thought this forum might have something, but there was no mention of ChEES, so I thought I would ask.

I hope, from your comments, that others have tried these ideas and that I might gain further insight.
