ChEES-HMC: why has it not been mentioned?

Much higher performance is obviously a great thing, and this algorithm (currently available in the TFP package) can reportedly run orders of magnitude more efficiently on a GPU.

Given the choice, I prefer the helpfulness of Stan, but if I can run 100 chains on a GPU with ChEES, the prospect of large model runtimes dropping from hours to minutes really turns my head.

Has anyone brought this up here before?

Hi,
The Stan development team is well aware of ChEES and the other methods developed by Matt Hoffman. It's not easy to add ChEES or other GPU-targeted inference algorithms to Stan so that they run on a GPU, but luckily there are other backends.


Does this also apply to CPUs? Core counts on CPUs keep growing; might that be an option?

To get the best benefit from GPUs, the parallel computations need to execute the same operations. In the case of HMC, this means every chain needs to perform the same number of operations in the dynamics simulation, which is what ChEES-HMC is designed for. CPUs don't have that limitation, so you can simply run many NUTS-HMC chains in parallel. There are other differences between the algorithms, and I expect that if someone made a design doc and PR for ChEES-HMC, it could get into Stan. Since the scaling with dimensionality is the same and the speed differences on CPUs would not be big, this is not on the priority list of the current active developers, as far as I know. I'm not one of those who could add a new algorithm to Stan, but I think delayed rejection HMC (its adaptive step size improves performance in the case of funnels) or microcanonical HMC (better scaling with respect to dimensionality) would be more interesting to get into Stan than ChEES-HMC. All of this is just my personal opinion.
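To illustrate the lockstep point, here is a minimal NumPy sketch. This is a toy, not TFP's actual ChEES implementation (no trajectory-length adaptation, no jitter, and a standard-normal target chosen only so the code is self-contained): because every chain takes exactly the same number of leapfrog steps, each update is one batched array operation over all chains, which is the structural property that keeps a GPU fully occupied and that NUTS, with its per-chain dynamic trajectory lengths, lacks.

```python
import numpy as np

# Toy target: standard normal in `dim` dimensions.
def logp(x):
    return -0.5 * np.sum(x ** 2, axis=-1)

def grad_logp(x):
    return -x

def hmc_lockstep(x, step_size, num_leapfrog, num_samples, rng):
    """Fixed-trajectory-length HMC over a batch of chains in lockstep.

    x has shape (num_chains, dim). Every chain takes exactly
    `num_leapfrog` leapfrog steps per iteration, so each line below is a
    single vectorized operation across all chains at once.
    """
    samples = np.empty((num_samples,) + x.shape)
    for s in range(num_samples):
        p0 = rng.standard_normal(x.shape)             # fresh momenta, all chains
        xq, p = x.copy(), p0.copy()
        p = p + 0.5 * step_size * grad_logp(xq)       # initial half step
        for _ in range(num_leapfrog - 1):
            xq = xq + step_size * p
            p = p + step_size * grad_logp(xq)
        xq = xq + step_size * p
        p = p + 0.5 * step_size * grad_logp(xq)       # final half step
        # Vectorized Metropolis accept/reject across all chains.
        h_old = -logp(x) + 0.5 * np.sum(p0 ** 2, axis=-1)
        h_new = -logp(xq) + 0.5 * np.sum(p ** 2, axis=-1)
        accept = np.log(rng.uniform(size=h_old.shape)) < h_old - h_new
        x = np.where(accept[:, None], xq, x)
        samples[s] = x
    return samples

rng = np.random.default_rng(0)
draws = hmc_lockstep(rng.standard_normal((64, 4)), 0.25, 8, 500, rng)
```

On a CPU the batched updates bring little benefit, which matches the point above: there you can just run independent NUTS chains in parallel instead.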


Thanks Aki,

I will look at the other algorithms you suggested, particularly delayed rejection HMC for all my funnel problems. My interest is in the impressive speedups available from an HMC implementation that takes full advantage of GPU capabilities. I noted that ChEES seemed much better than NUTS in terms of wall time per effective sample, given that it can be executed wholly on the GPU. This seemed such a significant advantage over Stan and NUTS that I was determined to find out more.

My contention was that for brms-based models, which typically use a hierarchical structure and non-centered parameterizations, a GPU-based backend would be superior via stan2tfp and ChEES-HMC.

I wondered where to find out more and thought this forum might have something, but there was no mention of ChEES, so I thought I would ask.

I hope, from your comments, that others have tried these ideas and that I might gain further insight.
