How can hardware help Stan?

We’ve been talking about the possibilities of specialized hardware for Stan. There’s parallel processing and there are GPUs. Are there other examples? There’s specialized hardware for deep learning that, for example, makes use of known patterns of data reuse. The idea is that there could be chips for Bayes/HMC/NUTS/etc. that would enable efficient computation. Maybe focused on hierarchical models, maybe focused particularly on autodiff and HMC, or maybe focused on probabilistic programming and the handling of uncertainty?

Any ideas would be appreciated.


Elaborating on Andrew’s post a bit: we are looking at getting research funds to create hardware that addresses speed/scaling issues in Stan. In no particular order, it would help to have information about the following, with an eye to what is amenable to a hardware solution, perhaps with algorithm changes as well:

  • Current processing bottlenecks for all models/data.
  • Processing bottlenecks for idiosyncratic models/data.
  • Scale-sensitive issues that grow super-linearly.
  • Right now we are focused on HMC/NUTS as our inference algorithm but this need not be the case.

I realize this is vague but we are looking to identify where we might make progress with custom hardware.

thanks

Breck


You have to be clear on what you want to speed up. MCMC is embarrassingly parallelizable when the goal is high ESS per unit of wall time: independent chains can run on independent processors. It’s not so parallelizable when the goal is to reach ESS = 100 as quickly as possible, because the draws within each chain are sequential.
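To make the tradeoff concrete, here is a toy cost model (my own sketch with made-up numbers, not a description of Stan’s scheduler) in which each chain pays a fixed serial warmup before it starts producing effective samples:

```python
# Toy cost model (assumed numbers, purely illustrative): each chain pays
# warmup_s seconds of serial warmup, then yields ess_per_s ESS per second.
def time_to_ess(target_ess, n_chains, warmup_s=60.0, ess_per_s=1.0):
    """Wall time for n_chains parallel chains to accumulate target_ess."""
    return warmup_s + target_ess / (n_chains * ess_per_s)

# Bulk ESS throughput scales almost linearly with the number of chains...
print(time_to_ess(100, 1))    # 160.0 seconds with one chain
# ...but time to a small fixed ESS is floored by the serial warmup.
print(time_to_ess(100, 100))  # 61.0 seconds: warmup now dominates
```

No amount of chain-level parallelism removes that per-chain warmup floor, which is the sense in which "getting to ESS = 100 quickly" is the hard target.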

Why custom hardware rather than existing hardware like TPUs or FPGAs?

What makes GPUs and TPUs possible is that the tensor operations to which they’re applied are massively SIMD.
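A minimal illustration of the contrast, in plain Python just to show the shape of the computation rather than any real implementation:

```python
# SIMD-friendly: every output element is the same operation applied to
# independent inputs, so a wide vector unit can compute them side by side.
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

# Not SIMD-friendly: each state depends on the previous one, as within a
# single MCMC chain, so the steps cannot be executed in parallel.
def iterate(x0, step, n):
    x = x0
    for _ in range(n):
        x = step(x)
    return x

print(matvec([[1, 2], [3, 4]], [1, 1]))  # [3, 7]
print(iterate(0, lambda v: v + 1, 5))    # 5
```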

Ask a hardware person what’s possible without SIMD or with a different kind of SIMD than available on GPU or TPU, and have someone on hand who can understand the answer. Don’t you and Tamara have such a person on your grantwriting team?

On the receiving end, you’ll need someone who understands current GPU and CPU architecture, Stan’s autodiff, and Stan’s sampling and optimization algorithms. I’d ask the folks who worked on GPUs like @stevebronder, @seantalts, and @rok_cesnovar.

breckbaldwin:

Right now we are focused on HMC/NUTS as our inference algorithm but this need not be the case.

It sort of does if we want to do full Bayes in high dimensions. There isn’t anything else competitive. It’s not like we can make Gibbs or Metropolis or ensemble methods faster and succeed.
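For context on where the cycles go, here is the standard textbook leapfrog update at the core of HMC (a generic sketch, not Stan’s implementation). Each step needs a gradient of the negative log density, and evaluating that gradient via autodiff is what dominates sampling cost, so it is the natural hardware target:

```python
# One leapfrog step of HMC (textbook form, not Stan's actual code).
# grad_U is the gradient of the negative log density; computing it
# (via autodiff) is the dominant cost of each step.
def leapfrog(q, p, grad_U, eps):
    p = [pi - 0.5 * eps * g for pi, g in zip(p, grad_U(q))]  # half kick
    q = [qi + eps * pi for qi, pi in zip(q, p)]              # drift
    p = [pi - 0.5 * eps * g for pi, g in zip(p, grad_U(q))]  # half kick
    return q, p

# Standard Gaussian target: U(q) = q^2 / 2, so grad U(q) = q.
q, p = leapfrog([1.0], [0.0], lambda q: q, 0.1)
H = q[0] ** 2 / 2 + p[0] ** 2 / 2
print(abs(H - 0.5) < 1e-4)  # leapfrog nearly conserves the Hamiltonian
```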

Ask a hardware person what’s possible without SIMD or with a different kind of SIMD than available on GPU or TPU, and have someone on hand who can understand the answer. Don’t you and Tamara have such a person on your grantwriting team?

We don’t have a grantwriting team, but two of our co-PIs, Michael Carbin and Vivienne Sze, know a lot about hardware. We were just posting on the Stan forums to get some other perspectives.


breckbaldwin:

Right now we are focused on HMC/NUTS as our inference algorithm but this need not be the case.

It sort of does if we want to do full Bayes in high dimensions. There isn’t anything else competitive. It’s not like we can make Gibbs or Metropolis or ensemble methods faster and succeed.

We have some ideas that can use HMC/NUTS but are not themselves HMC/NUTS. For example, expectation propagation (EP). Conversely, there’s the idea that HMC/NUTS can be helpful in improving approximate algorithms.

There are FPGA toolchains that take OpenCL as input. That provides a short-term proof of concept, since Stan already uses OpenCL and FPGAs are good for prototyping. In the longer term, being able to unroll the whole gradient computation onto an FPGA would be “game changing” fast, since there would be no more pointer chasing.
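To spell out what "pointer chasing" means here: a reverse-mode autodiff tape is a linked graph that the reverse sweep walks node by node through memory. Below is a deliberately tiny tape (my own toy simplification, nothing like Stan’s actual implementation) next to the fully unrolled, straight-line gradient that a fixed FPGA pipeline could stream with no graph traversal at all:

```python
# Minimal reverse-mode autodiff tape (a toy, not Stan's implementation).
# Each node holds pointers to its parents; the reverse sweep chases those
# pointers node by node.
class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # list of (parent Var, local partial)
        self.adjoint = 0.0

def mul(a, b):
    return Var(a.value * b.value, [(a, b.value), (b, a.value)])

def add(a, b):
    return Var(a.value + b.value, [(a, 1.0), (b, 1.0)])

def grad(out):
    # Simple sweep; correct for this small example (a real implementation
    # visits nodes in reverse topological order).
    out.adjoint = 1.0
    stack = [out]
    while stack:  # the pointer chasing happens here
        node = stack.pop()
        for parent, partial in node.parents:
            parent.adjoint += node.adjoint * partial
            stack.append(parent)

# f(x, y) = x*y + x, so df/dx = y + 1 and df/dy = x.
x, y = Var(3.0), Var(4.0)
grad(add(mul(x, y), x))
print(x.adjoint, y.adjoint)  # 5.0 3.0

# The same gradient "unrolled" into straight-line arithmetic: no nodes
# and no traversal, which is the form a fixed hardware pipeline can compute.
def grad_unrolled(xv, yv):
    return yv + 1.0, xv
```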
