Stan on the GPU

As to your big question, the title of my post might give it away:

http://andrewgelman.com/2017/03/15/ensemble-methods-doomed-fail-high-dimensions/

This “100-fold speedup” thing is slippery. 100-fold speedups are already achievable using @wds15’s OpenMP parallelization of multiple ODE solver calls. That’s just parallelizing the likelihood in a Bayesian model.
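To make that concrete, here’s a toy sketch of the pattern in Python rather than Stan’s actual C++/OpenMP code: each subject’s ODE solve is independent given the parameters, so the per-subject likelihood terms can be computed in parallel. The model, data, and function names here are all made up for illustration.

```python
# Sketch only (not Stan's implementation): parallelizing a likelihood
# built from independent per-subject ODE solves.
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from scipy.integrate import solve_ivp
from scipy.stats import norm

def subject_log_lik(args):
    theta, t_obs, y_obs = args  # per-subject decay rate and observations
    # Solve a toy exponential-decay ODE for this subject
    sol = solve_ivp(lambda t, y: -theta * y, (0.0, t_obs[-1]), [10.0],
                    t_eval=t_obs)
    # Gaussian measurement error around the ODE solution
    return norm.logpdf(y_obs, loc=sol.y[0], scale=0.5).sum()

def log_lik(thetas, data):
    # Embarrassingly parallel over subjects: one ODE solve per worker
    with ProcessPoolExecutor() as pool:
        parts = pool.map(subject_log_lik,
                         [(th, t, y) for th, (t, y) in zip(thetas, data)])
    return sum(parts)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.5, 5.0, 8)
    data = [(t, 10.0 * np.exp(-0.7 * t) + rng.normal(0, 0.5, t.size))
            for _ in range(4)]
    print(log_lik([0.7] * 4, data))
```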

If we need a lot of posterior draws (either because we want very high precision or we have very slow mixing), then after adaptation, we can generate them in an embarrassingly parallel fashion.
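Here’s a toy illustration of what I mean, with a hand-rolled HMC on a standard normal in Python (not Stan’s sampler): once the step size and mass matrix are frozen after adaptation, additional chains are independent and can be farmed out to separate processes.

```python
# Sketch only: post-adaptation chains are independent, so run them in parallel.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

STEP, STEPS = 0.3, 20  # pretend these came from a shared adaptation phase

def run_chain(seed, n_draws=1000):
    rng = np.random.default_rng(seed)
    q = rng.normal()  # pretend adaptation left us in the typical set
    draws = np.empty(n_draws)
    for i in range(n_draws):
        p = rng.normal()
        q_new, p_new = q, p
        for _ in range(STEPS):  # leapfrog for U(q) = q^2 / 2
            p_new -= 0.5 * STEP * q_new
            q_new += STEP * p_new
            p_new -= 0.5 * STEP * q_new
        # Metropolis accept/reject on the Hamiltonian H = q^2/2 + p^2/2
        if rng.random() < np.exp(0.5 * (q**2 + p**2 - q_new**2 - p_new**2)):
            q = q_new
        draws[i] = q
    return draws

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:  # chains run in parallel
        chains = list(pool.map(run_chain, range(8)))
    print(np.mean(chains), np.var(chains))
```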

There are some adaptation steps that I think we can parallelize.

Then if you look at something like Riemannian HMC (RHMC), there are Hessian computations that are embarrassingly parallel up to the number of parameters. These dominate RHMC’s compute time. And RHMC mixes amazingly well for hard problems, so parallelizing them has some chance of making it tractable.
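For a rough picture of where that parallelism lives, here’s a hypothetical Python sketch (a toy log density, not Stan’s autodiff): each Hessian column is one gradient evaluation, computed here by central finite differences, and the columns don’t depend on each other, so they parallelize up to the number of parameters.

```python
# Sketch only: Hessian columns are independent gradient evaluations.
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def grad_log_p(x):
    # Gradient of a toy log density: a correlated Gaussian
    prec = np.array([[2.0, -1.0], [-1.0, 2.0]])
    return -prec @ x

def hessian_column(x, h, j):
    # Column j via central differences of the gradient
    e = np.zeros_like(x)
    e[j] = h
    return (grad_log_p(x + e) - grad_log_p(x - e)) / (2 * h)

def hessian(x, h=1e-5):
    with ProcessPoolExecutor() as pool:  # one column per worker
        cols = pool.map(partial(hessian_column, x, h), range(x.size))
        return np.column_stack(list(cols))

if __name__ == "__main__":
    print(hessian(np.array([0.3, -0.2])))
```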

All the data (in the statistical sense) can sit on the GPU. But many of the operands are parameters, which vary from draw to draw during MCMC.
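The pattern, sketched in Python assuming CuPy is available (again, not Stan’s GPU code): copy the big data matrix to the device once, and then per draw only the small parameter vector crosses the bus.

```python
# Sketch only: data stays resident on the GPU; parameters move each draw.
import numpy as np
import cupy as cp

X_host = np.random.default_rng(0).normal(size=(100_000, 50))
y_host = np.random.default_rng(1).normal(size=100_000)
X = cp.asarray(X_host)  # one-time transfer; stays on the device
y = cp.asarray(y_host)

def log_lik(beta_host, sigma=1.0):
    beta = cp.asarray(beta_host)  # per-draw transfer: just 50 numbers
    resid = y - X @ beta          # all the heavy work happens on the GPU
    return float(-0.5 * cp.sum(resid**2) / sigma**2)

for _ in range(3):  # stand-in for the MCMC loop
    print(log_lik(np.zeros(50)))
```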

We’re not 100% sure how much precision we need for accurate Hamiltonian simulations. We do know that it will vary by model. This could be explored without GPUs, and I think it’d make a great project for someone. Matt Hoffman did a little bit of this in a post either here or on our old mailing list.
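If someone wanted a starting point, one cheap experiment (my sketch, not Matt’s) is to run the same leapfrog trajectory in single and double precision and watch how far the Hamiltonian drifts:

```python
# Sketch only: compare Hamiltonian error under float32 vs. float64.
import numpy as np

def hamiltonian_error(dtype, steps=1000, eps=0.01):
    q = np.asarray(1.0, dtype=dtype)
    p = np.asarray(0.5, dtype=dtype)
    eps = dtype(eps)
    h0 = 0.5 * (q**2 + p**2)  # H for the toy potential U(q) = q^2 / 2
    for _ in range(steps):    # leapfrog, all arithmetic in `dtype`
        p = p - dtype(0.5) * eps * q
        q = q + eps * p
        p = p - dtype(0.5) * eps * q
    return abs(0.5 * (q**2 + p**2) - h0)

for dt in (np.float32, np.float64):
    print(dt.__name__, hamiltonian_error(dt))
```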