GPU GLMs: float or double

The issue of numerical reproducibility arises on two different levels here:

a) differences in the implementation of floating-point arithmetic
b) non-determinism due to parallel execution

Point a) is why we won't see exactly the same results when comparing x86 CPU and GPU results, even if we ran only one thread on the GPU and even though both comply with the IEEE 754 standard: they differ in how they handle rounding of intermediate results. In terms of reproducibility on the exact same hardware, however, this is not problematic. The extreme (and useless) case of running a single thread on the GPU would produce exactly the same result on every run, and the same goes for all GPUs with identical rounding behavior, which covers at least all GPUs in the same architecture family.
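
A well-known instance of such a rounding difference is fused multiply-add: GPU compilers typically contract `a*b + c` into a single FMA that rounds once, whereas a separate multiply and add round twice. Here is a minimal CUDA sketch (kernel name and input values are my own illustrative choices) that makes both rounding strategies explicit via the `__fmaf_rn`, `__fmul_rn`, and `__fadd_rn` intrinsics, so the discrepancy shows up deterministically on a single thread:

```cpp
#include <cstdio>

// Compare a fused multiply-add (one rounding) against a separate multiply
// then add (two roundings) on the same inputs. The inputs are chosen so the
// exact product a*b falls exactly half an ULP away from a representable
// float, making the two rounding strategies visibly disagree.
__global__ void fma_vs_mul_add() {
    float a = 1.0f + ldexpf(1.0f, -12);    // 1 + 2^-12, exactly representable
    float b = a;
    float c = -(1.0f + ldexpf(1.0f, -11)); // minus the rounded value of a*b

    float fused    = __fmaf_rn(a, b, c);            // a*b + c rounded once
    float separate = __fadd_rn(__fmul_rn(a, b), c); // rounded after * and after +

    printf("fused:    %.9g\n", fused);    // 2^-24: the low-order term survives
    printf("separate: %.9g\n", separate); // 0: it was rounded away in the product
}

int main() {
    fma_vs_mul_add<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

Both answers are perfectly IEEE 754 compliant; they just apply the rounding at different points, which is exactly why bit-identical CPU/GPU results are not to be expected.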

The far larger “issue” is the different order of arithmetic operations and the non-determinism of that order. If the algorithm involves any sort of parallel reduction, even in a simple case where 10 threads perform an atomic add on a single variable, you are not guaranteed to get exactly the same result between two runs on the same hardware, because floating-point addition is not associative and the order in which the atomics execute is unspecified. This is true for CPU multi-threading as well as GPUs; it is just a bit more evident on GPUs, where you have thousands of threads, but the problem is the same.
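
To make that concrete, here is a minimal CUDA sketch of exactly this situation (the kernel, sizes, and input values are illustrative assumptions, not taken from any GLM code): many threads `atomicAdd` into one accumulator, the hardware serializes the atomics in an unspecified order, and because floating-point addition is not associative, the printed sums can differ in their low-order bits from run to run.

```cpp
#include <cstdio>

// Every thread atomically adds one element into a single accumulator.
// The atomics are serialized by the hardware in an unspecified order, and
// since floating-point addition is not associative, the rounded total can
// change from run to run.
__global__ void atomic_sum(const float *x, float *sum, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(sum, x[i]);
}

int main() {
    const int n = 1 << 20;
    float *x, *sum;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&sum, sizeof(float));

    // Widely varying magnitudes make the rounding sensitive to the order
    // in which the terms happen to be accumulated.
    for (int i = 0; i < n; ++i) x[i] = 1.0f / (float)(i + 1);

    for (int run = 0; run < 3; ++run) {
        *sum = 0.0f;
        atomic_sum<<<(n + 255) / 256, 256>>>(x, sum, n);
        cudaDeviceSynchronize();
        printf("run %d: %.9g\n", run, *sum); // typically differs in the last bits
    }

    cudaFree(x);
    cudaFree(sum);
    return 0;
}
```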

But this is common to all scientific HPC applications and there is no way around it.
