GPU GLMs: float or double

The issue of numerical reproducibility arises on two different levels here:

a) differences in the implementation of floating-point arithmetic
b) non-determinism due to parallel execution

Point a) is why we won't see exactly the same results when comparing x86 CPU and GPU results, even if we ran only one thread on the GPU and even though both comply with the IEEE 754 standard: they differ in how they handle rounding of intermediate results. In terms of reproducibility on the exact same hardware, however, this is not problematic. The extreme (and useless) case of running a single thread on the GPU would produce exactly the same result on every run, and the same goes for all GPUs with identical rounding behavior, which covers at least all GPUs in the same architecture family.
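
A well-known instance of such a rounding difference is fused multiply-add: GPU compilers typically contract `a*b + c` into a single FMA that rounds once, whereas a separate multiply and add round twice. Here is a minimal CUDA sketch (kernel name and input values are my own illustrative choices) that makes both rounding strategies explicit via the `__fmaf_rn`, `__fmul_rn`, and `__fadd_rn` intrinsics, so the discrepancy shows up deterministically on a single thread:

```cpp
#include <cstdio>

// Compare a fused multiply-add (one rounding) against a separate multiply
// then add (two roundings) on the same inputs. The inputs are chosen so the
// exact product a*b falls exactly half an ULP away from a representable
// float, making the two rounding strategies visibly disagree.
__global__ void fma_vs_mul_add() {
    float a = 1.0f + ldexpf(1.0f, -12);    // 1 + 2^-12, exactly representable
    float b = a;
    float c = -(1.0f + ldexpf(1.0f, -11)); // minus the rounded value of a*b

    float fused    = __fmaf_rn(a, b, c);            // a*b + c rounded once
    float separate = __fadd_rn(__fmul_rn(a, b), c); // rounded after * and after +

    printf("fused:    %.9g\n", fused);    // 2^-24: the low-order term survives
    printf("separate: %.9g\n", separate); // 0: it was rounded away in the product
}

int main() {
    fma_vs_mul_add<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

Both answers are perfectly IEEE 754 compliant; they just apply the rounding at different points, which is exactly why bit-identical CPU/GPU results are not to be expected.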

The far larger “issue” is the different order of arithmetic operations and the non-determinism of that order. If the algorithm involves any sort of parallel reduction, even in a simple case where 10 threads perform an atomic add on a single variable, you are not guaranteed to get exactly the same result between two runs on the same hardware, because floating-point addition is not associative and the order in which the atomics execute is unspecified. This is true for CPU multi-threading as well as GPUs; it is just a bit more evident on GPUs, where you have thousands of threads, but the problem is the same.
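
To make that concrete, here is a minimal CUDA sketch of exactly this situation (the kernel, sizes, and input values are illustrative assumptions, not taken from any GLM code): many threads `atomicAdd` into one accumulator, the hardware serializes the atomics in an unspecified order, and because floating-point addition is not associative, the printed sums can differ in their low-order bits from run to run.

```cpp
#include <cstdio>

// Every thread atomically adds one element into a single accumulator.
// The atomics are serialized by the hardware in an unspecified order, and
// since floating-point addition is not associative, the rounded total can
// change from run to run.
__global__ void atomic_sum(const float *x, float *sum, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(sum, x[i]);
}

int main() {
    const int n = 1 << 20;
    float *x, *sum;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&sum, sizeof(float));

    // Widely varying magnitudes make the rounding sensitive to the order
    // in which the terms happen to be accumulated.
    for (int i = 0; i < n; ++i) x[i] = 1.0f / (float)(i + 1);

    for (int run = 0; run < 3; ++run) {
        *sum = 0.0f;
        atomic_sum<<<(n + 255) / 256, 256>>>(x, sum, n);
        cudaDeviceSynchronize();
        printf("run %d: %.9g\n", run, *sum); // typically differs in the last bits
    }

    cudaFree(x);
    cudaFree(sum);
    return 0;
}
```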

But this is common to all scientific HPC applications and there is no way around it.
