I implemented generalized linear models on the GPU. Some discussion of GPU GLMs, with graphs of speedups, is in issue 1184. Then the idea came up that the GPU implementation could use `float` calculations instead of `double` to be even faster. As a prototype I implemented `poisson_log_glm_lpmf` with `float`s. Here are the speedups compared to the `double` implementation (K is the number of attributes and N is the number of instances):

Another important thing to consider is numerical error due to the reduced precision. So I compared both the `float` and the `double` implementation with the CPU implementation. The next graph shows the maximum relative error among the log posterior and all derivatives. I generated three test cases for each size; `y` is generated between 0 and 100 and all other inputs between -1 and 1.

Now the question is: do we want GPU implementations of GLMs to use `float` or `double`?

EDIT: added a version with Kahan summation.