This topic arose in advising a user seeking to accelerate their sampling using cloud services. At first I suggested they look into reduce_sum(), but then remembered that the GPU crew implemented accelerators for the likelihood computation for GLMs. Do we have any intuitions or results yet on when it would be advantageous to use one over the other (presumably in the context of GLMs)?
I see from the GPU paper that they seem to max out at 10x speedups, so would the advice be as simple as: if the GPU is more expensive to rent than 10 CPU cores, go with reduce_sum()? Or does reduce_sum() similarly have an intercept-and-diminishing-returns curve that needs to be taken into account? (I suspect it does, since we rarely get parallelism for free in computing.)
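For concreteness, here's roughly the kind of reduce_sum() setup I have in mind, sketched for a Bernoulli-logit GLM (the partial_sum_lpmf name and the grainsize data entry are just placeholder choices on my part, not anything taken from the GPU paper):

```stan
functions {
  // partial log-likelihood over a slice of the outcomes
  real partial_sum_lpmf(array[] int y_slice, int start, int end,
                        matrix X, real alpha, vector beta) {
    return bernoulli_logit_glm_lpmf(y_slice | X[start:end], alpha, beta);
  }
}
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] X;
  array[N] int<lower=0, upper=1> y;
  int<lower=1> grainsize;  // tuning knob for how work is chunked across threads
}
parameters {
  real alpha;
  vector[K] beta;
}
model {
  alpha ~ normal(0, 2);
  beta ~ normal(0, 2);
  // multithreaded CPU path over the likelihood
  target += reduce_sum(partial_sum_lpmf, y, grainsize, X, alpha, beta);
}
```

The GPU route, as I understand it, would be the same bernoulli_logit_glm_lpmf call over the full data with no reduce_sum() wrapper, just compiled with STAN_OPENCL enabled, which is part of why I'm unsure how to compare the two on cost.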