Any intuitions or results on when GPU support outperforms reduce_sum?

What happens if you combine reduce_sum with GPU support? Will it offload matrix calculations to the GPU from within threads, or does the GPU offloading just get turned off?