I’m currently using R and Stan, specifically the functions extract_sparse_parts() and csr_matrix_times_vector(), to take advantage of having a 98.7% sparse matrix in a hierarchical linear regression model. I have also seen some discussion about GPU support in Stan.

Is there currently a way to take advantage of both GPUs and sparse matrices? If so, is there an example posted anywhere? If not, and I can only choose either GPU or sparse, any idea which would be faster?

The only things planned for GPUs in the immediate future are dense operations. There are separate developments underway to support sparse matrices more fully in the Stan language. So, for now, using csr_matrix_times_vector() is about it.
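For reference, a minimal sketch of how csr_matrix_times_vector() is typically wired into a linear regression (variable names like N, K, nnz, w, v, u are illustrative, not prescribed). On the R side, rstan::extract_sparse_parts(X) returns the w (non-zero values), v (column indices), and u (row start pointers) components, which are passed in as data:

```stan
data {
  int<lower=0> N;        // number of rows of the design matrix X
  int<lower=0> K;        // number of columns of X
  int<lower=0> nnz;      // number of non-zero entries in X
  vector[nnz] w;         // non-zero values:  extract_sparse_parts(X)$w
  int v[nnz];            // column indices:   extract_sparse_parts(X)$v
  int u[N + 1];          // row start points: extract_sparse_parts(X)$u
  vector[N] y;           // outcome
}
parameters {
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  // sparse matrix-vector product X * beta in CSR form
  y ~ normal(csr_matrix_times_vector(N, K, w, v, u, beta), sigma);
}
```

The hierarchical structure of the actual model would go on top of this; only the sparse multiply is shown here.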

With the sparsity you mention, I would guess sparse CPU would be faster, especially if your matrix is big, since there will be some overhead in transferring the matrix to the GPU (memory transfer speed is often the bottleneck in GPU performance).