Would a Hidden Markov Model benefit from GPU computing?

I wasn’t sure which category was more appropriate to this question.

I am fitting a fairly large Hidden Markov model with CmdStanPy on CPU. The model has more than 300 states and more than 2000 data points, and the transition matrix is sparse.

I am using the hmm_marginal function to increment the log probability.
I was wondering whether a model of this kind could benefit from GPU computing. More generally, what are the bottlenecks for these models? Is it the transformed parameters block, where the transition matrix is defined, or the sampling itself?
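For context, a model of this kind typically looks roughly as follows. This is only a simplified sketch with hypothetical normal emissions and a dense row-simplex transition matrix, not my actual transition-matrix construction (the sparse structure is problem-specific and omitted here):

```stan
data {
  int<lower=1> N;               // number of observations (> 2000)
  int<lower=1> K;               // number of hidden states (> 300)
  vector[N] y;                  // observed series
}
parameters {
  array[K] simplex[K] theta;    // rows of the transition matrix (dense here; sparse in practice)
  ordered[K] mu;                // hypothetical emission means
  real<lower=0> sigma;          // hypothetical emission scale
}
transformed parameters {
  matrix[K, K] Gamma;           // Gamma[i, j] = P(state j at time t | state i at time t - 1)
  matrix[K, N] log_omegas;      // log emission density of each observation under each state
  for (k in 1:K)
    Gamma[k] = theta[k]';
  for (n in 1:N)
    for (k in 1:K)
      log_omegas[k, n] = normal_lpdf(y[n] | mu[k], sigma);
}
model {
  vector[K] rho = rep_vector(1.0 / K, K);  // initial state distribution (uniform here)
  mu ~ normal(0, 5);
  sigma ~ normal(0, 1);
  target += hmm_marginal(log_omegas, Gamma, rho);
}
```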

Thank you in advance for your help

Irene

The hmm_marginal function does not currently have a GPU implementation. @charlesm93 may be able to comment on whether one would be beneficial, and on your other questions.

Yes, I expect this model would benefit from support for sparse matrices on GPUs. Maybe @stevebronder has a sense of what it would take to implement this in stan-math. How substantial an effort would this be, given the GPU support we currently have for matrices?

The sampling is dominated by the evaluation of the posterior gradient, which is itself (likely) dominated by operations on the transition kernel.
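If you want to check where the time goes in your particular model, Stan's `profile` statements can separate the cost of building the transition matrix and emission densities (including their share of the gradient) from the cost of `hmm_marginal` itself. A minimal sketch, reusing the placeholder structure from the outline above rather than your actual model:

```stan
data {
  int<lower=1> N;
  int<lower=1> K;
  vector[N] y;
}
parameters {
  array[K] simplex[K] theta;
  ordered[K] mu;
  real<lower=0> sigma;
}
transformed parameters {
  matrix[K, K] Gamma;
  matrix[K, N] log_omegas;
  // Forward- and reverse-pass (gradient) time spent building the
  // transition matrix and emission densities is attributed to this block.
  profile("transition-and-emissions") {
    for (k in 1:K)
      Gamma[k] = theta[k]';
    for (n in 1:N)
      for (k in 1:K)
        log_omegas[k, n] = normal_lpdf(y[n] | mu[k], sigma);
  }
}
model {
  vector[K] rho = rep_vector(1.0 / K, K);
  // Time spent in the forward algorithm and its gradient.
  profile("hmm-marginal") {
    target += hmm_marginal(log_omegas, Gamma, rho);
  }
}
```

CmdStan writes the per-block forward- and reverse-pass timings to a profiling CSV, which should show directly whether the transformed parameters block or the hmm_marginal call dominates.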