I wasn’t sure which category was more appropriate for this question.

I am fitting a fairly large hidden Markov model through CmdStanPy on the CPU. The model has more than 300 states and more than 2000 data points, and the transition matrix is sparse.

I am using the `hmm_marginal` function to increment the log probability.
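
For reference, here is a minimal sketch of the relevant part of the model (the emission model, priors, and variable names below are illustrative, not my actual code):

```stan
data {
  int<lower=1> K;             // number of hidden states (> 300)
  int<lower=1> N;             // number of observations (> 2000)
  vector[N] y;
}
parameters {
  vector[K] mu;               // illustrative per-state emission means
  real<lower=0> sigma;
  array[K] simplex[K] theta;  // rows of the transition matrix
  simplex[K] rho;             // initial state distribution
}
transformed parameters {
  matrix[K, K] Gamma;         // transition matrix built row by row
  matrix[K, N] log_omega;     // per-state observation log densities
  for (k in 1:K)
    Gamma[k] = theta[k]';
  for (n in 1:N)
    for (k in 1:K)
      log_omega[k, n] = normal_lpdf(y[n] | mu[k], sigma);
}
model {
  // forward-algorithm marginalization over the hidden states
  target += hmm_marginal(log_omega, Gamma, rho);
}
```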

I was wondering whether a model of this kind could benefit from GPU computing. More generally, what are the bottlenecks in these models? Is it the transformed parameters block, where the transition matrix is defined, or the sampling itself?

Thank you in advance for your help

Irene