I wasn’t sure which category was more appropriate to this question.

I am fitting a fairly large hidden Markov model through CmdStanPy on CPU. The model has more than 300 states and more than 2,000 data points. The transition matrix is sparse.

I am using the `hmm_marginal` function to increment the log probability.
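For context, `hmm_marginal` marginalizes out the discrete states with the forward algorithm. Here is a minimal NumPy sketch of that computation (my own illustration, not Stan's implementation; I'm assuming the Stan convention that `log_omega` is K x T and `Gamma[i, j]` is the probability of moving from state i to state j):

```python
import numpy as np

def hmm_log_marginal(log_omega, Gamma, rho):
    """Forward algorithm: log marginal likelihood of an HMM.

    log_omega : (K, T) per-state log observation densities
    Gamma     : (K, K) transition matrix, rows sum to 1
    rho       : (K,) initial state distribution
    """
    K, T = log_omega.shape
    log_alpha = np.log(rho) + log_omega[:, 0]
    for t in range(1, T):
        # log-sum-exp over previous states, done in a numerically stable way
        m = log_alpha.max()
        alpha = np.exp(log_alpha - m)
        log_alpha = m + np.log(alpha @ Gamma) + log_omega[:, t]
    m = log_alpha.max()
    return m + np.log(np.exp(log_alpha - m).sum())
```

The dense matrix-vector product `alpha @ Gamma` at each step is where a sparse transition matrix would save work.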

I was wondering whether a model of this kind could benefit from GPU computing. More generally, what are the bottlenecks of these models? Is it the transformed parameters block, where the transition matrix is defined, or the sampling itself?

Thank you in advance for your help

Irene

The `hmm_marginal` function does not currently have a GPU implementation. @charlesm93 may be able to comment on whether one would be beneficial, and on your other questions.


Yes, I expect this model would benefit from support for sparse matrices on GPUs. Maybe @stevebronder has a sense of what it would take to implement this in `stan-math`. How substantial an effort would this be, given the GPU support we currently have for matrices?

The sampling is dominated by the evaluation of the posterior gradient, which is itself (likely) dominated by operations on the transition kernel.
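To put rough numbers on that: each forward pass over a dense K x K kernel costs on the order of K^2 * T multiply-adds, and reverse-mode autodiff multiplies that by a small constant. A back-of-the-envelope estimate for the model in the original post (K = 300 states, T = 2000 data points):

```python
K, T = 300, 2000       # states and data points, from the original post
dense_ops = K * K * T  # multiply-adds per forward pass with a dense kernel
print(f"{dense_ops:,}")  # about 180 million per gradient evaluation
```

With a sparse kernel, K^2 is replaced by the number of nonzero transitions, which is where the potential speedup comes from.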


Thank you both for your answers. Indeed, I tried running the model with the OpenCL flags on a GPU, but saw no improvement (it was actually slower than on CPU alone), which makes sense given what @WardBrian said.

I thought about writing the transition matrix as a sparse matrix and incrementing the log density myself, without using `hmm_marginal` (which, as far as I can see, does not support a sparse representation), but I haven't done that yet. I am currently using Pathfinder on CPU for the same model, which is much faster (with all its limitations relative to HMC...).
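For what it's worth, the forward recursion itself is straightforward to write against a sparse kernel. A NumPy sketch of one (unnormalized) forward step over a COO-style list of nonzeros (my own illustration; a Stan version would loop over the nonzero entries in the model block and rescale at each step for numerical stability):

```python
import numpy as np

def sparse_forward_step(alpha, rows, cols, vals, K):
    """One (unnormalized) forward step with a sparse transition matrix.

    alpha            : (K,) forward probabilities at time t-1
    rows, cols, vals : nonzeros of Gamma, i.e. Gamma[rows[k], cols[k]] = vals[k]
    Cost is O(nnz) per step instead of O(K^2) for a dense Gamma.
    """
    new_alpha = np.zeros(K)
    # scatter-add each nonzero's contribution: alpha[i] * Gamma[i, j] -> state j
    np.add.at(new_alpha, cols, alpha[rows] * vals)
    return new_alpha
```

Multiplying by the emission densities and renormalizing at each step then recovers the usual forward algorithm.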
