HMM parallelization with map-reduce?


(CmdStanPy) I am fitting a number of intensity traces that evolve over time, using a hidden Markov model whose transition and emission probabilities depend on some parameters shared among all traces. ntraces is the total number of traces (which have different lengths), and ndata is the total number of time points (the sum of the trace lengths). For simplicity, I define log_omega, the matrix of log emission probabilities, as n_states x ndata, where n_states is the total number of hidden states. My model block looks as follows:

model {
  for (p in 1 : ntraces) {
    target += hmm_marginal(log_omega[1 : n_states, starts[p] : ends[p]], Gamma, rho);
  }
}

where starts[p] and ends[p] are the first and last time indices of trace p. The transition matrix Gamma has dimensions n_states x n_states, and rho is a vector of length n_states. I was wondering whether I could use map_rect to parallelize the sum that the loop accumulates. I am not sure, because the signature of map_rect does not allow matrices as arguments, for example. What do you think?
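For reference, here is a minimal NumPy sketch of what the loop computes, assuming the hmm_marginal convention from the Stan Functions Reference (Gamma[i, j] is the probability of moving from state i to state j, and rho is the distribution of the first hidden state). Each trace contributes an independent term to the total, which is what makes the sum a candidate for map_rect or any other map-reduce scheme. The names hmm_log_marginal and total_log_marginal are hypothetical, and indices here are 0-based with inclusive ends, unlike Stan's 1-based starts/ends.

```python
import numpy as np
from scipy.special import logsumexp


def hmm_log_marginal(log_omega, Gamma, rho):
    """Forward algorithm in log space for a single trace.

    log_omega: (K, T) log emission densities
    Gamma:     (K, K) transition matrix, Gamma[i, j] = P(next = j | current = i)
    rho:       (K,) distribution of the first hidden state
    """
    log_Gamma = np.log(Gamma)
    # log alpha at the first time point
    log_alpha = np.log(rho) + log_omega[:, 0]
    for t in range(1, log_omega.shape[1]):
        # new[j] = logsumexp_i(log_alpha[i] + log_Gamma[i, j]) + log_omega[j, t]
        log_alpha = logsumexp(log_alpha[:, None] + log_Gamma, axis=0) + log_omega[:, t]
    return logsumexp(log_alpha)


def total_log_marginal(log_omega, Gamma, rho, starts, ends):
    """Sum of independent per-trace terms (0-based, inclusive ends)."""
    return sum(
        hmm_log_marginal(log_omega[:, s : e + 1], Gamma, rho)
        for s, e in zip(starts, ends)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, N = 3, 10
    log_omega = rng.normal(size=(K, N))
    Gamma = np.full((K, K), 1.0 / K)
    rho = np.full(K, 1.0 / K)
    # two traces: columns 0..3 and 4..9
    print(total_log_marginal(log_omega, Gamma, rho, [0, 4], [3, 9]))
```

Because the per-trace calls share nothing but Gamma and rho (and whatever parameters build log_omega), they can be evaluated in any order or in parallel and then added.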

Thank you in advance!!!

I'm afraid the only solution is to serialize the matrix to an array and then unpack it manually on the other side (being very careful about row- vs. column-major order). The same sort of thing has to happen when there is ragged data.
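A minimal sketch of that packing on the Python side, before the data is handed to CmdStanPy. It assumes map_rect's data arguments x_rs and x_is must be rectangular (every job gets arrays of the same length), hence the zero padding, and that a parameter-dependent quantity (for example log_omega, if it is built from parameters) cannot go into x_rs at all and would instead be recomputed inside the mapped function from phi/theta. The NumPy order="F" matches Stan's column-major to_array_1d / to_matrix; pack_ragged and unpack_one are hypothetical names.

```python
import numpy as np


def pack_ragged(blocks):
    """Serialize a list of (K, T_j) matrices into rectangular map_rect-style arrays.

    Each block is flattened column-major ("F"), matching Stan's to_matrix,
    then right-padded with zeros so every job row has the same width.
    x_i stores (K, T_j) so each job can locate and reshape its own slice.
    """
    flat = [np.asarray(b, dtype=float).flatten(order="F") for b in blocks]
    width = max(f.size for f in flat)
    x_r = np.zeros((len(blocks), width))
    for j, f in enumerate(flat):
        x_r[j, : f.size] = f
    x_i = np.array([np.shape(b) for b in blocks], dtype=int)
    return x_r, x_i


def unpack_one(x_r_row, x_i_row):
    """Inverse of pack_ragged for one job (what the mapped function would do)."""
    K, T = x_i_row
    return x_r_row[: K * T].reshape((K, T), order="F")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blocks = [rng.normal(size=(3, t)) for t in (4, 7, 2)]
    x_r, x_i = pack_ragged(blocks)
    print(x_r.shape, x_i.tolist())
```

On the Stan side, the matching unpack inside the mapped function would be something like `to_matrix(x_r[1:(K * T)], K, T)`, with K and T read from x_i.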


Thank you, Bob. Do you think it could be beneficial in terms of efficiency?

In particular, I see that Stan spends most of the time in the warm-up phase:

Elapsed Time: 49212.2 seconds (Warm-up)
              3033.38 seconds (Sampling)
              52245.6 seconds (Total)

Could map_rect be useful in this case?
Or should I provide initial values and use a shorter warm-up?
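A quick back-of-the-envelope sketch to put numbers on the question. Within-chain parallelism speeds up each log-density/gradient evaluation; as far as I understand, it does not change how many leapfrog steps the sampler takes, so it would shrink warm-up and sampling roughly in proportion rather than explain why warm-up is about 16x longer than sampling. The estimate uses Amdahl's law, and the 0.9 parallel fraction is a made-up placeholder, not a measured value for this model.

```python
def amdahl_speedup(parallel_fraction, n_threads):
    """Ideal speedup when only `parallel_fraction` of the run time parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_threads)


warmup_s, sampling_s, total_s = 49212.2, 3033.38, 52245.6

print(f"warm-up share: {warmup_s / total_s:.1%}")           # 94.2%
print(f"warm-up / sampling: {warmup_s / sampling_s:.1f}x")  # 16.2x
print(f"total: {total_s / 3600:.1f} h")                     # 14.5 h

# Hypothetical: assume 90% of the run time is the per-trace HMM sum.
for n in (2, 4, 8):
    est = total_s / amdahl_speedup(0.9, n)
    print(f"{n} threads: ~{est / 3600:.1f} h")
```

The serial fraction caps the gain quickly, so measuring how much of each gradient evaluation is really spent in the hmm_marginal loop would be the first thing to check before restructuring the model.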