HMM parallelization with map-reduce?

irelamb · March 25, 2024, 9:18pm

Hello,

(CmdStanPy) I am fitting a set of intensity traces that evolve over time, using a hidden Markov model whose transition and emission probabilities depend on parameters shared among all traces. ntraces is the total number of traces (of different lengths), and ndata is the total number of time points (the sum of the trace lengths). For simplicity, I define log_omega, the matrix of the log of the emission probabilities, as n_states x ndata, where n_states is the total number of hidden states. My model block looks as follows:

model {
  for (p in 1 : ntraces) {
    target += hmm_marginal(log_omega[1 : n_states, starts[p] : ends[p]], Gamma, rho);
  }
}

where starts[p] and ends[p] are the start and end indices of trace p. The transition probability matrix Gamma has dimensions n_states x n_states, and rho is a vector of length n_states. I was wondering if I could use map-reduce (map_rect) to parallelize the sum that the loop accumulates incrementally. I am not sure, because the signature of the map_rect function does not allow for matrices, for example. What do you think?

Thank you in advance!!!

Bob_Carpenter · March 25, 2024, 9:25pm

I’m afraid the only solution is to serialize the matrix to an array and then unpack it manually on the other side (being very careful about row-vs.-column order). The same sort of thing has to happen when there is ragged data.
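A minimal sketch of what that packing and unpacking could look like for the model above, not a tested implementation. It assumes Gamma is declared as a matrix and rho as a vector (as the hmm_marginal call implies), and that ntraces, n_states, starts, ends and log_omega are defined as in the original model. The names trace_lp, max_len, phi and theta are hypothetical. Each trace's slice of log_omega is flattened with to_vector (column-major) and zero-padded to a common length, because map_rect needs rectangular inputs; to_matrix then rebuilds it in the same column-major order on the other side.

```stan
functions {
  // Log marginal likelihood of one trace, in the form map_rect expects.
  //   phi   : shared parameters, packed as [to_vector(Gamma); rho]
  //   theta : this trace's slice of log_omega, flattened column by column
  //           and zero-padded to a common length
  //   x_i   : {n_states, length of this trace}
  vector trace_lp(vector phi, vector theta,
                  data array[] real x_r, data array[] int x_i) {
    int K = x_i[1];
    int len = x_i[2];
    // to_matrix fills column by column, mirroring to_vector on the way in.
    matrix[K, K] Gamma = to_matrix(phi[1 : K * K], K, K);
    vector[K] rho = phi[K * K + 1 : K * K + K];
    matrix[K, len] log_omega_p = to_matrix(theta[1 : K * len], K, len);
    return [hmm_marginal(log_omega_p, Gamma, rho)]';
  }
}
transformed data {
  // Hypothetical helpers; ntraces, n_states, starts and ends are assumed
  // to come from the data block of the original model.
  int max_len = 0;              // longest trace, used for padding
  array[ntraces, 0] real x_r;   // no real-valued data needed per trace
  array[ntraces, 2] int x_i;
  for (p in 1 : ntraces) {
    int len = ends[p] - starts[p] + 1;
    max_len = max(max_len, len);
    x_i[p, 1] = n_states;
    x_i[p, 2] = len;
  }
}
model {
  // Pack the shared parameters once.
  vector[n_states * n_states + n_states] phi
      = append_row(to_vector(Gamma), rho);
  // Pack each trace's emission log probabilities, padded with zeros
  // that trace_lp never reads.
  array[ntraces] vector[n_states * max_len] theta;
  for (p in 1 : ntraces) {
    int len = ends[p] - starts[p] + 1;
    theta[p] = append_row(
        to_vector(log_omega[1 : n_states, starts[p] : ends[p]]),
        rep_vector(0, n_states * (max_len - len)));
  }
  target += sum(map_rect(trace_lp, phi, theta, x_r, x_i));
}
```

Note that map_rect only runs in parallel when the model is compiled with threading (STAN_THREADS) or MPI enabled; otherwise the jobs execute sequentially. With CmdStanPy, threading can be switched on by passing cpp_options={"STAN_THREADS": True} when building the CmdStanModel. Because log_omega depends on parameters, this sketch ships it through theta, which adds copying overhead per trace. If the emission model allows it, computing each trace's log_omega inside trace_lp from the shared parameters and the raw data in x_r would likely move less data around.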

irelamb · March 26, 2024, 6:27am

Thank you Bob. Do you think it could be beneficial in terms of efficiency?

irelamb · March 27, 2024, 8:15pm

In particular, I see that Stan spends most of the time in the warm-up phase:

Elapsed Time: 49212.2 seconds (Warm-up)
              3033.38 seconds (Sampling)
              52245.6 seconds (Total)

Could map-reduce be useful in this case?
Or should I provide initial values and reduce the warm-up?
