The problem I am dealing with is iterative (incremental) updating, or online learning, of a multinomial HMM (i.e., states and outcomes are both discrete) for each new observation. Specifically, I want to track how the model's estimates of the state transition probabilities and emission probabilities change with every new observation, when the true parameters are fixed.
There are several online estimation algorithms using the EM approach, such as Online Learning with Hidden Markov Models, but I believe none exists in the Bayesian paradigm.
One possible solution I am considering is refitting the HMM on the growing data set after every new observation.
For example, if I have 100 observations from a multinomial HMM with fixed parameters, I could fit the model 100 times: first on y_1, then on [y_1, y_2], then on [y_1, y_2, y_3], ..., and finally on [y_1, ..., y_{100}]. This would yield an estimated state transition matrix at each time t, A^t, and an estimated emission matrix at each time t, B^t.
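Concretely, I imagine something like the following sketch with CmdStanPy, assuming a hypothetical model file `hmm.stan` whose data block takes `T`, `K`, `M`, and `y`, and whose parameters are the simplex matrices `A` and `B` (all names are illustrative, not tested code):

```python
# Sketch of the proposed refitting scheme: refit on y_{1:t} for t = 1..100
# and record the posterior means as A^t and B^t.
import numpy as np
from cmdstanpy import CmdStanModel

K, M = 2, 3                                  # number of states and outcomes
y = np.loadtxt("y.txt", dtype=int)           # the observed sequence y_1..y_100
model = CmdStanModel(stan_file="hmm.stan")   # hypothetical HMM model

A_hat, B_hat = [], []                        # A^t and B^t for each t
for t in range(1, len(y) + 1):
    data = {"T": t, "K": K, "M": M, "y": y[:t].tolist()}
    fit = model.sample(data=data, chains=4)
    A_hat.append(fit.stan_variable("A").mean(axis=0))  # E[A | y_{1:t}]
    B_hat.append(fit.stan_variable("B").mean(axis=0))  # E[B | y_{1:t}]
```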
What are your thoughts about this? Do you think it is doable, or should other approaches be adopted?
This is what we've done in the past, as it gives you the right answer.
This isn't a problem for Bayes per se, just for implementations of it using named distributions. In pure Bayes theory, the posterior after each data item can be used as the prior for the next. The problem is that, in practice, there's no way to represent those posteriors other than through sampling.
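To illustrate what "pure Bayes" buys you when a named conjugate family is available (which it is not for the HMM posterior over A and B), here is a toy Dirichlet-categorical example, with all specifics made up for illustration:

```python
# Posterior-as-prior with a conjugate (named) family: a Dirichlet prior on
# categorical probabilities updates in closed form after each observation,
# so the posterior is again a Dirichlet and serves directly as the next prior.
import numpy as np

rng = np.random.default_rng(0)
true_p = np.array([0.2, 0.5, 0.3])      # fixed true outcome probabilities
alpha = np.ones(3)                      # Dirichlet(1,1,1) prior

for t in range(100):
    y_t = rng.choice(3, p=true_p)       # new observation arrives
    alpha[y_t] += 1.0                   # closed-form posterior update
    post_mean = alpha / alpha.sum()     # current estimate; tracks true_p
```

With an HMM there is no such closed form for the joint posterior over A and B, so the per-observation posterior has to be represented by samples or particles.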
One algorithm that does this directly is sequential Monte Carlo (SMC), which Stan doesn’t support. @s.maskell is working on building SMC algorithms for Stan with better incremental proposals than the standard Metropolis, but I have no idea what the ETA is for working code.
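For intuition only, here is a minimal sketch of the SMC idea for this problem: parameter particles (A_i, B_i) drawn from the prior, each carrying its own forward filter, reweighted by the predictive likelihood of every new observation. Everything here is made-up illustration; in particular, without rejuvenation (MCMC) moves the fixed-parameter particles degenerate, which is exactly the hard part the proper algorithms address:

```python
# Toy SMC sketch for a discrete HMM with fixed parameters.
import numpy as np

rng = np.random.default_rng(0)
K, M, N = 2, 3, 500                          # states, outcomes, particles
y_seq = rng.integers(0, M, size=100)         # stand-in observation sequence

A = rng.dirichlet(np.ones(K), size=(N, K))   # transition particles, N x K x K
B = rng.dirichlet(np.ones(M), size=(N, K))   # emission particles,  N x K x M
alpha = np.full((N, K), 1.0 / K)             # per-particle filtered state dist.
logw = np.zeros(N)                           # log importance weights

for y_t in y_seq:
    pred = np.einsum("nk,nkj->nj", alpha, A)  # one forward-filter predict step
    like = pred * B[:, :, y_t]                # times emission prob of y_t
    p_y = like.sum(axis=1)                    # predictive p(y_t | past, A_i, B_i)
    alpha = like / p_y[:, None]               # updated filtered state dist.
    logw += np.log(p_y)                       # reweight each particle
    w = np.exp(logw - logw.max())
    w /= w.sum()
    if 1.0 / np.sum(w**2) < N / 2:            # resample when ESS drops
        idx = rng.choice(N, size=N, p=w)
        A, B, alpha, logw = A[idx], B[idx], alpha[idx], np.zeros(N)
        w = np.full(N, 1.0 / N)
    A_t = np.tensordot(w, A, axes=1)          # weighted estimate of A at time t
    B_t = np.tensordot(w, B, axes=1)          # weighted estimate of B at time t
```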
If you are fitting a sequence of models, you might want to think about hot-starting each fit at the parameter values and adaptation values (step size and metric) of the previous fit. That should make warmup go faster, if warmup is the bottleneck.
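With CmdStanPy that might look roughly like the following (same hypothetical `hmm.stan` as above; check the current `sample()` arguments before relying on this):

```python
# Sketch: hot-start fit t from fit t-1 by reusing the previous posterior
# means as inits and the previous adaptation (step size, inverse metric).
import numpy as np
from cmdstanpy import CmdStanModel

K, M = 2, 3
y = np.loadtxt("y.txt", dtype=int)
model = CmdStanModel(stan_file="hmm.stan")
fit = model.sample(data={"T": 1, "K": K, "M": M, "y": y[:1].tolist()})

for t in range(2, len(y) + 1):
    data = {"T": t, "K": K, "M": M, "y": y[:t].tolist()}
    inits = {"A": fit.stan_variable("A").mean(axis=0).tolist(),
             "B": fit.stan_variable("B").mean(axis=0).tolist()}
    fit = model.sample(
        data=data,
        inits=inits,                                  # start near previous posterior
        step_size=float(fit.step_size.mean()),        # reuse adapted step size
        metric={"inv_metric": fit.metric.mean(axis=0).tolist()},  # reuse metric
        iter_warmup=200,                              # shorter warmup when hot-started
    )
```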
@mshin: the work we are doing has been more focused on cases where the parameters of interest (e.g., the state transition probabilities) are themselves time-varying. Your problem is different, but one that we have considered in the past (e.g., see here: https://livrepository.liverpool.ac.uk/3003664/1/ISMA_2016.pdf). The core idea we have explored is to use importance sampling, so that samples from one posterior provide estimates with respect to other posteriors (e.g., those involving a few new data points). Unfortunately, handling weighted samples in Stan is going to require a lot of thought, work, and discussion, so I doubt that will be a core part of Stan any time soon.
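The reweighting step itself is simple; the difficulty is in what to do with the weighted samples downstream. A minimal sketch of the idea, assuming a hypothetical function `loglik_new(theta)` that evaluates the log-likelihood of the new data points at each posterior draw (for an HMM, a forward-algorithm pass):

```python
# Self-normalized importance sampling: draws theta_i ~ p(theta | y_{1:t}),
# weighted by p(y_new | theta_i, y_{1:t}), approximate p(theta | y_{1:t}, y_new).
import numpy as np

def is_reweight(log_lik_new):
    """Turn per-draw log-likelihoods of the new data into normalized weights."""
    logw = log_lik_new - log_lik_new.max()   # subtract max for numerical stability
    w = np.exp(logw)
    w /= w.sum()
    ess = 1.0 / np.sum(w**2)                 # effective sample size diagnostic
    return w, ess

# usage with draws from an existing fit (all names hypothetical):
# w, ess = is_reweight(np.array([loglik_new(th) for th in draws]))
# A_new = np.tensordot(w, A_draws, axes=1)   # weighted posterior mean of A
```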
@Bob_Carpenter: thanks for tagging me. I do wonder whether it would be worth my team writing a little post-processing tool for Stan that enabled users to define two models, sample (with Stan) from one, and then use importance sampling to generate (weighted) samples with the second model as the target. I think @alphillips might have actually done this already, so it may just be a case of getting the code out there. It would help us think through how to use weighted samples in general (with an eye to Streaming-Stan and SMC samplers for (non-Streaming) Stan), which seems prudent. As to the timeline for those developments, we're hoping to get Streaming-Stan released over the next six months, but we need to ensure we have all our ducks in a row beforehand: in particular, we want to finish a write-up of the algorithm we are using, and that's turning out to be a bit more involved than we anticipated.
Sorry, I should have been clearer: I think we can get implementations that work (but are "unofficial") out relatively quickly, but making the importance sampling post-processing code and Streaming-Stan (see here for a description from StanCon 2020: StanCon 2020. Talk 5: Simon Maskell. Towards an Interface for Streaming Stan - YouTube) fully supported, integral components of the Stan ecosystem is what will require a longer wait.
The delay for the importance sampling post-processing code is a software issue that we now know how to solve but just need to devote a little time to. I think that will get resolved pretty quickly, given that we now know there is at least one more interested person than we thought (thanks!). In terms of Streaming-Stan, we want to write up our algorithm, and that paper is not finalised yet.
I'm aware of a few efforts to make generic SMC samplers and particle filters accessible to users (e.g., the Julia library you linked to), but I've not seen anyone else working towards something that is likely to work well generically. For me, NUTS is the key thing that makes Stan useful and, while we need something similar for dynamic models, the other developments I've seen adopt approaches that I'd expect to work well in some settings but not in all.