The problem I am dealing with is iterative (incremental) updating, or online learning, of a multinomial HMM (i.e., states and outcomes are both discrete) for each new observation. Specifically, I want to track how the model's estimates of the state transition probabilities and emission probabilities change with every new observation, when the true parameters are fixed.
There are several online estimation algorithms using the EM approach, such as Online Learning with Hidden Markov Models, but I believe none exists in the Bayesian paradigm.
One possible solution I am considering is refitting the HMM on the growing data set after every new observation.
For example, if I have 100 observations from a multinomial HMM with fixed parameters, I could fit the model 100 times: first on y_1, then on [y_1, y_2], then on [y_1, y_2, y_3], ..., and finally on [y_1, ..., y_{100}]. This would yield an estimated state transition matrix at each time t, A^t, and an estimated emission matrix at each time t, B^t.
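Concretely, I imagine something like the following sketch with CmdStanPy, assuming a hypothetical model file `hmm.stan` whose data block takes `T`, `K`, `M`, and `y`, and whose parameters are the simplex matrices `A` and `B` (all names are illustrative, not tested code):

```python
# Sketch of the proposed refitting scheme: refit on y_{1:t} for t = 1..100
# and record the posterior means as A^t and B^t.
import numpy as np
from cmdstanpy import CmdStanModel

K, M = 2, 3                                  # number of states and outcomes
y = np.loadtxt("y.txt", dtype=int)           # the observed sequence y_1..y_100
model = CmdStanModel(stan_file="hmm.stan")   # hypothetical HMM model

A_hat, B_hat = [], []                        # A^t and B^t for each t
for t in range(1, len(y) + 1):
    data = {"T": t, "K": K, "M": M, "y": y[:t].tolist()}
    fit = model.sample(data=data, chains=4)
    A_hat.append(fit.stan_variable("A").mean(axis=0))  # E[A | y_{1:t}]
    B_hat.append(fit.stan_variable("B").mean(axis=0))  # E[B | y_{1:t}]
```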
What are your thoughts about this? Do you think it is doable, or should other approaches be adopted?
This is what we've done in the past, as it gives you the right answer.
This isn't a problem for Bayes per se, just for implementations of it using named distributions. In pure Bayes theory, the posterior after each data item can be used as the prior for the next. The problem is that, in practice, there's no way to represent those posteriors other than through sampling.
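To illustrate what "pure Bayes" buys you when a named conjugate family is available (which it is not for the HMM posterior over A and B), here is a toy Dirichlet-categorical example, with all specifics made up for illustration:

```python
# Posterior-as-prior with a conjugate (named) family: a Dirichlet prior on
# categorical probabilities updates in closed form after each observation,
# so the posterior is again a Dirichlet and serves directly as the next prior.
import numpy as np

rng = np.random.default_rng(0)
true_p = np.array([0.2, 0.5, 0.3])      # fixed true outcome probabilities
alpha = np.ones(3)                      # Dirichlet(1,1,1) prior

for t in range(100):
    y_t = rng.choice(3, p=true_p)       # new observation arrives
    alpha[y_t] += 1.0                   # closed-form posterior update
    post_mean = alpha / alpha.sum()     # current estimate; tracks true_p
```

With an HMM there is no such closed form for the joint posterior over A and B, so the per-observation posterior has to be represented by samples or particles.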
One algorithm that does this directly is sequential Monte Carlo (SMC), which Stan doesn’t support. @s.maskell is working on building SMC algorithms for Stan with better incremental proposals than the standard Metropolis, but I have no idea what the ETA is for working code.
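For intuition only, here is a minimal sketch of the SMC idea for this problem: parameter particles (A_i, B_i) drawn from the prior, each carrying its own forward filter, reweighted by the predictive likelihood of every new observation. Everything here is made-up illustration; in particular, without rejuvenation (MCMC) moves the fixed-parameter particles degenerate, which is exactly the hard part the proper algorithms address:

```python
# Toy SMC sketch for a discrete HMM with fixed parameters.
import numpy as np

rng = np.random.default_rng(0)
K, M, N = 2, 3, 500                          # states, outcomes, particles
y_seq = rng.integers(0, M, size=100)         # stand-in observation sequence

A = rng.dirichlet(np.ones(K), size=(N, K))   # transition particles, N x K x K
B = rng.dirichlet(np.ones(M), size=(N, K))   # emission particles,  N x K x M
alpha = np.full((N, K), 1.0 / K)             # per-particle filtered state dist.
logw = np.zeros(N)                           # log importance weights

for y_t in y_seq:
    pred = np.einsum("nk,nkj->nj", alpha, A)  # one forward-filter predict step
    like = pred * B[:, :, y_t]                # times emission prob of y_t
    p_y = like.sum(axis=1)                    # predictive p(y_t | past, A_i, B_i)
    alpha = like / p_y[:, None]               # updated filtered state dist.
    logw += np.log(p_y)                       # reweight each particle
    w = np.exp(logw - logw.max())
    w /= w.sum()
    if 1.0 / np.sum(w**2) < N / 2:            # resample when ESS drops
        idx = rng.choice(N, size=N, p=w)
        A, B, alpha, logw = A[idx], B[idx], alpha[idx], np.zeros(N)
        w = np.full(N, 1.0 / N)
    A_t = np.tensordot(w, A, axes=1)          # weighted estimate of A at time t
    B_t = np.tensordot(w, B, axes=1)          # weighted estimate of B at time t
```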
If you are fitting a sequence of models, you might want to think about hot-starting each fit at the parameter values and adaptation values (step size and metric) of the previous fit. That should make warmup go faster, if warmup is the bottleneck.
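With CmdStanPy that might look roughly like the following (same hypothetical `hmm.stan` as above; check the current `sample()` arguments before relying on this):

```python
# Sketch: hot-start fit t from fit t-1 by reusing the previous posterior
# means as inits and the previous adaptation (step size, inverse metric).
import numpy as np
from cmdstanpy import CmdStanModel

K, M = 2, 3
y = np.loadtxt("y.txt", dtype=int)
model = CmdStanModel(stan_file="hmm.stan")
fit = model.sample(data={"T": 1, "K": K, "M": M, "y": y[:1].tolist()})

for t in range(2, len(y) + 1):
    data = {"T": t, "K": K, "M": M, "y": y[:t].tolist()}
    inits = {"A": fit.stan_variable("A").mean(axis=0).tolist(),
             "B": fit.stan_variable("B").mean(axis=0).tolist()}
    fit = model.sample(
        data=data,
        inits=inits,                                  # start near previous posterior
        step_size=float(fit.step_size.mean()),        # reuse adapted step size
        metric={"inv_metric": fit.metric.mean(axis=0).tolist()},  # reuse metric
        iter_warmup=200,                              # shorter warmup when hot-started
    )
```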
@mshin: the work we are doing has been more focused on cases where the parameters of interest (e.g., the state transition probabilities) are themselves time-varying. Your problem is different, but one that we have considered in the past (e.g., see here: https://livrepository.liverpool.ac.uk/3003664/1/ISMA_2016.pdf). The core idea we have explored is to use importance sampling, so that samples from one posterior provide estimates with respect to other posteriors (e.g., those involving a few new data points). Unfortunately, handling weighted samples in Stan is going to require a lot of thought, work, and discussion, so I doubt that will be a core part of Stan any time soon.
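The reweighting step itself is simple; the difficulty is in what to do with the weighted samples downstream. A minimal sketch of the idea, assuming a hypothetical function `loglik_new(theta)` that evaluates the log-likelihood of the new data points at each posterior draw (for an HMM, a forward-algorithm pass):

```python
# Self-normalized importance sampling: draws theta_i ~ p(theta | y_{1:t}),
# weighted by p(y_new | theta_i, y_{1:t}), approximate p(theta | y_{1:t}, y_new).
import numpy as np

def is_reweight(log_lik_new):
    """Turn per-draw log-likelihoods of the new data into normalized weights."""
    logw = log_lik_new - log_lik_new.max()   # subtract max for numerical stability
    w = np.exp(logw)
    w /= w.sum()
    ess = 1.0 / np.sum(w**2)                 # effective sample size diagnostic
    return w, ess

# usage with draws from an existing fit (all names hypothetical):
# w, ess = is_reweight(np.array([loglik_new(th) for th in draws]))
# A_new = np.tensordot(w, A_draws, axes=1)   # weighted posterior mean of A
```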
@Bob_Carpenter: thanks for tagging me. I do wonder whether it would be worth my team writing a little post-processing tool for Stan that enabled users to define two models, sample (with Stan) from one, and then use importance sampling to generate (weighted) samples with the second model as the target. I think @alphillips might have actually done this already, so it may just be a case of getting the code out there. It would help us think through how to use weighted samples in general (with an eye to Streaming-Stan and SMC samplers for (non-Streaming) Stan), which seems prudent. As to the timeline for those developments, we're hoping to get Streaming-Stan released over the next six months, but we need to ensure we have all our ducks in a row beforehand: in particular, we want to finish a write-up of the algorithm we are using, and that's turning out to be a bit more involved than we anticipated.
Sorry, I should have been clearer: I think we can get implementations that work (but are "unofficial") out relatively quickly, but making the importance sampling post-processing code and Streaming-Stan (see here for a description from StanCon 2020: StanCon 2020. Talk 5: Simon Maskell. Towards an Interface for Streaming Stan - YouTube) fully supported, integral components of the Stan ecosystem is what will require a longer wait.
The delay for the importance sampling post-processing code is a software issue that we now know how to solve but just need to devote a little time to. I think that will get resolved pretty quickly, given that we now know there is at least one more interested person than we thought (thanks!). In terms of Streaming-Stan, we want to write up our algorithm, and that paper is not finalised yet.
I'm aware of a few efforts to make generic SMC samplers and particle filters accessible to users (e.g., the Julia library you linked to), but I've not seen anyone else working towards something that is likely to work well generically. For me, NUTS is the key thing that makes Stan useful and, while we need something similar for dynamic models, the other developments I've seen adopt approaches that I'd expect to work well in some settings but not in all.