Model
This is my understanding of the proposed model; please correct me if I’m wrong.
- We’re assuming univariate emissions.
- We’re allowing for time-varying transition probabilities (input-driven transitions for the hidden states).
- We’re also allowing for one user-chosen emission density.
- No multivariate emissions.
- No state-dependent density family. For example, we rule out cases where the emissions follow one distribution in one state and a different distribution (say, one with fatter tails) in another state.
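
To make sure we’re talking about the same structure, here is how I’d write that model by hand today (a minimal sketch under the assumptions above; all names are mine, and I’m assuming a Gaussian emission density and a softmax regression on the inputs for the transitions):

```stan
data {
  int<lower=2> K;              // number of hidden states
  int<lower=1> T;              // length of the series
  int<lower=1> P;              // number of transition covariates
  vector[T] y;                 // univariate emissions
  matrix[T, P] u;              // inputs driving the transitions
}
parameters {
  simplex[K] pi1;              // initial state distribution
  array[K] matrix[K, P] beta;  // softmax regression coefficients, one block per origin state
  ordered[K] mu;               // state-dependent means (ordered against label switching)
  vector<lower=0>[K] sigma;    // state-dependent scales; same Gaussian family in every state
}
transformed parameters {
  // time-varying transition matrices: Gamma[t][i, j] = P(z_t = j | z_{t-1} = i, u[t])
  array[T] matrix[K, K] Gamma;
  for (t in 1:T)
    for (i in 1:K)
      Gamma[t][i] = softmax(beta[i] * u[t]')';
}
// the model block (forward recursion) is sketched in the Inference section below
```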
Questions
- I’d personally start with a softmax regression with soft centering, but I also suspect the K-1 parametrization may make convergence much easier. Would it be reasonably easy to change this parametrization later? (See the K-1 sketch after this list.)
- I’m not following the ragged sequences bit. I do understand that Stan has no built-in ragged structure, but I’m not sure why we’d need it here. (The flatten-and-segment workaround I have in mind is sketched after this list.)
- Assuming the emission array is univariate, will the emission function really take T1, …, TN parameters? Say we want to fit a model with two states and a Gaussian density; wouldn’t we need only two means and two standard deviations? If so, there’s no fixed number of density parameters: it depends on the chosen density and the number of states, not on the sequence lengths.
- What happens if we need to fit a time-homogeneous HMM (i.e. no covariates for the transition model)? Using a constant input would feel hacky and computationally inefficient. Would we need to implement two interfaces, perhaps with the same name but a different number of arguments, assuming we can overload functions? (Hypothetical signatures are sketched after this list.)
- Is there a straightforward way to allow for regressors in the emission mean? Think, for example, of a Markov-switching autoregressive model for time series (i.e. an AR(1) with a pair of regression coefficients for each hidden state). Does the proposed interface accept non-parameters for alpha1, …, alphaN? If so, I think we could use these quantities to model emission means when needed (e.g. instead of being free parameters, these could be a linear combination of some elements of the data block and some free parameters; see the AR(1) sketch after this list).
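
On the first question, the change I have in mind would only touch the parameters and transformed parameters blocks of the sketch above, which is why I’m hoping it would be easy to swap later:

```stan
parameters {
  // K-1 parametrization: destination state 1 is the reference category,
  // so its coefficients are pinned at zero rather than soft-centered
  array[K] matrix[K - 1, P] beta_free;
}
transformed parameters {
  array[K] matrix[K, P] beta;
  for (i in 1:K)
    beta[i] = append_row(rep_row_vector(0, P), beta_free[i]);
  // Gamma is then built exactly as before via softmax(beta[i] * u[t]')'
}
// the soft-centering alternative keeps all K rows free and instead adds a prior
// such as to_vector(beta[i]) ~ normal(0, 1) in the model block
```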
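
On the ragged sequences question, the workaround I’d normally use by hand is plain flatten-and-segment (a sketch, my names), so I’m not sure the built-in needs anything more exotic:

```stan
data {
  int<lower=1> N;               // number of sequences
  int<lower=1> T_total;         // T_1 + ... + T_N
  array[N] int<lower=1> T_len;  // length of each sequence
  vector[T_total] y_flat;       // all sequences concatenated
}
transformed data {
  array[N] int start;           // offset of each sequence inside y_flat
  start[1] = 1;
  for (n in 2:N)
    start[n] = start[n - 1] + T_len[n - 1];
}
model {
  for (n in 1:N) {
    vector[T_len[n]] y_n = segment(y_flat, start[n], T_len[n]);
    // run the forward algorithm on y_n and add its log marginal to target
  }
}
```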
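
On the time-homogeneous question, the kind of overloaded pair I have in mind looks like this (purely hypothetical signatures, reusing the hmm_lpdf name mentioned in the Inference section; nothing here is a settled API):

```stan
// time-varying transitions: one K x K matrix per time step
// target += hmm_lpdf(y | pi1, Gamma, mu, sigma);   // Gamma: array[T] matrix[K, K]

// time-homogeneous transitions: a single K x K matrix, no constant covariate needed
// target += hmm_lpdf(y | pi1, Gamma0, mu, sigma);  // Gamma0: matrix[K, K]
```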
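
And on the last question, what I’d like to be expressible is something like the following, where the emission means are built from data and free parameters rather than being passed as free parameters themselves (a sketch, my names):

```stan
parameters {
  vector[K] a0;                 // state-dependent intercepts
  vector[K] a1;                 // state-dependent AR(1) coefficients
  vector<lower=0>[K] sigma;
}
transformed parameters {
  // emission mean at time t in state k is a linear combination of data and parameters
  array[T] vector[K] mu_t;
  mu_t[1] = a0;                 // or condition on a presample observation
  for (t in 2:T)
    mu_t[t] = a0 + a1 * y[t - 1];
}
```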
Inference
I believe the main goal of having a built-in HMM is to gain performance, mostly by avoiding autodifferentiation. As such, I’d stick with the forward algorithm, since adding the forward quantity to the target is enough to get the sampler working. The forward-backward pass is optional and can be programmed in the generated quantities block. The same goes for Viterbi decoding.
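
For concreteness, the hand-written forward pass I have in mind (continuing the Model sketch above, so pi1, Gamma, mu, and sigma are as declared there) is just:

```stan
model {
  // forward recursion: log_alpha[k] = log p(y[1:t], z[t] = k)
  vector[K] log_alpha;
  for (k in 1:K)
    log_alpha[k] = log(pi1[k]) + normal_lpdf(y[1] | mu[k], sigma[k]);
  for (t in 2:T) {
    vector[K] acc;
    for (k in 1:K)
      acc[k] = log_sum_exp(log_alpha + log(col(Gamma[t], k)))
               + normal_lpdf(y[t] | mu[k], sigma[k]);
    log_alpha = acc;
  }
  target += log_sum_exp(log_alpha);  // log marginal likelihood; no backward pass needed
}
```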
Don’t get me wrong, I like smoothed probabilities a lot, but I think the first step should be getting the right posterior. After that, we can provide convenience functions to compute the other optional quantities (e.g. an overloaded instance of hmm_lpdf with an extra argument where we store the smoothed probabilities or the jointly most likely path).
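
For instance, the jointly most likely path is already recoverable by hand in generated quantities via Viterbi (a sketch reusing pi1, Gamma, mu, and sigma from the Model sketch):

```stan
generated quantities {
  array[T] int z_star;  // jointly most likely state path
  {
    array[T, K] int back_ptr;
    vector[K] best_lp;
    for (k in 1:K)
      best_lp[k] = log(pi1[k]) + normal_lpdf(y[1] | mu[k], sigma[k]);
    for (t in 2:T) {
      vector[K] new_lp;
      for (k in 1:K) {
        vector[K] lp = best_lp + log(col(Gamma[t], k));
        back_ptr[t, k] = sort_indices_desc(lp)[1];  // argmax over the previous state
        new_lp[k] = max(lp) + normal_lpdf(y[t] | mu[k], sigma[k]);
      }
      best_lp = new_lp;
    }
    z_star[T] = sort_indices_desc(best_lp)[1];
    for (t in 1:(T - 1))
      z_star[T - t] = back_ptr[T - t + 1, z_star[T - t + 1]];
  }
}
```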
Finally, @Bob_Carpenter: I assume we’d need to implement this built-in density as a “template”. Is there a template for another model that you’d recommend we take a look at? I was considering taking a look at the GP work, but you may know of another built-in model that is more similar to our use case.