HMM: how to specify the initial distribution

irelamb · March 31, 2024, 7:32pm

Hi,

I am fitting a Hidden Markov Model in Stan. I am specifying the transition matrix Gamma based on the parameters of my model, but I was wondering how I should define the initial probability vector rho. In principle I would choose a left eigenvector of Gamma with eigenvalue 1, i.e. a stationary distribution. But since Gamma is very large (though sparse), computing the eigenvectors at each iteration is costly, especially in terms of memory. So I was wondering what happens if I specify an initial vector that is not a stationary distribution. Naively I thought it would just be like specifying a wrong prior for the inference…
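For concreteness, the stationary distribution mentioned above satisfies rho = rho * Gamma, and one way to approximate it without a full eigendecomposition is power iteration, which only touches the nonzero entries of Gamma. The following is a minimal Python sketch (not Stan code) under the assumption that the chain is irreducible and aperiodic, so the iteration converges. The function name `stationary_power_iteration`, the dict-of-rows sparse layout, and the toy 3-state matrix are all made up for illustration.

```python
def stationary_power_iteration(rows, n_states, tol=1e-12, max_iter=10_000):
    """Approximate rho such that rho = rho * Gamma.

    rows[i] is a dict {j: Gamma[i][j]} holding only the nonzero
    entries of row i (a hypothetical sparse layout for this sketch).
    """
    rho = [1.0 / n_states] * n_states  # start from the uniform distribution
    for _ in range(max_iter):
        new = [0.0] * n_states
        # One sparse vector-matrix product: new = rho * Gamma
        for i, row in rows.items():
            for j, p in row.items():
                new[j] += rho[i] * p
        diff = max(abs(a - b) for a, b in zip(new, rho))
        rho = new
        if diff < tol:
            break
    return rho


# Toy 3-state chain (illustrative values, each row sums to 1)
gamma = {
    0: {0: 0.9, 1: 0.1},
    1: {0: 0.2, 1: 0.7, 2: 0.1},
    2: {1: 0.3, 2: 0.7},
}
rho = stationary_power_iteration(gamma, 3)
print(rho)  # approximately [0.6, 0.3, 0.1]
```

Each pass costs one sparse vector-matrix product and stores only two length-n vectors, which is where the memory saving over a dense eigendecomposition would come from. Whether this is cheap enough inside a sampler depends on how quickly the chain mixes.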

martinmodrak · April 1, 2024, 10:39am

As with any other choice in the model, you would need to refer to your domain knowledge to determine which distributions of initial states make sense. It is definitely not obvious to me that the stationary distribution is generally preferable. E.g. I’ve only ever used HMMs to model infectious disease progression, and there the stationary distribution is that everybody is either dead or fully cured, which would make little sense as an initial state distribution.

Sometimes people just use a discrete uniform distribution, which is not great, but as long as your inferences are not sensitive to the true state in the first couple of time points, the choice of initial distribution should not matter much.
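To make that concrete, here is a minimal Python sketch (not Stan code) that evaluates the HMM log-likelihood with the scaled forward algorithm under two initial distributions. The 2-state transition matrix, the emission matrix, and the observation sequence are all invented for illustration. Since the likelihood is a rho-weighted sum over the possible first states, the gap between the two log-likelihoods cannot exceed the largest absolute log-ratio of the two initial vectors, here log(1.5), no matter how long the series is.

```python
import math


def forward_loglik(rho, gamma, emis, obs):
    """Log-likelihood of obs under an HMM via the scaled forward algorithm.

    rho:   initial state probabilities
    gamma: transition matrix, gamma[i][j] = P(next = j | current = i)
    emis:  emission matrix, emis[i][y] = P(obs = y | state = i)
    """
    n = len(rho)
    # First time point: weight emissions by the initial distribution
    alpha = [rho[i] * emis[i][obs[0]] for i in range(n)]
    total = sum(alpha)
    loglik = math.log(total)
    alpha = [a / total for a in alpha]
    for y in obs[1:]:
        # Propagate one step through gamma, then weight by the emission
        alpha = [
            sum(alpha[i] * gamma[i][j] for i in range(n)) * emis[j][y]
            for j in range(n)
        ]
        total = sum(alpha)
        loglik += math.log(total)
        alpha = [a / total for a in alpha]  # rescale to avoid underflow
    return loglik


# Toy 2-state HMM with binary observations (illustrative values)
gamma = [[0.9, 0.1], [0.2, 0.8]]
emis = [[0.8, 0.2], [0.3, 0.7]]
obs = [0, 0, 1, 0, 0, 0, 1, 1, 1, 0] * 20  # 200 time points

ll_uniform = forward_loglik([0.5, 0.5], gamma, emis, obs)
ll_stationary = forward_loglik([2 / 3, 1 / 3], gamma, emis, obs)  # rho = rho * gamma
print(ll_uniform, ll_stationary, ll_uniform - ll_stationary)
```

With 200 observations the total log-likelihood is far larger in magnitude than that bounded gap. This only compares likelihoods at fixed parameter values, so it is a sanity check and not a guarantee about the posterior.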

You could also treat the initial distribution as a parameter to be estimated or even put some predictors on the initial state, but since the initial state tends to have little influence on the data (unless your HMM has very small transition probabilities), you are unlikely to learn much about it unless you observe a lot of individual series…

Does that answer your question?

irelamb · April 1, 2024, 12:04pm

Hi @martinmodrak, thank you so much for answering, despite the Easter break!
In my case I am actually modelling a subsequence of a long-running process that should indeed be at steady state, so I think the stationary distribution of the transition matrix would be the natural choice. But I tried, just to see what happens, using a discrete uniform distribution as you mention, and the inference looks reasonable on simulated data… I think I am in the case you describe, where the choice of the initial distribution does not matter much, and I don’t think I have enough individual series to learn the initial state distribution accurately…
Thank you so much, your answer was really helpful.
