HMM: how to specify the initial distribution


I am fitting a Hidden Markov Model in Stan. I build the transition matrix Gamma from the parameters of my model, and I am wondering how I should define the initial probability vector rho. In principle I would choose a left eigenvector of Gamma with eigenvalue 1, i.e. a stationary distribution. But since Gamma is very large (although sparse), computing the eigenvectors at each iteration is costly, especially in terms of memory. What happens if I specify an initial vector that is not a stationary distribution? Naively I thought it would just be like specifying a wrong prior for the inference…

As with any other choice in the model, you need to rely on your domain knowledge to determine which distributions of initial states make sense. It is definitely not obvious to me that the stationary distribution is generally preferable. E.g. I’ve only ever used HMMs to model infectious disease progression, and there the stationary distribution is that everybody’s either dead or fully cured - which would make little sense as an initial state distribution.
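To make the disease-progression point concrete, here is a minimal Python sketch (not Stan code). The three states and all transition probabilities are made up for illustration and do not come from any real model. It just pushes a distribution through the chain repeatedly and shows that the probability mass ends up entirely in the absorbing states.

```python
# Hypothetical states: 0 = infected, 1 = recovered (absorbing), 2 = dead (absorbing).
# All numbers below are illustrative assumptions.
Gamma = [
    [0.80, 0.15, 0.05],
    [0.00, 1.00, 0.00],
    [0.00, 0.00, 1.00],
]

def step(dist, Gamma):
    """One step of the chain: returns dist * Gamma (row vector times matrix)."""
    n = len(dist)
    return [sum(dist[i] * Gamma[i][j] for i in range(n)) for j in range(n)]

dist = [1.0, 0.0, 0.0]  # everybody starts infected
for _ in range(200):
    dist = step(dist, Gamma)

print(dist)  # essentially no mass is left in the "infected" state
```

Note that in a chain like this the long-run distribution also depends on where you start, so "the" stationary distribution is not even unique.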

Sometimes people just use a discrete uniform distribution - which is not great, but as long as your inferences are not sensitive to the true state in the first couple of time points, the choice of initial distribution should not matter much.
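One way to check this sensitivity is to run the forward algorithm under two different initial distributions and compare the results. The following is a rough Python sketch (not Stan code) for a toy two-state HMM with binary observations. The matrices `Gamma` and `B` and the observation sequence are arbitrary assumptions chosen only to illustrate the idea.

```python
import math

def forward_filter(rho, Gamma, B, obs):
    """Scaled forward algorithm.

    rho[i]      : initial probability of state i
    Gamma[i][j] : P(next state = j | current state = i)
    B[i][y]     : P(observation = y | state = i)
    Returns (filtered, loglik), where filtered[t][i] = P(state_t = i | obs[0..t]).
    """
    n = len(rho)
    filtered, loglik = [], 0.0
    pred = list(rho)  # predictive distribution of the current state
    for y in obs:
        unnorm = [pred[i] * B[i][y] for i in range(n)]
        c = sum(unnorm)  # P(obs[t] | obs[0..t-1])
        loglik += math.log(c)
        post = [u / c for u in unnorm]
        filtered.append(post)
        # propagate one step through the transition matrix
        pred = [sum(post[i] * Gamma[i][j] for i in range(n)) for j in range(n)]
    return filtered, loglik

# Toy model: all values are illustrative assumptions.
Gamma = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.8, 0.2], [0.3, 0.7]]
obs = [0, 0, 1, 0, 1, 1, 0, 0, 0, 1] * 4

uniform = [0.5, 0.5]
stationary = [2 / 3, 1 / 3]  # satisfies rho = rho * Gamma for this Gamma

f_u, ll_u = forward_filter(uniform, Gamma, B, obs)
f_s, ll_s = forward_filter(stationary, Gamma, B, obs)

gap_first = max(abs(a - b) for a, b in zip(f_u[0], f_s[0]))
gap_last = max(abs(a - b) for a, b in zip(f_u[-1], f_s[-1]))
print("gap in filtered probabilities, first time point:", gap_first)
print("gap in filtered probabilities, last time point: ", gap_last)
print("difference in log-likelihood:", ll_u - ll_s)
```

In this toy example the two filtered distributions differ noticeably at the first time point and become practically identical by the end of the series, while the log-likelihood changes only by a bounded amount. How fast the gap closes in a real model depends on how quickly the chain mixes.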

You could also treat the initial distribution as a parameter to be estimated or even put some predictors on the initial state, but since the initial state tends to have little influence on the data (unless your HMM has very small transition probabilities), you are unlikely to learn much about it unless you observe a lot of individual series…

Does that answer your question?

Hi @martinmodrak, thank you so much for answering, despite the Easter break!
In my case I am actually modelling a subsequence of a long-running process that should indeed be at steady state, so I think the stationary distribution of the transition matrix would be the natural choice here. But I tried, just to see what happens, a discrete uniform distribution as you suggest, and the inference looks reasonable on simulated data… I think I am in the case you describe, where the choice of the initial distribution does not matter much, and I don’t think I have enough individual series to learn the initial state distribution accurately…
Thank you so much, your answer was really helpful.
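For the steady-state case, one possible way to get a stationary distribution without a full eigendecomposition is power iteration, which only needs repeated vector-matrix products and so can exploit sparsity. Below is a minimal Python sketch (not Stan code). The sparse row format (one dict per state), the function name `stationary_power_iteration` and the example chain are all assumptions made for illustration. It also assumes the chain is irreducible and aperiodic, otherwise the iteration may not converge to a unique answer.

```python
def stationary_power_iteration(rows, tol=1e-12, max_iter=10_000):
    """Approximate rho with rho = rho * Gamma by repeated multiplication.

    rows[i] is a dict {j: Gamma[i][j]} holding only the nonzero entries of row i.
    Assumes an irreducible, aperiodic chain (not checked here).
    """
    n = len(rows)
    rho = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        new = [0.0] * n
        for i, row in enumerate(rows):
            for j, p in row.items():  # touch only the nonzero transitions
                new[j] += rho[i] * p
        total = sum(new)
        new = [v / total for v in new]  # renormalise against rounding drift
        if sum(abs(a - b) for a, b in zip(new, rho)) < tol:
            return new
        rho = new
    raise RuntimeError("power iteration did not converge")

# Illustrative sparse 3-state chain (made-up numbers).
rows = [
    {0: 0.5, 1: 0.5},
    {0: 0.25, 1: 0.5, 2: 0.25},
    {1: 0.5, 2: 0.5},
]
rho = stationary_power_iteration(rows)
print(rho)
```

The same loop could in principle be written inside a Stan program, but whether that is actually cheaper than the eigenvector computation for a given Gamma is something this sketch does not establish.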
