Hi,

I am fitting a Hidden Markov Model in Stan. I am specifying the transition matrix `Gamma` based on the parameters of my model, but I was wondering how I should define the initial probability vector `rho`: in principle I would choose a left eigenvector of `Gamma` with eigenvalue 1, hence a stationary distribution. But since `Gamma` is very large (but sparse), computing the eigenvectors at each iteration is costly, especially in terms of memory. I was wondering what happens if I specify an initial vector that is not a stationary distribution. Naively I thought it would just be like specifying a wrong prior for the inference...
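
For reference, the stationary distribution can be approximated without a full eigendecomposition by power iteration, which only needs repeated sparse vector-matrix products. Below is a minimal Python sketch of that idea, not Stan code. The function name `stationary_power_iteration`, the dict-of-rows sparse layout, and the toy 3-state matrix are all assumptions made for illustration, and the method assumes the chain is irreducible and aperiodic so that the iteration converges.

```python
def stationary_power_iteration(rows, n, tol=1e-12, max_iter=10_000):
    """Approximate the stationary distribution of a sparse row-stochastic matrix.

    rows[i] is a dict {j: Gamma[i][j]} holding only the nonzero entries of row i
    (a hypothetical sparse layout chosen for this sketch).
    """
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        new = [0.0] * n
        # Left multiplication: new[j] = sum_i pi[i] * Gamma[i][j],
        # touching only the nonzero entries.
        for i, row in enumerate(rows):
            for j, p in row.items():
                new[j] += pi[i] * p
        total = sum(new)
        new = [x / total for x in new]  # renormalise against rounding drift
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi  # best estimate if the tolerance was not reached


# Toy 3-state chain (illustrative values, not from the model in question).
gamma_rows = [
    {0: 0.9, 1: 0.1},
    {0: 0.2, 1: 0.7, 2: 0.1},
    {1: 0.3, 2: 0.7},
]
pi = stationary_power_iteration(gamma_rows, 3)
```

Memory use is proportional to the number of nonzero entries plus two length-`n` vectors. The number of iterations needed depends on how quickly the chain mixes, so a slowly mixing chain could still make this expensive.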


As with any other choice in the model, you would need to refer to your domain knowledge to determine which distributions of initial states make sense. It is definitely not obvious to me that the stationary distribution is generally preferable. E.g. I've only ever used HMMs to model infectious disease progression, and there the stationary distribution is that everybody is either dead or fully cured, which would make little sense as an initial state distribution.

Sometimes people just use a discrete uniform distribution, which is not great, but as long as your inferences are not sensitive to the true state in the first couple of time points, the choice of initial distribution should not matter much.
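
To illustrate why the initial distribution tends to wash out, here is a small Python sketch that propagates two different initial vectors through the same transition matrix and tracks how far apart the resulting state distributions are. The 3-state matrix, the helper names `step` and `tv_distance`, and the 30-step horizon are assumptions chosen for illustration. The sketch only looks at the marginal state distribution and ignores the observations, so it is a rough intuition rather than a statement about any particular posterior.

```python
def step(dist, gamma):
    """Advance a state distribution by one time step: dist @ gamma."""
    n = len(dist)
    return [sum(dist[i] * gamma[i][j] for i in range(n)) for j in range(n)]


def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Toy transition matrix (illustrative values only).
gamma = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
]

uniform = [1.0 / 3.0] * 3
stationary = [0.6, 0.3, 0.1]  # stationary distribution of this toy matrix

distances = []
p, q = uniform, stationary
for t in range(30):
    distances.append(tv_distance(p, q))  # gap between the two at time t
    p, q = step(p, gamma), step(q, gamma)
```

In this toy case the entries of `distances` shrink geometrically, so after a few dozen steps the two starting points are practically indistinguishable. With very small transition probabilities the decay would be much slower, and the initial distribution would matter for longer.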

You could also treat the initial distribution as a parameter to be estimated or even put some predictors on the initial state, but since the initial state tends to have little influence on the data (unless your HMM has very small transition probabilities), you are unlikely to learn much about it unless you observe a lot of individual series...

Does that answer your question?


Hi @martinmodrak, thank you so much for answering, despite the Easter break!

In my case I am actually modelling a subsequence of a long-running process that should indeed be at steady state, so I think the stationary distribution of the transition matrix would be the natural choice here. But I tried, just to see what happens, to use a discrete uniform distribution as you mention, and the inference looks reasonable on simulated data... I think I am in the case you describe, where the choice of the initial distribution does not matter much, and I think I don't have enough individual series to accurately learn the initial state distribution...

Thank you so much, your answer was really helpful.
