Examples for hidden Markov models with longitudinal data

Hi, is there any example of hidden Markov models applied to longitudinal data (many subjects, each with relatively few, 8-20, time-sequenced observations) in Stan? I have learned and worked with hierarchical multilevel models, and I would like to apply hidden Markov models to the same kind of data. Thanks.


Tagging @mbjoseph, @vianeylb and @betanalpha.

I can’t think of a specific example, but mostly because I tend to work with ecological data from animals.

@martinmodrak is working on a project like this now, some details of a previous model formulation here: Reanalyzing the Gautret et al. data on Covid-19 treatment. Asking for feedback

@LuisDamiano has a writeup on HMMs in Stan: https://zenodo.org/record/1284341/files/main_pdf.pdf?download=1 as well as an R package that could be helpful, BayesHMM: https://luisdamiano.github.io/BayesHMM/articles/introduction.html#introduction

The Stan manual also has examples of how to implement hidden Markov models. I know there are various HMM formulations applied to longitudinal data, with discrete- or continuous-valued random effects (or both), and these can all be implemented in Stan.
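For orientation, the basic discrete-time HMM likelihood looks roughly like this in Stan: a minimal sketch along the lines of the manual's example, with a normal observation model and the forward algorithm computed on the log scale. All names here are illustrative, not from any specific model in this thread.

```stan
data {
  int<lower=1> T;        // number of observations
  int<lower=1> N;        // number of hidden states
  vector[T] y;           // observed sequence
}
parameters {
  simplex[N] theta[N];   // rows of the transition probability matrix
  ordered[N] mu;         // state-dependent means (ordered for identifiability)
  real<lower=0> sigma;   // shared observation sd
}
model {
  // forward algorithm on the log scale, uniform initial distribution
  vector[N] logalpha;
  vector[N] logalpha_new;
  mu ~ normal(0, 5);
  sigma ~ normal(0, 2);
  for (n in 1:N)
    logalpha[n] = -log(N) + normal_lpdf(y[1] | mu[n], sigma);
  for (t in 2:T) {
    for (n in 1:N) {
      vector[N] acc;
      for (m in 1:N)
        acc[m] = logalpha[m] + log(theta[m, n]);
      logalpha_new[n] = log_sum_exp(acc) + normal_lpdf(y[t] | mu[n], sigma);
    }
    logalpha = logalpha_new;
  }
  target += log_sum_exp(logalpha);  // marginalize over the final state
}
```

The random-effects variants discussed below are extensions of this basic structure.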


Hi, thank you for the suggestions. I will take a look.

I am trying to see how Stan can combine an HMM with "random intercepts/slopes" to account for heterogeneity in longitudinal data. I have seen such work published, but usually without accompanying code. I would like to borrow some Stan code.

BTW, I just heard your recent podcast episode on Learning Bayesian Statistics and found your tutorial paper with Stan code. It is very helpful, especially the decoding part! Thank you very much.


Ah, I see! Random intercepts/slopes in the distributions of the observations, in the transition probability matrix, or both? I could put together a small example for a discrete-time HMM, depending on what you need.

@martinmodrak is working on a continuous-time HMM, which I have yet to implement myself, but could be quite useful if your data are better suited to this type of representation.

There are two papers that come to mind for mixed HMMs:

Hope these are helpful.


Thanks for the offer! My dataset is in discrete time, with time-varying predictors in each time period. The outcome is actually a count, but I can treat it as binary if needed.
I can see that "random intercepts/slopes" can appear in both the state-dependent parameters and the transition probability matrix. If you have time to provide a small example, it would be much appreciated.

I have read the papers by Dr. Altman and Dr. Maruotti. Both have done great work pushing mixed HMMs forward.


I have an example here with continuous-valued random effects in the state-dependent distributions: https://github.com/vianeylb/WhiteSharkGuadalupe/blob/master/WhiteShark_StepTurn_RandEff.stan

The only hierarchical part is in the means of the state-dependent distributions.

\mu_{ni} \sim N(\mu_n, \tau_n)

In the code I include the number of individuals (NindivID) and a vector indicating which observations correspond to which shark (indivID) in the data block.

  //hierarchical steps
  int<lower=1> NindivID; // number of individuals
  int<lower=1, upper=NindivID> indivID[Tlen]; // individual ID for each observation

In the parameter block I include

positive_ordered[N_gps] mu_step[NindivID];

so that Stan knows that there are individual-specific means for each individual. This can easily be extended into the transition probability matrix as well.
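For concreteness, here is a rough sketch of how the hierarchical means and an individual-specific TPM could sit together in one model. The population-level names (`mu_step_pop`, `tau_step`, `conc`, `tpm`) are illustrative, not taken from the linked shark model, and the normal prior on the ordered means ignores the truncation the ordering induces:

```stan
parameters {
  positive_ordered[N_gps] mu_step_pop;        // population-level state means
  vector<lower=0>[N_gps] tau_step;            // between-individual sds
  positive_ordered[N_gps] mu_step[NindivID];  // individual-specific means
  vector<lower=0>[N_gps] conc[N_gps];         // Dirichlet concentrations, one per TPM row
  simplex[N_gps] tpm[NindivID, N_gps];        // individual-specific TPM rows
}
model {
  for (i in 1:NindivID) {
    mu_step[i] ~ normal(mu_step_pop, tau_step);  // partial pooling of the means
    for (n in 1:N_gps)
      tpm[i, n] ~ dirichlet(conc[n]);            // partial pooling of the TPM rows
  }
  // ... likelihood via the forward algorithm, using mu_step[indivID[t]]
  // and tpm[indivID[t]] at each observation
}
```

In a real model you would also want priors on `mu_step_pop`, `tau_step`, and `conc`; this just shows where the two kinds of random effects enter.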

Hope that helps get you started!


I’ll just add that although I am exploiting a continuous-time formulation of the process, my data are also in discrete time. It just turns out that thinking in continuous time is useful when you can assume there are only a few underlying “basic” transitions (e.g. your states are ordered and the physical process moves only between neighbouring states), but your data are coarse, so that multiple such “basic” transitions can occur between your individual observations.

In the example with N ordered states, you can very sensibly parametrize the whole N \times N transition matrix with only 2(N - 1) parameters representing the transition rates between neighbouring states in continuous time. You can then put linear predictors on those “basic” transition rates, greatly reducing the number of parameters you need (given that the structure is appropriate).
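A sketch of what that parametrization might look like in Stan, assuming `lambda_up` and `lambda_down` are declared as positive parameter vectors of length N - 1 and `dt` is the (constant) observation interval in the data; all names are illustrative:

```stan
transformed parameters {
  // Tridiagonal generator matrix Q built from 2(N - 1) rates;
  // the discrete-time TPM over an interval dt is matrix_exp(Q * dt).
  matrix[N, N] Q = rep_matrix(0, N, N);
  matrix[N, N] Gamma;
  for (n in 1:(N - 1)) {
    Q[n, n + 1] = lambda_up[n];     // rate of moving up one state
    Q[n + 1, n] = lambda_down[n];   // rate of moving down one state
  }
  for (n in 1:N)
    Q[n, n] = -sum(Q[n]);           // rows of a generator sum to zero
  Gamma = matrix_exp(Q * dt);       // valid TPM for time step dt
}
```

With unequal intervals you would compute `matrix_exp(Q * dt[t])` per observation, and covariates would enter through the rates, e.g. a log link on `lambda_up` and `lambda_down`.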

The math behind this idea is given in a paper by Williams et al.: https://www.tandfonline.com/doi/abs/10.1080/01621459.2019.1594831


Thank you very much.

I agree with everything you said. I will need to think about how to do this once I figure out the example provided by Dr. Leos-Barajas.

My only thought is that since I work with longitudinal data with relatively short series (e.g. 2 years of monthly data, so 24 observations per subject), it is harder to imagine or justify needing many hidden states. So the issue you brought up is less of a concern for me, although it is totally valid and a great suggestion. Thanks again.
