Can latent transition models be implemented in Stan?



Hey All,

I am trying to help a friend with a complex modelling problem and was thinking Stan might be suitable, but I am unsure where to start and was hoping someone here might have done something similar in the past. My friend has a large amount of patient data involving clinical psychological measures (e.g., questions related to specific symptoms) gathered both before and after a psychological intervention in the same subjects. She believes there to be latent states within these data – representing different phases of the disorder (during the initial or final time point) or recovery (during the final time point). She would like to estimate the probability of starting in any given state, the probability of transitioning from one state to another and also to predict Time 2 states based on Time 1 data. She has 9 observed variables at each of the two time points (each representing answers to questions on a 4-point ordinal scale). This sounds to me like a latent transition model, and she has been pursuing that approach using the LMest package in R but has come up against various limitations regarding the model implementation.

I believe this situation could be modelled using a variant of the HMMs presented in Section 10.6 of the Stan manual, but I am not sure how to incorporate the multivariate ordinal nature of the observations at each time point. Does anyone have a similar model they would be willing to share? Or am I completely wrong in my thinking?



If there are just two time points, I wouldn’t worry about an HMM. I think the advantage of HMMs is that they scale to long time series.

I dunno a thing about surveys, but my gut feeling here is just build a model for each survey (I assume they’re the same questions with the same types of covariates attached?), and then just put hierarchical priors on the parameters between the two time steps.

You’re looking for change, so use the hierarchical parameters to determine how different the parameters are between the beginning and end (like in the 8 schools example, if you have a copy of BDA3).

I dunno about doing a clustering/states thing here. I know people talk about them in medical applications, but it seems like you’re throwing data away when you start trying to lump people in categories, even if you keep track of distributions over those categories. I might be giving bad advice with this though – have a look around for other survey models on the forums. The hierarchical thing is probably what you want though.


To answer the question about HMMs, all you need to do is change the emission density function—the forward algorithm doesn’t change. There’s no restriction against it being multivariate. But it’s overkill for this situation, as @bbbales2 points out, because you only have two time points. So you just have the transition probabilities to estimate (either as parameters in a model or as expectations afterward).
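
To make the two-time-point case concrete, here is a minimal sketch of the per-subject likelihood with the latent states summed out. It is written in plain Python rather than Stan just to show the arithmetic; in a Stan model the same sum would go through `log_sum_exp`, and `ordered_logistic` would supply the item probabilities. Everything specific here is an assumption for illustration: the function names, the made-up parameter values, the choice of an ordered-logistic emission with cutpoints shared across items, emission parameters shared across the two time points, and items being conditionally independent given the state.

```python
import math


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ordered_logistic_probs(eta, cuts):
    """Category probabilities for one ordinal item.

    Uses P(y <= c) = inv_logit(cuts[c] - eta), with cuts in increasing order,
    so len(cuts) + 1 categories come out.
    """
    cdf = [1.0 / (1.0 + math.exp(eta - c)) for c in cuts] + [1.0]
    return [hi - lo for lo, hi in zip([0.0] + cdf[:-1], cdf)]


def emission_logp(answers, etas, cuts):
    """Log probability of one time point's answers given the latent state.

    Assumption: items are conditionally independent given the state.
    """
    return sum(
        math.log(ordered_logistic_probs(eta, cuts)[a])
        for a, eta in zip(answers, etas)
    )


def subject_loglik(y1, y2, init, trans, eta, cuts):
    """Log likelihood of one subject, summing over (state at T1, state at T2)."""
    n_states = len(init)
    terms = [
        math.log(init[k1]) + emission_logp(y1, eta[k1], cuts)
        + math.log(trans[k1][k2]) + emission_logp(y2, eta[k2], cuts)
        for k1 in range(n_states)
        for k2 in range(n_states)
    ]
    return log_sum_exp(terms)


# Made-up values: 2 states, 2 items, 4 response categories coded 0..3.
INIT = [0.6, 0.4]                    # P(state at Time 1)
TRANS = [[0.7, 0.3], [0.2, 0.8]]     # TRANS[k1][k2] = P(state k2 at T2 | k1 at T1)
ETA = [[-1.0, -0.5], [1.0, 1.5]]     # ETA[k][j] = location of item j in state k
CUTS = [-1.0, 0.0, 1.0]              # shared cutpoints (an assumption)

print(subject_loglik([0, 1], [3, 2], INIT, TRANS, ETA, CUTS))
```

With only two time points the double sum over states is all the "forward algorithm" there is; the real data would have 9 items per time point instead of the 2 used here.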

Do you have any structure for the latent state? Are there predictors besides the answers to these questions?


Sorry for not responding sooner. I was caught up in a few other time sensitive projects.

Thanks for the replies. I agree with your concern about the loss of information in this approach. For what it is worth, I recommended a network-based analysis approach (very popular in mental health these days). They are pursuing that idea as a separate analysis path. However, the primary research question actually deals with clustering. My friend theorizes that the condition she studies has subtypes (I have been calling these states) and that these subtypes can transition into different subtypes over time or with treatment. I believe she is really interested in determining whether certain subtypes are more or less likely to recover. So a simpler way of looking at it would be to determine the state/cluster at Time 1 and then relate it to whether participants are below some clinical threshold at Time 2 – although her framing implied she was keen to know whether subtypes changed over time.
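
For what it is worth, the "determine the state at Time 1, then look ahead" idea can be sketched as below. This is only an illustration in plain Python under assumptions of my own: the function names are hypothetical, the parameter values are made up, items are treated as conditionally independent given the state, and the parameters are treated as known (in a Bayesian fit one would average this over posterior draws).

```python
import math


def state_posterior(answers, init, item_probs):
    """P(state at Time 1 | Time 1 answers) by Bayes' rule.

    item_probs[k][j][c] = P(item j takes category c | state k).
    Assumption: items are conditionally independent given the state.
    """
    joint = [
        init[k] * math.prod(item_probs[k][j][a] for j, a in enumerate(answers))
        for k in range(len(init))
    ]
    total = sum(joint)
    return [p / total for p in joint]


def predict_time2(post1, trans):
    """P(state at Time 2 | Time 1 answers): push the posterior through trans."""
    n_states = len(post1)
    return [
        sum(post1[k1] * trans[k1][k2] for k1 in range(n_states))
        for k2 in range(n_states)
    ]


# Made-up values: 2 states, 2 items, 4 response categories coded 0..3.
INIT = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.2, 0.8]]
ITEM_PROBS = [
    [[0.5, 0.3, 0.15, 0.05], [0.5, 0.3, 0.15, 0.05]],   # state 0: low scores likely
    [[0.05, 0.15, 0.3, 0.5], [0.05, 0.15, 0.3, 0.5]],   # state 1: high scores likely
]

post = state_posterior([0, 0], INIT, ITEM_PROBS)
print(post, predict_time2(post, TRANS))
```

Keeping the full probability vector per participant, rather than a hard assignment to the most likely state, goes some way toward the information-loss concern raised above.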

Regarding Bob’s question, there is no structure for the latent state (if I understand your meaning). Her hope is to identify states in a purely data-driven manner. She actually has more than two time points and also has several predictors (I have not seen those yet) – but a wise man on this forum once said that you should start with a simple model and build out. So for now, I figured we would just look at fitting two time points with no predictors.

That said, now that you have a broader view of the goals – if you still think an HMM is overkill, I will pass that along! And again, I really appreciate the advice.


The question’s then how to test this theory. It’s very hard to compare clustering models with different numbers of clusters or clusters that grow out of different initializations. It’s very easy to look at clusters and assign some meaning to them, but it’s much harder to then validate any of that. Here’s some related work trying to address this question for mixed-membership multinomial clustering (aka LDA):


Yes, exactly this. I’ve had a lot of trouble making sense out of clustering models, even if the outputs seem interesting (why this N? why not another?). The truth is clusters are not easy. So maybe pass along that advice, cause it’s easy to mistakenly assume they are!


That reminds me that Andrew also weighed in on the blog: