Here again with another question. For some context, I am implementing a Hidden Markov Model, which in its simplest form is just order-1 transition probabilities between the states and Gaussian emissions whose parameters change depending on the current state. Since each of the time series comes from a different group, I also have covariates associated with these groups, and I want to test whether they influence the transition probabilities between the states.
Basically, all I want to know is the correct way to make the covariates influence the transition probabilities between the states. Up to now I have had the covariates affect the log-probability like this:
transformed parameters {
  simplex[M] meta_trans_perPat[P, M]; // Patient-specific transition probabilities

  // Hierarchical patient-specific transition probabilities with covariate effects
  for (p in 1:P) {
    for (m_from in 1:M) {
      vector[M] adjusted_transitions_logit;
      // Matrix multiplication: X[p] is [1, C] and meta_beta_trans[m_from] is [C, M], giving [1, M]
      adjusted_transitions_logit = log(meta_trans_group[m_from]) + to_vector(X[p] * meta_beta_trans[m_from]);
      // Apply softmax to get patient-specific transition probabilities
      meta_trans_perPat[p, m_from] = softmax(adjusted_transitions_logit);
    }
  }
}
However, at some point I was debugging with print and noticed that different values of meta_beta_trans can lead to the same probabilities in meta_trans_perPat[p, m_from] if they shift adjusted_transitions_logit by the same amount (see the screenshot for the toy example). I understand that this is because softmax is invariant to adding a constant to its input, but doesn't that make it almost impossible for the model to estimate meta_beta_trans, since different values produce the same results? Should I then be applying the covariates in a different way?
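The invariance you are describing is easy to check outside Stan. A minimal NumPy sketch (toy numbers, not your model):

```python
import numpy as np

def softmax(v):
    # subtracting the max is for numerical stability; it also shows
    # directly why softmax is invariant to constant shifts
    e = np.exp(v - np.max(v))
    return e / e.sum()

logits = np.array([0.5, -1.0, 2.0])
shifted = logits + 3.7  # add the same constant to every element

# Both inputs map to the identical simplex
print(np.allclose(softmax(logits), softmax(shifted)))  # True
```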
In softmax regression (i.e. multinomial logistic regression) it is typical to pin one of the arguments to the softmax function to zero, removing the non-identifiability. One way to think about this is that the softmax doesn’t have a unique inverse, so we need a way to pick which particular inverse-softmax we are working with. Pinning an element to zero identifies the unique inverse-softmax.
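To see why pinning picks out a unique inverse, here is a NumPy sketch: with the first element fixed to zero, the log-probabilities determine the remaining logits exactly.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

# Free logits for categories 2..K; category 1 is pinned to 0
free = np.array([1.2, -0.4])
logits = np.concatenate(([0.0], free))
p = softmax(logits)

# Inverse-softmax under the pinning convention:
# subtract the first log-probability, which restores the zero pin
recovered = np.log(p) - np.log(p[0])
print(np.allclose(recovered[1:], free))  # True
```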
Oh alright, that makes sense. So just to make sure I understood correctly: I should pick a reference element and center the entire input vector on it (i.e. subtract that element from every entry), so that the reference element becomes zero and all the others are shifted by the same amount. Something like:
int m_ref = 2; // reference element to center adjusted effects

// Hierarchical patient-specific transition probabilities with covariate effects
for (p in 1:P) {
  for (m_from in 1:M) {
    vector[M] adjusted_transitions_logit;
    // Matrix multiplication: X[p] is [1, C] and meta_beta_trans[m_from] is [C, M], giving [1, M]
    adjusted_transitions_logit = log(meta_trans_group[m_from]) + to_vector(X[p] * meta_beta_trans[m_from]);
    // Reference centering
    adjusted_transitions_logit = adjusted_transitions_logit - adjusted_transitions_logit[m_ref];
    // Apply softmax to get patient-specific transition probabilities
    meta_trans_perPat[p, m_from] = softmax(adjusted_transitions_logit);
  }
}
EDIT: well, actually, thinking about this some more, this does not work, because the same problem remains. The centering does force the values to be expressed relative to the reference, but multiple combinations of adjusted_transitions_logit (before reference centering) can still lead to the same final adjusted_transitions_logit. Were you suggesting that I just set one of the values in adjusted_transitions_logit to 0? So basically every element is estimated against that reference, I guess.
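That observation can be checked numerically. In this NumPy sketch (made-up numbers), two coefficient vectors that differ by a constant produce identical probabilities even after the centering step, because centering only removes a shift that softmax already ignored:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

base = np.log(np.array([0.2, 0.5, 0.3]))  # log baseline transition probs
effect_a = np.array([0.4, -0.1, 0.7])
effect_b = effect_a + 2.0  # a *different* effect vector, shifted by a constant

def centered_probs(effect, ref=1):
    logit = base + effect
    logit = logit - logit[ref]  # reference centering after adding the effect
    return softmax(logit)

# Same probabilities from different effects: centering did not identify them
print(np.allclose(centered_probs(effect_a), centered_probs(effect_b)))  # True
```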
Typical in stats but rare in ML. The other way to identify the model is with a sum-to-zero constraint, which is more symmetric in the priors, but that's pretty much impossible to do with covariates.
Not quite. There's nothing to center. You just fix one of the linear predictors to 0 in the input to softmax, and that identifies the beta values.
data {
  ...
  matrix[M, N] x; // covariates (transposed: one column per observation)
}
parameters {
  matrix[K - 1, M] beta; // regression coefficients
}
model {
  matrix[K - 1, N] beta_x = beta * x;
  for (n in 1:N) {
    vector[K] theta = softmax(append_row(0, beta_x[, n])); // fix first linear predictor to 0
  }
}
[edit: get the indexing right and make the matrix multiply efficient]
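For reference, here is a NumPy translation of that parameterization (dimension names follow the snippet; the data values are made up). Only K - 1 coefficient rows are estimated, and because the first row is pinned at zero, each row of beta is interpretable as a log-odds contrast against category 1:

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, N = 3, 2, 5                   # categories, covariates, observations
x = rng.normal(size=(M, N))         # covariates, one column per observation
beta = rng.normal(size=(K - 1, M))  # free coefficients; category 1 is pinned at 0

beta_x = beta @ x                                # [K-1, N] linear predictors
logits = np.vstack([np.zeros((1, N)), beta_x])   # fix first linear predictor to 0

# Column-wise softmax: each column becomes a simplex over the K categories
theta = np.exp(logits - logits.max(axis=0))
theta = theta / theta.sum(axis=0)

print(np.allclose(theta.sum(axis=0), 1.0))  # True: each column sums to 1
```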