# Manual ordering of two simplexes for emission probabilities in multinomial HMM

Hi,

I am trying to estimate a multinomial hidden Markov model (i.e., both the states and the observations are discrete) using this code by Luis Damiano.

The HMM problem that I am dealing with has 2 hidden states and 4 possible outcomes (observations).

When fitting the model in Stan, the posterior is multimodal as expected, likely because the two hidden states can swap labels.
One solution I am considering is to constrain the emission probability of observation 1 to always be higher for state 1 than for state 2.

In mathematical form: if we define B_{ij} as the probability of observing outcome j (j = 1, ..., 4) in state i (i = 1, 2), the constraint is B_{11} > B_{21}.

However, I cannot use a simple solution like `ordered`, since B is implemented as an array of row simplexes, as below:

```stan
parameters {
  // ...
  // Discrete observation model
  simplex[L] phi_k[K];              // event probabilities
}
```

Instead, my alternative plan is to manually exchange the two simplexes `phi_k[1]` and `phi_k[2]` whenever `phi_k[1, 1] < phi_k[2, 1]`.

One problem with this approach is that I cannot directly modify `phi_k`. Instead, I created `ordered_phi_k` in `transformed parameters` and used it in place of `phi_k`.

So, the original code was

```stan
transformed parameters {
  vector[K] unalpha_tk[T];

  { // Forward algorithm log p(z_t = j | x_{1:t})
    real accumulator[K];

    for (j in 1:K) {
      unalpha_tk[1, j] = log(p_1k[j]) + log(phi_k[j, x[1]]);
    }

    for (t in 2:T) {
      for (j in 1:K) { // j = current (t)
        for (i in 1:K) { // i = previous (t-1)
          // Murphy (2012) Eq. 17.48
          // belief state      + transition prob + local evidence at t
          accumulator[i] = unalpha_tk[t-1, i] + log(A_ij[i, j]) + log(phi_k[j, x[t]]);
        }
        unalpha_tk[t, j] = log_sum_exp(accumulator);
      }
    }
  } // Forward
}
```

and I have modified it into

```stan
transformed parameters {
  vector[K] unalpha_tk[T];
  vector[L] ordered_phi_k[K];

  { // Forward algorithm log p(z_t = j | x_{1:t})

    real accumulator[K];

    // use ordered phi: relabel so state 1 has the higher
    // probability of outcome 1
    if (phi_k[2, 1] > phi_k[1, 1]) {
      ordered_phi_k[1] = phi_k[2];
      ordered_phi_k[2] = phi_k[1];
    } else {
      ordered_phi_k[1] = phi_k[1];
      ordered_phi_k[2] = phi_k[2];
    }

    for (j in 1:K) {
      unalpha_tk[1, j] = log(p_1k[j]) + log(ordered_phi_k[j, x[1]]);
    }

    for (t in 2:T) {
      for (j in 1:K) { // j = current (t)
        for (i in 1:K) { // i = previous (t-1)
          // Murphy (2012) Eq. 17.48
          // belief state      + transition prob + local evidence at t
          accumulator[i] = unalpha_tk[t-1, i] + log(A_ij[i, j]) + log(ordered_phi_k[j, x[t]]);
        }
        unalpha_tk[t, j] = log_sum_exp(accumulator);
      }
    }
  } // Forward
}
```

What was the result?

I simulated an HMM sequence with 1000 observations, where A = [0.75, 0.25; 0.35, 0.65] and B = [0.5, 0.3, 0.1, 0.1; 0.1, 0.15, 0.4, 0.35].

The model reasonably recovers the true parameters.

Altogether, my question is: is this approach justifiable for this specific constraint?

Minho

We actually have built-in HMM code now, which will be a lot faster. See:

I guess we need to update the user’s guide to keep up!

Do you mean label switching? While that’s a problem for convergence monitoring, it’s usually not a problem for downstream inference, which will again marginalize out the labels.

You can do that, but it’d be more efficient to do so in generated quantities just for convergence monitoring. It doesn’t solve the underlying problem that the model’s not identified in its parameters as declared.
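For K = 2, that generated-quantities relabeling might look something like this (a sketch; `phi_k_ord` and `A_ij_ord` are names I'm introducing, while `phi_k` and `A_ij` are as in the OP's model):

```stan
generated quantities {
  // Relabeled copies for convergence monitoring only; the model block is
  // untouched, so this does not change the posterior.
  simplex[L] phi_k_ord[K] = phi_k;
  matrix[K, K] A_ij_ord = A_ij;
  if (phi_k[2, 1] > phi_k[1, 1]) {
    phi_k_ord[1] = phi_k[2];
    phi_k_ord[2] = phi_k[1];
    // swap both rows and columns of the transition matrix consistently
    A_ij_ord[1, 1] = A_ij[2, 2];
    A_ij_ord[2, 2] = A_ij[1, 1];
    A_ij_ord[1, 2] = A_ij[2, 1];
    A_ij_ord[2, 1] = A_ij[1, 2];
  }
}
```

Any other state-indexed quantity you monitor (e.g. the initial distribution `p_1k`) would need the same swap so that all relabeled quantities refer to the same state.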


Hello, I don’t want to hijack the thread but it seems like a sensible place to comment/ask.

We actually have built-in HMM code now, which will be a lot faster.

The OP’s code allows calculation of the filtered state probabilities (via the `unalpha_tk` parameter). As far as I can tell, `hmm_marginal` doesn’t give you access to this quantity, and `hmm_hidden_state_prob` returns the smoothed state probabilities (right?). So how would one use the built-in HMM functions and also obtain the filtered state probabilities?
Thanks.


Hi, Bob.

I already knew this, but was not sure about the discrete case, since the only case study using `hmm_marginal` was for continuous observations. I guess just plugging the log categorical likelihood (`categorical_lpmf`) in place of the log normal likelihood (`normal_lpdf`) for `log_omega` would be enough? I will try this.
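For what it's worth, the plug-in might look roughly like this (a sketch, assuming `x` is the `T`-length array of observed categories, `phi_k` is the array of emission simplexes from above, and `Gamma`/`rho` are the transition matrix and initial distribution expected by the built-in functions):

```stan
transformed parameters {
  // log_omega[k, t] = log Pr(x[t] | hidden state k)
  matrix[K, T] log_omega;
  for (t in 1:T)
    for (k in 1:K)
      log_omega[k, t] = categorical_lpmf(x[t] | phi_k[k]);
}
model {
  // Gamma: K x K transition matrix, rho: K-simplex of initial probabilities
  target += hmm_marginal(log_omega, Gamma, rho);
}
```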

That would be very useful!

Yes, you are absolutely right. But I wanted to check whether the model is able to find the true (given) structure of the sequence.

I did this because I was worried that manual ordering in generated quantities would only ensure ordering of the emission probabilities, excluding ordering of hidden state probabilities.

Okay, then maybe I can try setting some vague priors that still loosely constrain the parameters so that ordering is partially achieved.

Minho

Hi, @Marty.

I encountered a similar issue and I think I have figured out the answer.
Basically, you can re-calculate the filtered state probabilities using `log_omega`, `Gamma`, and `rho`, which are the inputs required by the `hmm_marginal` function.

One thing to note here is that if you try to modify the `hmm_hidden_state_prob` function, the results sometimes produce NaNs, possibly because of under/overflow.

I needed to reuse some parts of my original code to avoid the NaN issue.

The implemented function is as below:

```stan
functions {
  // ref: https://github.com/luisdamiano/gsoc17-hhmm/blob/master/hmm/stan/hmm-multinom.stan
  matrix hmm_forward_prob(matrix log_omega, matrix Gamma, vector rho) {
    int n_state = rows(log_omega);
    int n_obs = cols(log_omega);
    matrix[n_state, n_obs] unnorm_log_alpha;
    matrix[n_state, n_obs] alpha; // p(x_t=j|y_{1:t})
    array[n_state] real accumulator;

    // first alpha
    unnorm_log_alpha[, 1] = log(rho) + log_omega[, 1];

    // other alphas
    for (t in 2:n_obs) {
      for (j in 1:n_state) {
        for (i in 1:n_state) {
          accumulator[i] = unnorm_log_alpha[i, t-1] + log(Gamma[i, j]) + log_omega[j, t];
        }
        unnorm_log_alpha[j, t] = log_sum_exp(accumulator);
      }
    }

    // normalize alpha later for numerical stability
    for (t in 1:n_obs) {
      alpha[, t] = softmax(unnorm_log_alpha[, t]);
    }
    return alpha;
  } // !hmm_forward_prob
}
```
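Calling it might look like this (a sketch, assuming `log_omega`, `Gamma`, and `rho` are visible in `generated quantities`, e.g. because they are defined in `transformed parameters`, and `K`/`T` are the number of states and observations):

```stan
generated quantities {
  // filtered probabilities p(z_t = k | x_{1:t}), one column per time step
  matrix[K, T] alpha = hmm_forward_prob(log_omega, Gamma, rho);
}
```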

Also, I have attached sample code for anyone who may be interested.

hmm-multinom-example.stan (2.1 KB)

Hi @mshin,

I appreciate the response! That is a nice way to set up the forward algorithm to take the same inputs as the built-in functions.

I guess my question/concern is that in calculating the marginal hidden state probabilities, we already have everything we need both to increment the target and to obtain the filtered state probabilities. The fact that `hmm_marginal` only returns the former means that we have to run the exact same loop again. So if the point of using built-in functions such as `hmm_marginal` is speed, it seems counter-productive.

If `hmm_marginal` simply returned the entire array of log state probabilities, instead of just the last element, there would be no need for this duplication.

Does that make sense?

Hi @Marty,

I am not an expert on this, but returning the entire array of log state probabilities by default may not be wise in terms of memory?

Calculating things again in the `generated quantities` section is actually quite fast, since the calculation is only done once per iteration (rather than once per leapfrog step, as in the model block).

Also, I have compared the two approaches (the GitHub code vs. `hmm_marginal`), and `hmm_marginal` was much faster (around 2-3x) in my case.


Yep, that makes sense! Handy to know the `hmm_marginal` implementation was that much faster, thanks for sharing.