Hidden Markov Model with constraints

Hi there!

Assume I’ve got a plain, simple HMM with discrete observations (i.e. multinomial output), very similar to the model described in the Stan manual. I know that some of the observations can only come from certain predetermined states.

Find attached a toy example with K = 4 states and L = 9 possible outputs: categories 1, 3, 5, 7 and 9 may only be produced by latent states 1 and 3, while categories 2, 4, 6 and 8 may only be emitted by latent states 2 and 4. The emission vectors look like this:

         [,1] [,2] [,3] [,4] [,5] [,6] [,7]  [,8] [,9]
    [1,] 0.00 0.19 0.00 0.07 0.00 0.47 0.000 0.27 0.00
    [2,] 0.25 0.00 0.28 0.00 0.12 0.00 0.042 0.00 0.31
    [3,] 0.00 0.36 0.00 0.19 0.00 0.17 0.000 0.28 0.00
    [4,] 0.21 0.00 0.28 0.00 0.18 0.00 0.086 0.00 0.25

What would be the best way to adapt the code for the forward filter and comply with this extra constraint? I’ve been considering adding a few ifs to line #33 (see attached stan file) and incrementing the log likelihood only when the combination of current state (j) and output (x[t]) is valid.

This feels like a hack, though, and I thought you might be able to give better guidance.


hmm.R (1.2 KB)
hmm.stan (1.4 KB)

Wow when I first looked at this I thought it would be simple, but you’re right, this gets pretty complicated if we want to avoid having -infs sitting around in our log probabilities.

The simplest thing I can think of (and this is not very simple or elegant) is to make your accumulator variable length and not include things that are -inf (but still put -infs in the right places in unalpha_tk).

for (t in 2:T) {
  for (j in 1:K) { // j = current (t)
    if (is_compatible(j, x[t])) { // You'll have to write this function
      int q = 1;
      for (i in 1:K) { // i = previous (t-1)
        // Murphy (2012) Eq. 17.48
        // belief state      + transition prob + local evidence at t
        if (is_inf(unalpha_tk[t - 1, i]) == 0) {
          accumulator[q] = unalpha_tk[t - 1, i] + log(A_ij[i, j]) + log(phi_k[j, x[t]]);
          q = q + 1;
        }
      }
      // q - 1 entries were filled; if none, the state is unreachable
      unalpha_tk[t, j] = q > 1 ? log_sum_exp(accumulator[1:(q - 1)]) : log(0);
    } else {
      unalpha_tk[t, j] = log(0); // Or any other way to make it -inf
    }
  }
}

I think it’s okay to do this. We’re using the -infs as bookkeeping here (edit: and it will be a fixed pattern as a function of the data, not the parameters).
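To sanity-check the math, here is a minimal Python sketch of the same constrained forward pass (toy code with made-up names, not the attached Stan file). Dropping the -inf terms inside log_sum_exp is safe because exp(-inf) contributes exactly zero to the sum:

```python
import math

# Toy sketch of the constrained forward filter (hypothetical names,
# 0-based indices). Incompatible (state, output) pairs have emission
# probability 0, so their log contributions are -inf and get dropped
# inside log_sum_exp.

def safe_log(p):
    return math.log(p) if p > 0 else -math.inf

def log_sum_exp(terms):
    finite = [s for s in terms if s != -math.inf]  # exp(-inf) = 0
    if not finite:
        return -math.inf
    m = max(finite)  # max-shift for numerical stability
    return m + math.log(sum(math.exp(s - m) for s in finite))

def forward_log(pi, A, phi, x):
    """Unnormalized log forward probs alpha[t][j] = log p(x_{1:t}, z_t = j)."""
    K, T = len(pi), len(x)
    alpha = [[-math.inf] * K for _ in range(T)]
    for j in range(K):
        alpha[0][j] = safe_log(pi[j]) + safe_log(phi[j][x[0]])
    for t in range(1, T):
        for j in range(K):
            # belief state + transition prob, then local evidence at t
            alpha[t][j] = log_sum_exp(
                [alpha[t - 1][i] + safe_log(A[i][j]) for i in range(K)]
            ) + safe_log(phi[j][x[t]])
    return alpha
```

With an emission matrix where each state can only emit some categories, the illegal entries of alpha stay at -inf for every t, so the -inf pattern really is fixed by the data rather than the parameters.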

Does that make sense? If someone else has another suggestion, please share it. I’m with Luis on this one. This seems a little awkward. Maybe there’s another way to code up an HMM entirely for sparse-looking things.


Hey Ben! Thanks a lot for giving this a try. I’m very happy I’m not the only one who finds this tricky! This example is just an illustration from a complex model with more constraints, and I’m having a terrible time tackling it.

I’m with you that the ifs and infs make it feel hackish, but it’s fine as long as the math is fine. I’m sure there must be some more elegant/mathy way to simplify this, as you mention. I’ll toy with your code and see where it gets me.

More suggestions are welcome!

Two more ideas!

How about making a custom log_sum_exp that ignores infs:

real log_sum_exp_ignore_inf(vector v) {
  int N = num_elements(v);
  vector[N] noninfs;
  int q = 1;

  for (n in 1:N) {
    if (is_inf(v[n]) == 0) {
      noninfs[q] = v[n];
      q = q + 1;
    }
  }
  // q - 1 entries were kept; if everything was -inf, return -inf
  return q > 1 ? log_sum_exp(noninfs[1:(q - 1)]) : log(0);
}
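In Python the same function might look like this (a sketch, not Stan). Since exp(-inf) == 0, dropping -inf entries can never change the value of log-sum-exp; it just keeps the bookkeeping -infs out of the arithmetic:

```python
import math

# Hypothetical Python analogue of log_sum_exp_ignore_inf.

def log_sum_exp_ignore_inf(v):
    finite = [s for s in v if s != -math.inf]
    if not finite:            # every entry was -inf
        return -math.inf
    m = max(finite)           # max-shift for numerical stability
    return m + math.log(sum(math.exp(s - m) for s in finite))
```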

Or make two arrays that encode the possible states at each time sparsely (N time points, 4 states):

int number_of_states_at_T[N] = { 1, 2, 4, 4 };

// -1 is a non-state
int state_ids_at_time_T[N, 4] = {
  {1, -1, -1, -1}, // Only one possible state
  {2, 3, -1, -1},  // Only two possible states
  {1, 2, 3, 4},
  {1, 2, 3, 4}
};
Then your two inner loops that loop over K now would instead loop over number_of_states_at_T[t] (outer) and number_of_states_at_T[t - 1] (inner). You’d have to account for a variable length log_sum_exp too.
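A rough Python sketch of that sparse bookkeeping (0-based indices, made-up names, not the attached Stan code), assuming pi, A and phi are strictly positive on the allowed entries so every stored term is finite:

```python
import math

# Sparse forward pass: states_at_t[t] lists the states allowed at time t,
# and only those entries of alpha are ever computed or stored.

def forward_sparse(pi, A, phi, x, states_at_t):
    """alpha[t] maps each allowed state j to log p(x_{1:t}, z_t = j)."""
    T = len(x)
    alpha = [dict() for _ in range(T)]   # only legal states are stored
    for j in states_at_t[0]:
        alpha[0][j] = math.log(pi[j]) + math.log(phi[j][x[0]])
    for t in range(1, T):
        for j in states_at_t[t]:
            terms = [alpha[t - 1][i] + math.log(A[i][j])
                     for i in states_at_t[t - 1]]
            m = max(terms)               # variable-length log_sum_exp
            alpha[t][j] = (m + math.log(sum(math.exp(s - m) for s in terms))
                           + math.log(phi[j][x[t]]))
    return alpha
```

The inner list comprehension plays the role of the variable-length log_sum_exp: its length is the number of states allowed at t - 1 rather than K.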

All you need to do in principle is forbid any transitions that don’t go through your declared states. Then just normalize what’s left. It’s a pretty simple tweak to the forward algorithm.

The practice isn’t so easy, especially in Stan if the known states are ad hoc.

Bob, just to disambiguate: what do you mean exactly by “forbid”? If a given transition from i to j is not allowed by the model, is it just that the log forward probability unalpha doesn’t get incremented (i.e. unalpha[t] = unalpha[t - 1])?

I didn’t read closely enough and thought you had some state observations (as is common in many HMM problems in natural language processing, like part-of-speech tagging). The issue’s always the same with these HMMs, though—controlling which transitions or emissions are legal.

To do this efficiently, you’ll need an efficient version of the forward algorithm that prunes the illegal states. You can just hard code that for a problem, or use a separate “legal emission” or “legal transition” boolean matrix to figure out which states to skip. Then everything should just normalize based on the legal states.
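To illustrate the “forbid, then normalize” point numerically (toy example with made-up names, not the attached model): with emission probability 0 on illegal (state, output) pairs, the forward recursion assigns zero mass to illegal states, and normalizing alpha at each step gives the filtered distribution over the legal states only.

```python
# Forward recursion in plain probability space, for illustration only.

def normalize(row):
    s = sum(row)
    return [p / s for p in row]

def filtered_probs(pi, A, phi, x):
    """p(z_t = j | x_{1:t}): illegal states get exactly zero mass."""
    K = len(pi)
    alpha = [pi[j] * phi[j][x[0]] for j in range(K)]
    out = [normalize(alpha)]
    for xt in x[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(K)) * phi[j][xt]
                 for j in range(K)]
        out.append(normalize(alpha))
    return out
```

In probability space this would underflow for long series, which is why the real code stays on the log scale; the point here is only that normalization handles the forbidden mass automatically.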
