Hidden Markov Model with constraints

LuisDamiano · August 21, 2017, 2:13am

Hi there!

Assume I’ve got a plain, simple HMM with discrete observations (i.e. multinomial output), very similar to the model described in the Stan manual. I know that some of the observations belong to some predetermined states.

Find attached a toy example with K = 4 states and L = possible outputs: categories 1, 3, 5, 7 and 9 may only be produced by latent states 1 and 3, while categories 2, 4, 6 and 8 may only be emitted by latent states 2 and 4. The emission vectors look like this:

         [,1] [,2] [,3] [,4] [,5] [,6] [,7]  [,8] [,9]
    [1,] 0.00 0.19 0.00 0.07 0.00 0.47 0.000 0.27 0.00
    [2,] 0.25 0.00 0.28 0.00 0.12 0.00 0.042 0.00 0.31
    [3,] 0.00 0.36 0.00 0.19 0.00 0.17 0.000 0.28 0.00
    [4,] 0.21 0.00 0.28 0.00 0.18 0.00 0.086 0.00 0.25

What would be the best way to adapt the code for the forward filter and comply with this extra constrain? I’ve been considering adding a few if to line #33 (see attached stan file) and make the loglikelihood increment only if the combinations of current state (j) and output (x[t]) are valid.

This feels like a hack though and thought that you may be able to give better guidance.

Thanks!

hmm.R (1.2 KB)
hmm.stan (1.4 KB)

bbbales2 · August 21, 2017, 7:15pm

Wow when I first looked at this I thought it would be simple, but you’re right, this gets pretty complicated if we want to avoid having -infs sitting around in our log probabilities.

The simplest thing I can think of (and this is not very simple or elegant) is to make your accumulator variable length and not include things that are -inf (but still put -infs in the right places in unalpha_tk).

for (t in 2:T) {
  for (j in 1:K) { // j = current (t)
     if(is_compatible(j, x[t])) { // You'll have to write this function
       int q = 1;
       for (i in 1:K) { // i = previous (t-1)
         // Murphy (2012) Eq. 17.48
         // belief state      + transition prob + local evidence at t
         if(is_inf(unalpha_tk[t-1, i]) == 0) {
           accumulator[q] = unalpha_tk[t-1, i] + log(A_ij[i, j]) + log(phi_k[j, x[t]]);
           q = q + 1;
         }
       }
       unalpha_tk[t, j] = log_sum_exp(accumulator[1:q]);
     } else {
       unalpha_tk[t, j] = log(0); // Or any other way to make it -inf
     }
  }
}

I think it’s okay to do this. We’re using the -infs as bookkeeping here (edit: and it will be a fixed pattern as a function of the data, not the parameters).

That make sense? If someone else has another suggestion, please do. I’m with Luis on this one. This seems a little awkward. Maybe there’s another way to code up an HMM entirely for sparse looking things.

LuisDamiano · August 22, 2017, 2:50am

Hey Ben! Thanks a lot for giving this a try. I’m very happy I’m not the only one that finds this tricky! This example is just an illustration from a complex model with more constraints and I’m having a terrible time at tackling this.

I’m with you that the ifs and infs make it feel hackish but it’s fine as long as the math is fine. I’m sure there must be some elegant/more mathy way to simplify this as you mention. I’ll toy with your code and see where this gets me.

More suggestions are welcome!

bbbales2 · August 22, 2017, 2:25pm

Two more ideas!

How about making a custom log_sum_exp that ignores infs:

real log_sum_exp_ignore_inf(vector v) {
  int N = size(v);
  vector noninfs[size(v)];
  int q = 1;

  for (n in 1:N) {
    if (is_inf(v[n]) == 0) {
      noninfs[q] = v[n];
      q =  q + 1;
    }
  }

  return log_sum_exp(noninfs[1:q]);
}

Or make two arrays that encode the possible states at each time sparsely (N time points, 4 states):

int number_of_states_at_T[N] = {
  1,
  2,
  4,
  4,
  ...
}

// -1 is a non-state
int state_ids_at_time_T[N, 4] = {
  {1, -1, -1, -1}, //Only one possible state
  {2, 3, -1, -1}, //Only two possible states
  {1, 2, 3, 4},
  {1, 2, 3, 4},
  ...
}

Then your two inner loops that loop over K now would instead loop over number_of_states_at_T[t] (outer) and number_of_states_at_T[t - 1] (inner). You’d have to account for a variable length log_sum_exp too.

Bob_Carpenter · August 23, 2017, 11:52pm

All you need to do in principle is forbid any transitions that don’t go through your declared states. Then just normalize what’s left. It’s a pretty simple tweak to the forward algorithm.

The practice isn’t so easy, especially in Stan if the known states are ad hoc.

LuisDamiano · August 24, 2017, 12:15am

Bob, just to disambiguate: what do you mean exactly by “forbid”? If a given
transition from i to j is not allowed by the model, then is it just that
the log forward probability unalpha doesn’t get incremented (ie unalpha t =
unalpha t-1)?

Bob_Carpenter · August 24, 2017, 8:56am

I didn’t read closely enough and thought you had some state observations (as is common in many HMM problems in natural language processing, like part-of-speech tagging). The issue’s always the same with these HMMs, though—controlling which transitions or emissions are legal.

To do this efficiently, you’ll need an efficient version of the forward algorithm that prunes the illegal states. You can just hard code that for a problem, or use a separate “legal emission” or “legal transition” boolean matrix to figure out which states to skip. Then everything should just normalize based on the legal states.

Topic		Replies	Views
Identifiability and convergence (Input-Output Hidden Markov Model) Modeling hmm	3	1878	July 23, 2017
Fitting a Hidden Markov model with hierarchical emission parameters Modeling hmm	2	1873	July 28, 2017
How to train a Hidden Markov Model with multiple sequences Modeling	4	1721	September 21, 2020
Getting access to "internal" log probs in HMM code (& how they behave across time) Modeling rstan , techniques , hmm	3	1112	May 17, 2023
Multilevel Hidden Markov Model initial sampling fail Modeling rstan , fitting-issues , hmm	2	601	April 20, 2022

Hidden Markov Model with constraints

Related topics