Fitting a Hidden Markov model with hierarchical emission parameters

I’m currently working on training a Hidden Markov model on a collection of sequence data using unsupervised methods. Essentially, I have a collection of time series samples, each of which I assume is generated by a HMM with four hidden states, with each state having a Poisson emissions (and each emission having a different mean). However, I assume that, while the HMMs generating each sequence all have the same transition matrix, each one is allowed to have different Poisson emission means - so that they all have the same transitional probabilities, but different sequences are allowed to have different means in corresponding states.

I’d also like to be able to place some restrictions on the values taken by the transition matrix (i.e., force some of the transitions to be zero). I.e., I’d like the transition matrix to have form
[a b 0 0]
[0 c d 0]
[0 0 e f ]
[0 0 0 1]

and for the initial state probabilities to be [1 0 0 0] - i.e., the HMM starts in the first state always. I know that Baum-Welch does this if you set the initial value of the transition probability to zero, but so far, I’ve run into “initialization failed” issues when specifying zero probabilities for the initial state, and I’m not sure how to approach ensuring zero-probability transitions where I want them.

At the moment, I’ve adapted the Forward algorithm approach outlined in the Stan documentation to this problem. In its current state, it doesn’t work due to the zero initial probabilities that I’ve specified, but it works when I don’t specify any sort of initial probabilities - although the resulting parameters are not in line with what I’d like - a pretty uniform transition matrix, and approximately equal Poisson means for each sample. I’ve tried providing initial values for the transition matrix, such as
[0.900 0.033 0.033 0.034]
[0.050 0.800 0.100 0.050]
[0.050 0.050 0.800 0.100]
[0.033 0.033 0.034 0.900]
but this isn’t optimal, because I’d like to enforce that state 1 can only transition to states 1 and 2, state 2 can only transition to states 2 and 3, etc. Is there any way I can do this in Stan?

    int<lower=1> K; // num states
    int<lower=1> Q; // Length of value array
    int<lower=1> S; // Number of sites
    int<lower=0> s[S+1]; // Indices for each site's data
    int<lower=0> u[Q]; // flattened values for all site
parameters {
    simplex[K] theta[K]; // transit probs
    vector<lower=0>[K] lambda[S]; // poisson means for each site
    real<lower=0, upper=10> a[K]; // Gamma parameters
    real<lower=0, upper=10> b[K]; // Gamma parameters
model {
    a ~ uniform(0, 10);
    b ~ uniform(0, 10);
    for (ss in 1:S) {
        for (k in 1:K)
            lambda[ss, k] ~ gamma(a[k], b[k]);
            real acc[K];
            real gamma[s[ss+1]-s[ss], K];
            int v[s[ss+1]-s[ss]] = u[s[ss]+1:s[ss+1]];
            gamma[1, 1] = log(1); // Hoping that this forces the model into the first state
            for (k in 2:K)
                gamma[1, k] = log(0);
            for (t in 2:size(v)) {
                for (k in 1:K) {
                    for (j in 1:K) {
                        if (j == k || j == k + 1) {
                            acc[j] = gamma[t-1, j] + log(theta[j, k]) + poisson_lpmf(v[t] | lambda[ss, k]);
                        else {
                            acc[j] = negative_infinity();
                    gamma[t, k] = log_sum_exp(acc);
            target += log_sum_exp(gamma[size(gamma)]);
1 Like

You’d write that model in the usual way for a hierarchical model.

You can set HMMs up to always start in the same state. There are a lot of ways HMMs get formulated; usually they don’t assume separate initial distributions, but sometimes they do. The chapter on missing data in the manual shows how to build data structures out of data (your zeroes) and parameters, but it’d be more efficient here to just code everything directly (though much less portable, so I’d advise just mixing the transition matrix). Then you only have three degrees of freedom. once you set a, c, e in [0, 1], b = 1 - a, d = 1- c, and f = 1 - e. I’d parameterize it that way if you know that’s the only model you’re going to use rather than trying to play with putting simplexes together.

All in, this’d parameterize your transition matrix as:

parameters {
  real<lower=0, upper=1> a;
  real<lower=0, upper=1> b;
  real<lower=0, upper=1> c;
transformed parameters {
  simplex[4] theta[4]
    = { [a, 1 - a, 0, 0]',
        [0, b, 1 - b, 0]',
        [0, 0, c, 1 -c]',
        [0, 0, 0, 1]' };

You could of course generalize this banded structure in a loop to arbitrary sizes.

We have a set of prior recommendations in Wiki form and we strongly discourage interval-bounded uniform priors (if you do insist on using them, you don’t need the sampling statements).

To force the first state, you just literally write the first transition step differently than the others. Playing around with infinities and negative infinities is very hard on the chain rule for autodiff.

1 Like

That seemed to work - thanks! Here’s the code I implemented for a simpler version of this, where there are 3 hard-encoded states, emissions are Poisson, and the backward algorithm is used to calculate the probability of just a single observation sequence (I found that it was easier to avoid negative infinity log probabilities when working with the backward algorithm).

data {
    int<lower=1> K; // Length of observation list
    int<lower=0> u[K]; // Observation list
    simplex[3] delta; // Initial Probabilities (assumed to be [1, 0, 0])
parameters {
    real<lower=0, upper=1> a;
    real<lower=0, upper=1> b;
    vector<lower=0, upper=10>[3] lambdas;
transformed parameters {
    simplex[3] theta[3]
      = { [a, 1 - a, 0]',
          [0, b, 1 - b]',
          [0, 0, 1]' };
model {
    int nonzeros[3];
    vector[3] f[K];
    vector[3] f_n;
    nonzeros[1] = 2;
    nonzeros[2] = 2;
    nonzeros[3] = 1;
    lambdas ~ uniform(0, 10);
    for (i in 1:3)
        f_n[i] = poisson_lpmf(u[K] | lambdas[i]);
    f[K] = f_n;
    for (i in 1:K-1) {
        int ind = K - i;
        vector[3] f_i;
        for (j in 1:3) {
            real probs[nonzeros[j]];
            int count = 1;
            for (k in 1:3) {
                if (theta[j, k] == 0) {}
                else {
                    probs[count] = theta[j, k] + f[ind + 1, k];
                    count = count + 1;
            f_i[j] = poisson_lpmf(u[ind] | lambdas[j]) + log_sum_exp(probs);
        f[ind] = f_i;
    print (f[1,1]);
    target += log(delta[1]) + f[1, 1];
1 Like