# Manual ordering of two simplexes for emission probabilities in multinomial HMM

Hi,

I am trying to estimate a multinomial hidden Markov model (i.e., both the states and the observations are discrete) using this code by Luis Damiano.

The HMM problem that I am dealing with has 2 hidden states and 4 possible outcomes (observations).

When fitting the model in Stan, the posterior is multimodal as expected, likely because the two hidden states can swap labels.
One solution I am considering is to constrain the emission probability of observation 1 to always be higher for state 1 than for state 2.

In mathematical form: if we define B_{ij} as the probability of observing outcome j (j = 1, ..., 4) in state i (i = 1, 2), the constraint is B_{11} > B_{21}.

However, I cannot use a simple solution like `ordered`, since B is implemented as an array of row simplexes, as below:

```stan
parameters {
  // ...
  // Discrete observation model
  simplex[L] phi_k[K];              // event probabilities
}
```

Instead, my alternative plan is to manually exchange the two simplexes `phi_k[1]` and `phi_k[2]` whenever `phi_k[1, 1] < phi_k[2, 1]`.

One problem with this approach is that I cannot directly modify `phi_k`. Instead, I created `ordered_phi_k` in `transformed parameters` and used it in place of `phi_k`.

So, the original code was

```stan
transformed parameters {
  vector[K] unalpha_tk[T];

  { // Forward algorithm log p(z_t = j | x_{1:t})
    real accumulator[K];

    for (j in 1:K) {
      unalpha_tk[1, j] = log(p_1k[j]) + log(phi_k[j, x[1]]);
    }

    for (t in 2:T) {
      for (j in 1:K) { // j = current (t)
        for (i in 1:K) { // i = previous (t-1)
          // Murphy (2012) Eq. 17.48
          // belief state      + transition prob + local evidence at t
          accumulator[i] = unalpha_tk[t-1, i] + log(A_ij[i, j]) + log(phi_k[j, x[t]]);
        }
        unalpha_tk[t, j] = log_sum_exp(accumulator);
      }
    }
  } // Forward
}
```

and I have modified it into

```stan
transformed parameters {
  vector[K] unalpha_tk[T];
  vector[L] ordered_phi_k[K];

  { // Forward algorithm log p(z_t = j | x_{1:t})

    real accumulator[K];

    // use ordered phi: relabel so state 1 has the higher
    // probability of outcome 1
    if (phi_k[2, 1] > phi_k[1, 1]) {
      ordered_phi_k[1] = phi_k[2];
      ordered_phi_k[2] = phi_k[1];
    } else {
      ordered_phi_k[1] = phi_k[1];
      ordered_phi_k[2] = phi_k[2];
    }

    for (j in 1:K) {
      unalpha_tk[1, j] = log(p_1k[j]) + log(ordered_phi_k[j, x[1]]);
    }

    for (t in 2:T) {
      for (j in 1:K) { // j = current (t)
        for (i in 1:K) { // i = previous (t-1)
          // Murphy (2012) Eq. 17.48
          // belief state      + transition prob + local evidence at t
          accumulator[i] = unalpha_tk[t-1, i] + log(A_ij[i, j]) + log(ordered_phi_k[j, x[t]]);
        }
        unalpha_tk[t, j] = log_sum_exp(accumulator);
      }
    }
  } // Forward
}
```

What was the result?

I simulated an HMM sequence with 1000 observations, where A = [0.75, 0.25; 0.35, 0.65] and B = [0.5, 0.3, 0.1, 0.1; 0.1, 0.15, 0.4, 0.35].

The model reasonably recovers the true parameters.

Altogether, my question is: is this approach justifiable for this specific constraint?

Minho

We actually have built-in HMM code now, which will be a lot faster. See:

I guess we need to update the user’s guide to keep up!

Do you mean label switching? While that’s a problem for convergence monitoring, it’s usually not a problem for downstream inference, which will again marginalize out the labels.

You can do that, but it’d be more efficient to do so in generated quantities just for convergence monitoring. It doesn’t solve the underlying problem that the model’s not identified in its parameters as declared.
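For K = 2, that generated-quantities relabeling might look something like this (a sketch; `phi_k_ord` and `A_ij_ord` are names I'm introducing, while `phi_k` and `A_ij` are as in the OP's model):

```stan
generated quantities {
  // Relabeled copies for convergence monitoring only; the model block is
  // untouched, so this does not change the posterior.
  simplex[L] phi_k_ord[K] = phi_k;
  matrix[K, K] A_ij_ord = A_ij;
  if (phi_k[2, 1] > phi_k[1, 1]) {
    phi_k_ord[1] = phi_k[2];
    phi_k_ord[2] = phi_k[1];
    // swap both rows and columns of the transition matrix consistently
    A_ij_ord[1, 1] = A_ij[2, 2];
    A_ij_ord[2, 2] = A_ij[1, 1];
    A_ij_ord[1, 2] = A_ij[2, 1];
    A_ij_ord[2, 1] = A_ij[1, 2];
  }
}
```

Any other state-indexed quantity you monitor (e.g. the initial distribution `p_1k`) would need the same swap so that all relabeled quantities refer to the same state.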


Hello, I don’t want to hijack the thread but it seems like a sensible place to comment/ask.

We actually have built-in HMM code now, which will be a lot faster.

The OP’s code allows calculation of the filtered state probabilities (via the `unalpha_tk` parameter). As far as I can tell, `hmm_marginal` doesn’t give you access to this quantity, and `hmm_hidden_state_prob` returns the smoothed state probabilities (right?). So how would one use the built-in HMM functions and also obtain the filtered state probabilities?
Thanks.


Hi, Bob.

I already knew this, but was not sure about the discrete case, since the only case study using `hmm_marginal` was for continuous observations. I guess just plugging the log categorical likelihood (`categorical_lpmf`) in place of the log normal likelihood (`normal_lpdf`) for `log_omega` would be enough? I will try this.
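For what it's worth, the plug-in might look roughly like this (a sketch, assuming `x` is the `T`-length array of observed categories, `phi_k` is the array of emission simplexes from above, and `Gamma`/`rho` are the transition matrix and initial distribution expected by the built-in functions):

```stan
transformed parameters {
  // log_omega[k, t] = log Pr(x[t] | hidden state k)
  matrix[K, T] log_omega;
  for (t in 1:T)
    for (k in 1:K)
      log_omega[k, t] = categorical_lpmf(x[t] | phi_k[k]);
}
model {
  // Gamma: K x K transition matrix, rho: K-simplex of initial probabilities
  target += hmm_marginal(log_omega, Gamma, rho);
}
```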

That would be very useful!

Yes, you are absolutely right. But I wanted to check whether the model is able to find the true (given) structure of the sequence.

I did this because I was worried that manual ordering in generated quantities would only ensure ordering of the emission probabilities, excluding ordering of hidden state probabilities.

Okay, then maybe I can try setting some vague priors that still loosely constrain the parameters so that ordering is partially achieved.

Minho

Hi, @Marty.

I encountered a similar issue and I think I have figured out the answer.
Basically, you can re-calculate the filtered state probabilities using `log_omega`, `Gamma`, and `rho`, which are the inputs required by the `hmm_marginal` function.

One thing to note here is that if you try to modify the `hmm_hidden_state_prob` function, the results sometimes produce NaNs, possibly because of under/overflow.

I needed to reuse some parts of my original code to avoid the NaN issue.

The implemented function is as below:

```stan
functions {
  // ref: https://github.com/luisdamiano/gsoc17-hhmm/blob/master/hmm/stan/hmm-multinom.stan
  matrix hmm_forward_prob(matrix log_omega, matrix Gamma, vector rho) {
    int n_state = rows(log_omega);
    int n_obs = cols(log_omega);
    matrix[n_state, n_obs] unnorm_log_alpha;
    matrix[n_state, n_obs] alpha; // p(x_t=j|y_{1:t})
    array[n_state] real accumulator;

    // first alpha
    unnorm_log_alpha[, 1] = log(rho) + log_omega[, 1];

    // other alphas
    for (t in 2:n_obs) {
      for (j in 1:n_state) {
        for (i in 1:n_state) {
          accumulator[i] = unnorm_log_alpha[i, t-1] + log(Gamma[i, j]) + log_omega[j, t];
        }
        unnorm_log_alpha[j, t] = log_sum_exp(accumulator);
      }
    }

    // normalize alpha later for numerical stability
    for (t in 1:n_obs) {
      alpha[, t] = softmax(unnorm_log_alpha[, t]);
    }
    return alpha;
  } // !hmm_forward_prob
}
```
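Calling it might look like this (a sketch, assuming `log_omega`, `Gamma`, and `rho` are visible in `generated quantities`, e.g. because they are defined in `transformed parameters`, and `K`/`T` are the number of states and observations):

```stan
generated quantities {
  // filtered probabilities p(z_t = k | x_{1:t}), one column per time step
  matrix[K, T] alpha = hmm_forward_prob(log_omega, Gamma, rho);
}
```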

Also, I have attached sample code for anyone who may be interested.

hmm-multinom-example.stan (2.1 KB)

Hi @mshin,

I appreciate the response! That is a nice way to set up the forward algorithm to take the same inputs as the built-in functions.

I guess my question/concern is that in calculating the marginal hidden state probabilities, we already have everything we need both to increment the target and to obtain the filtered state probabilities. The fact that `hmm_marginal` only returns the former means that we have to run the exact same loop again. So if the point of using built-in functions such as `hmm_marginal` is speed, it seems counter-productive.

If `hmm_marginal` simply returned the entire array of log state probabilities, instead of just the last element, there would be no need for this duplication.

Does that make sense?

Hi @Marty,

I am not an expert on this, but returning the entire array of log state probabilities by default may not be wise in terms of memory?

Calculating things again in the `generated quantities` section is actually quite fast, since the calculation is only done once per iteration (rather than once per leapfrog step, as in the model block).

Also, I have compared the two approaches (the GitHub code vs. `hmm_marginal`), and `hmm_marginal` was much faster (around 2-3x) in my case.


Yep, that makes sense! Handy to know the `hmm_marginal` implementation was that much faster, thanks for sharing.