# Manual ordering of two simplexes for emission probabilities in multinomial HMM

Hi,

I am trying to estimate a multinomial hidden Markov model (i.e., states and observations are all discrete) using this code by Luis Damiano.

The HMM problem that I am dealing with has 2 hidden states and 4 possible outcomes (observations).

When fitting the model in Stan, the posterior is multimodal as expected, likely because of possible rotation (permutation) of the hidden states.
One solution that I am thinking about is setting a constraint that the emission probability of observation 1 for State 1 is always higher than that for State 2.

In mathematical form: if we define B_{ij} as the probability of observing j (j = 1, ..., 4) in state i (i = 1, 2), the constraint is B_{11} > B_{21}.

However, I cannot use simple solutions like `ordered`, since B is implemented as an array of simplexes (one per row), as below:

```stan
parameters {
  // ...
  // Discrete observation model
  simplex[L] phi_k[K];              // event probabilities
}
```

Instead, my alternative plan is to manually exchange the two simplexes `phi_k[1]` and `phi_k[2]` when `phi_k[1,1] < phi_k[2,1]`.

One problem with this approach is that I cannot directly modify `phi_k`. Instead, I created `ordered_phi_k` in `transformed parameters` and used it in place of `phi_k`.

So, the original code was

```stan
transformed parameters {
  vector[K] unalpha_tk[T];

  { // Forward algorithm log p(z_t = j | x_{1:t})
    real accumulator[K];

    for (j in 1:K) {
      unalpha_tk[1][j] = log(p_1k[j]) + log(phi_k[j, x[1]]);
    }

    for (t in 2:T) {
      for (j in 1:K) {     // j = current (t)
        for (i in 1:K) {   // i = previous (t-1)
          // Murphy (2012) Eq. 17.48
          // belief state + transition prob + local evidence at t
          accumulator[i] = unalpha_tk[t-1, i] + log(A_ij[i, j]) + log(phi_k[j, x[t]]);
        }
        unalpha_tk[t, j] = log_sum_exp(accumulator);
      }
    }
  } // Forward
}
```

and I have modified it to

```stan
transformed parameters {
  vector[K] unalpha_tk[T];
  vector[L] ordered_phi_k[K];

  { // Forward algorithm log p(z_t = j | x_{1:t})
    real accumulator[K];

    // use ordered phi
    if (phi_k[2, 1] > phi_k[1, 1]) {
      ordered_phi_k[1] = phi_k[2];
      ordered_phi_k[2] = phi_k[1];
    } else {
      ordered_phi_k[1] = phi_k[1];
      ordered_phi_k[2] = phi_k[2];
    }

    for (j in 1:K) {
      unalpha_tk[1][j] = log(p_1k[j]) + log(ordered_phi_k[j, x[1]]);
    }

    for (t in 2:T) {
      for (j in 1:K) {     // j = current (t)
        for (i in 1:K) {   // i = previous (t-1)
          // Murphy (2012) Eq. 17.48
          // belief state + transition prob + local evidence at t
          accumulator[i] = unalpha_tk[t-1, i] + log(A_ij[i, j]) + log(ordered_phi_k[j, x[t]]);
        }
        unalpha_tk[t, j] = log_sum_exp(accumulator);
      }
    }
  } // Forward
}
```

What was the result?

I have simulated an HMM sequence with 1000 observations, where A = [0.75, 0.25; 0.35, 0.65] and B = [0.5, 0.3, 0.1, 0.1; 0.1, 0.15, 0.4, 0.35].
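
For reference, a simulation along these lines could even be written in Stan itself and run with `fixed_param` (a sketch, not the code I actually used; the uniform initial state distribution and all names here are assumptions):

```stan
data {
  int<lower=1> T;                       // number of observations, e.g. 1000
}
transformed data {
  int K = 2;                            // hidden states
  int L = 4;                            // outcomes
  matrix[K, K] A = [[0.75, 0.25],
                    [0.35, 0.65]];      // transition matrix
  array[K] vector[L] B = { [0.5, 0.3, 0.1, 0.1]',
                           [0.1, 0.15, 0.4, 0.35]' };  // emission simplexes
}
generated quantities {
  array[T] int z;                       // hidden states
  array[T] int x;                       // observations
  z[1] = categorical_rng(rep_vector(1.0 / K, K));  // uniform start (assumption)
  x[1] = categorical_rng(B[z[1]]);
  for (t in 2:T) {
    z[t] = categorical_rng(A[z[t - 1]]');          // row of A, transposed
    x[t] = categorical_rng(B[z[t]]);
  }
}
```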

The model reasonably recovers the true parameters.

Altogether, my question is: is this approach justifiable for this specific constraint?

Minho

We actually have built-in HMM code now, which will be a lot faster. See:

I guess we need to update the user's guide to keep up!

Do you mean label switching? While that's a problem for convergence monitoring, it's usually not a problem for downstream inference, which will again marginalize out the labels.

You can do that, but it'd be more efficient to do so in generated quantities just for convergence monitoring. It doesn't solve the underlying problem that the model's not identified in its parameters as declared.
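
For example, with K = 2 the relabeling could look like this (a sketch mirroring the swap above; `phi_k_sorted` is a hypothetical name):

```stan
generated quantities {
  // Relabel only for convergence monitoring; the model block stays untouched.
  vector[L] phi_k_sorted[K] = phi_k;
  if (phi_k[2, 1] > phi_k[1, 1]) {
    phi_k_sorted[1] = phi_k[2];
    phi_k_sorted[2] = phi_k[1];
  }
}
```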


Hello, I don't want to hijack the thread but it seems like a sensible place to comment/ask.

> We actually have built-in HMM code now, which will be a lot faster.

The OP's code allows calculation of the filtered state probabilities (via the `unalpha_tk` parameter). As far as I can tell, `hmm_marginal` doesn't give you access to this parameter, and `hmm_hidden_state_prob` returns the smoothed state probabilities (right?). So how would one use the built-in HMM functions and also obtain the filtered state probabilities?
Thanks.


Hi, Bob.

I already knew about this, but was not sure about the discrete case, since the only case study using `hmm_marginal` was for continuous observations. I guess just plugging the log categorical likelihood (`categorical_lpmf`) in place of the log normal likelihood (`normal_lpdf`) when building `log_omega` would be enough? I will try this.
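
Something like the following, perhaps (a sketch; `x`, `phi_k`, `K`, and `T` are assumed to be defined as in the model above):

```stan
transformed parameters {
  matrix[K, T] log_omega;   // log p(x_t | z_t = k), one column per observation
  for (t in 1:T) {
    for (k in 1:K) {
      log_omega[k, t] = categorical_lpmf(x[t] | phi_k[k]);
    }
  }
}
```

with `target += hmm_marginal(log_omega, Gamma, rho);` in the model block.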

That would be very useful!

Yes, you are absolutely right. But I wanted to check whether the model is able to find the true (given) structure of the sequence.

I did this because I was worried that manual ordering in generated quantities would only ensure ordering of the emission probabilities, but not of the hidden state probabilities.

Okay, then maybe I can try setting some vague priors that still loosely constrain the parameters, so that ordering is partially achieved.
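
For example (a sketch with made-up concentration values; the idea is just to tilt each state toward a different outcome):

```stan
model {
  // Weakly informative, asymmetric Dirichlet priors (hypothetical values):
  // state 1 is nudged toward outcome 1, state 2 toward outcome 4.
  phi_k[1] ~ dirichlet([4, 2, 2, 2]');
  phi_k[2] ~ dirichlet([2, 2, 2, 4]');
}
```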

Minho

Hi, @Marty.

I encountered a similar issue and I think I have figured out the answer.
Basically, you can re-calculate the filtered state probabilities using `log_omega`, `Gamma`, and `rho`, which are the inputs required by the `hmm_marginal` function.

One thing to note here is that if you try to modify the `hmm_hidden_state_prob` function, the results sometimes contain NaNs, possibly because of under/overflow.

I needed to reuse parts of the code I originally used to avoid the NaN issue.

The implemented function is below:

```stan
functions {
  // ref: https://github.com/luisdamiano/gsoc17-hhmm/blob/master/hmm/stan/hmm-multinom.stan
  matrix hmm_forward_prob(matrix log_omega, matrix Gamma, vector rho) {
    int n_state = rows(log_omega);
    int n_obs = cols(log_omega);
    matrix[n_state, n_obs] unnorm_log_alpha;
    matrix[n_state, n_obs] alpha;   // p(x_t = j | y_{1:t})
    array[n_state] real accumulator;

    // first alpha
    unnorm_log_alpha[, 1] = log(rho) + log_omega[, 1];

    // other alphas
    for (t in 2:n_obs) {
      for (j in 1:n_state) {
        for (i in 1:n_state) {
          accumulator[i] = unnorm_log_alpha[i, t - 1] + log(Gamma[i, j]) + log_omega[j, t];
        }
        unnorm_log_alpha[j, t] = log_sum_exp(accumulator);
      }
    }

    // normalize alpha later for numerical stability
    for (t in 1:n_obs) {
      alpha[, t] = softmax(unnorm_log_alpha[, t]);
    }
    return alpha;
  } // !hmm_forward_prob
}
```
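
The filtered probabilities can then be recovered once per draw, e.g. in `generated quantities` (a sketch, assuming `log_omega`, `Gamma`, and `rho` are built in `transformed parameters` just as for `hmm_marginal`):

```stan
generated quantities {
  // Filtered state probabilities p(z_t = k | x_{1:t}), one column per t
  matrix[K, T] alpha = hmm_forward_prob(log_omega, Gamma, rho);
}
```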

Also, I have attached the sample code for anyone who may be interested:

hmm-multinom-example.stan (2.1 KB)

Hi @mshin,

I appreciate the response! That is a nice way to set up the forward algorithm to take the same inputs as the built-in functions.

I guess my question/concern is that in calculating the marginal hidden state probabilities, we have everything we need to increment the target as well as obtain the filtered state probabilities. The fact that `hmm_marginal` only returns the former means that we have to run the exact same loop again. So if the point of using built-in functions such as `hmm_marginal` is speed, it seems counter-productive.

If `hmm_marginal` simply returned the entire array of log state probabilities, instead of just the last element, there would be no need for this double-up.

Does that make sense?

Hi @Marty,

I am not an expert on this, but returning the entire array of log state probabilities by default may not be wise, considering memory use?

Calculating things again in the `generated quantities` section is actually quite fast, since the calculation is only done once per iteration.

Also, I have compared the two approaches (the GitHub code vs. `hmm_marginal`), and `hmm_marginal` was much faster (around 2-3x) in my case.


Yep, that makes sense! Handy to know the `hmm_marginal` implementation was that much faster, thanks for sharing.