I am fitting a minimal two-state HMM with Gaussian emission densities. I am using the hmm_marginal() function for the target density and the hmm_hidden_state_prob() function to obtain smoothed posterior state probabilities. The model works well in many cases and agrees with MLE implementations such as R's depmixS4 package.
The problem is that for my data, the state probabilities returned by hmm_hidden_state_prob() converge to, and eventually reach, unity (for one of the states). The subsequent state probabilities are all NaN. There are no warnings or divergent transitions. The model runs quickly.
I’m guessing the convergence to unity is caused by floating-point error / rounding? Is there any way I can avoid this? The filtered state probabilities behave fine and closely match the depmixS4 MLE probabilities, so it seems to be a problem with the implementation of the Viterbi algorithm?
As always, any help is very much appreciated.
I am using CmdStan 2.28.2 with cmdstanr 0.4.0.
The Stan code is below.
data {
  int<lower=1> T;            // number of observations
  int<lower=1> K;            // number of states
  array[T] real y;           // observations
}
parameters {
  simplex[K] phi;            // initial state probabilities
  ordered[K] mu;             // state means
  vector<lower=0>[K] sigma;  // state volatilities
  array[K] simplex[K] A;     // transition probabilities
}
transformed parameters {
  matrix[K, T] log_omega;    // log conditional likelihoods
  matrix[K, K] gamma;        // transition matrix
  // construct gamma matrix
  for (k in 1:K) {
    gamma[k, ] = A[k]';
  }
  // compute log P(y | state)
  for (t in 1:T) {
    for (k in 1:K) {
      log_omega[k, t] = normal_lpdf(y[t] | mu[k], sigma[k]);
    }
  }
}
model {
  // priors
  mu ~ normal(0, 5);
  sigma ~ exponential(1);
  // likelihood
  target += hmm_marginal(log_omega, gamma, phi);
}
generated quantities {
  matrix[T, K] zstar = hmm_hidden_state_prob(log_omega, gamma, phi)';
}
And here is a plot of the posterior mean smoothed state probability (na.rm=TRUE):
Hi Marty,
Thanks for reporting this issue. If I follow you, past a certain date the probability in zstar goes to 1 for one state and 0 for all the other states, correct? Are the NaNs returned by Stan? Do you get NaN for the probability that goes to 1? Can you share your R code for generating the plot? I want to check whether you took posterior means for zstar or something else.
We’re not implementing the Viterbi algorithm, which returns a maximum a posteriori estimator. Rather, we are drawing the hidden state probability distribution from a conditional distribution, and the end result is drawn from the posterior distribution. This allows you to compute expectation values (and quantiles), rather than estimators which maximize a density.
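Concretely (this is just restating the standard definitions): the filtered probability conditions only on the observations up to time $t$, whereas the quantity returned by hmm_hidden_state_prob is the smoothed probability, which conditions on the full series,

$$
p(z_t = k \mid y_{1:t}) \ \text{(filtered)}
\qquad \text{vs.} \qquad
p(z_t = k \mid y_{1:T}) \propto \alpha_t(k)\,\beta_t(k) \ \text{(smoothed)},
$$

where $\alpha_t$ and $\beta_t$ are the usual forward and backward quantities.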
These will only occur if the variables that HMC is sampling misbehave, not for variables computed in generated quantities.
Are the NaNs returned by Stan? Do you get NaN for the probability that goes to 1?
Yes and yes. I also get NaN for the probability that goes to zero. Here is the code for plots:
library(magrittr)  # for the %>% pipe used below
data <- list(T = length(y), K = 2, y = y)
f <- "uni_gauss_hmm.stan"
s <- cmdstanr::cmdstan_model(f)
fit <- s$sample(data, iter_warmup = 500,
iter_sampling = 500, chains = 1)
# to plot posterior mean
z <- fit$draws('zstar') %>%
colMeans(na.rm=TRUE) %>%
matrix(ncol=data$K, byrow=FALSE)
plot(z[, 1], type='l') # state 1 prob
plot(z[, 2], type='l') # state 2 prob
# to plot single draw of posterior
z <- fit$draws('zstar')
dim(z) # (500, 1, 2088)
mean(is.na(z)) # 0.3390805
i = 1
plot(z[i, 1, 1:data$T], type = 'l') # state 1
We’re not implementing the Viterbi algorithm, which returns a maximum a posteriori estimator. Rather, we are drawing the hidden state probability distribution from a conditional distribution, and the end result is drawn from the posterior distribution.
OK, thank you for the clarification! So how does this differ from the forward probability (filtered probability) posterior estimate?
These will only occur if the variables that HMC is sampling misbehave, not for variables computed in generated quantities.
Ah I see, thanks again!
Also, I’m happy to share the data if you would like to try to reproduce, so just let me know.
NaNs may be produced due to numerical instability when calculating hmm_hidden_state_prob.
In the post linked below, I had a similar chat with you. My guess is that hmm_hidden_state_prob may need to employ log_sum_exp to bypass the numerical instability issue. I also encountered NaNs when using hmm_hidden_state_prob, and the problem went away when a custom filtered-probability function that employed log_sum_exp was used instead.
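For reference, here is a minimal sketch of the kind of log-scale filtered-probability computation I have in mind, written as a standalone generated quantities block that reuses the phi, gamma, and log_omega quantities from the model above. It is only an illustration, not the actual custom code from the earlier thread or the hmm_hidden_state_prob implementation.

generated quantities {
  // Filtered probabilities P(z_t = k | y_{1:t}), computed on the log scale
  matrix[T, K] filtered;
  {
    matrix[K, T] log_alpha;  // log forward quantities
    // initialisation: log alpha_1(k) = log phi_k + log p(y_1 | state k)
    for (k in 1:K) {
      log_alpha[k, 1] = log(phi[k]) + log_omega[k, 1];
    }
    // recursion: log alpha_t(k) = log_sum_exp_j(log alpha_{t-1}(j) + log gamma_{jk}) + log p(y_t | state k)
    for (t in 2:T) {
      for (k in 1:K) {
        log_alpha[k, t] = log_sum_exp(col(log_alpha, t - 1) + log(col(gamma, k)))
                          + log_omega[k, t];
      }
    }
    // normalise within each time point; softmax exponentiates and renormalises stably
    for (t in 1:T) {
      filtered[t, ] = softmax(col(log_alpha, t))';
    }
  }
}

Normalizing with softmax at each time point is the same as exponentiating log_alpha and dividing by its sum, but everything stays on the log scale until the very end.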
Thanks for this! I completely forgot you raised this issue in our earlier dialogue; at the time I read it as only being an issue for modifications of hmm_hidden_state_prob(). I have actually been using your filtered probability function and can confirm it does not suffer from the NaN issue.
OK, ignore this question; I just took a look at the forward-backward algorithm. A custom function using Luis Damiano’s and @mshin’s code seems to be doing the job for the smoothed estimates, but of course fixing hmm_hidden_state_prob() would be ideal.
Cheers,
Tom
Hi all,
Thanks for looking into this, and thanks @mshin for the readable code. Opening up the underlying stan-math C++ code, I find that indeed we did not use log_sum_exp in our implementation, nor did we introduce a log-alpha matrix. Rather, we directly compute alpha without putting most terms on the log scale, which could explain the numerical instability.
I think it should be relatively straightforward to rewrite the code following @mshin’s example. We’ll want a simple unit test which would produce NaNs with the current implementation but not with the new one.
Question: do we run into issues computing log_alpha when alpha = 0? Or do we just end up calculating something close to 0?
The HMM implementation in Stan doesn’t use log_sum_exp but rather a continuous renormalization to avoid numerical problems. I experimented with both approaches initially and found that the continuous renormalization was faster and just as stable.
Because of this, numerical issues are likely to be more subtle, so finding a minimal failing example would be helpful.
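For anyone not familiar with the strategy, the idea is roughly the following. This is a sketch only, written as a Stan function with names of my choosing; the actual implementation lives in the stan-math C++.

functions {
  // Rough sketch of a forward pass that renormalizes alpha at every step
  // instead of working on the log scale (illustrative only)
  matrix forward_filter_renormalized(matrix log_omega, matrix gamma, vector phi) {
    int K = rows(log_omega);
    int T = cols(log_omega);
    matrix[T, K] filtered;  // filtered[t, k] = P(z_t = k | y_{1:t})
    vector[K] alpha = phi .* exp(col(log_omega, 1));
    alpha /= sum(alpha);    // renormalize so alpha sums to 1 at every step
    filtered[1, ] = alpha';
    for (t in 2:T) {
      // note: exp() of an extreme emission log density can still over/underflow here
      alpha = (gamma' * alpha) .* exp(col(log_omega, t));
      alpha /= sum(alpha);  // the renormalization keeps alpha itself away from underflow
      filtered[t, ] = alpha';
    }
    return filtered;
  }
}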
Agreed. The first step is to create some unit tests. My availability this month is limited, but I should be able to work on this in the second half of April. I’ve also marked this as a good first issue for someone who might be interested in contributing to Stan’s C++. I can provide guidance, and if no one expresses interest, I’ll take care of it.
@Marty , if you have access to the data that can reproduce this hmm_hidden_state_prob() NaN issue, and you are able to release the data to the public so it can be used in test cases, can you please share it?
I have just uploaded the data that reproduces the issue, along with the Stan code + MWE in R. (The data is just the monthly arithmetic return on the consumer price index (CPI) from 1947-01-31 to 2021-12-31, which is available here using their free API or otherwise.)
Thanks for sharing the details. I’ll continue this conversation on the GitHub issue, feel free to post there. I created a very simple unit test which triggers the problem, and I’ll try and work out why the current normalization strategy fails.
@charlesm93, just to note (and I haven’t figured out whether this is actually a problem in the current implementation or not): some HMMs will have states that are known with certainty, so the probabilities are literal ones or zeros. For example, suppose I have a zero-inflated Poisson model and I model the zero-inflation state using an HMM. Then when the observed count is nonzero, I know with certainty which HMM state I am in. It would be good to make sure these literal boundary cases are handled.
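To make that concrete, here is a sketch of what I mean, with hypothetical names: y is an array of observed counts, lambda is the Poisson rate, state 1 is the structural-zero state, and state 2 is the Poisson state.

transformed parameters {
  matrix[2, T] log_omega;
  for (t in 1:T) {
    if (y[t] == 0) {
      log_omega[1, t] = 0;                      // the structural-zero state emits 0 with probability 1
      log_omega[2, t] = poisson_lpmf(0 | lambda);
    } else {
      log_omega[1, t] = negative_infinity();    // that state cannot have produced a nonzero count
      log_omega[2, t] = poisson_lpmf(y[t] | lambda);
    }
  }
}

With literal negative-infinity entries in log_omega, the corresponding forward quantities are exactly zero, which is the kind of boundary case I’d want handled gracefully.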
Is that called a semi-hidden Markov model? I have use cases for HMMs where this is the case: the hidden state can be “seen” through some direct measurement such as a survey or a conversion event. For example, suppose I want to measure how an ad campaign influences conversions. I may start out with measurements of general site activity (new cookies, devices seen on site) and sales. As the campaign runs, I take surveys of people’s attitudes and intent to purchase (knowing the number of exposures per respondent). These surveys can be thought of as measurements of the hidden state. Final sales can also be thought of as a measurement of the end state: whether someone converted. What is the probability of moving someone from not knowing the brand to knowing it, from not considering it to considering it, to actively searching, and finally to conversion?
Anyway, interested in how to account for when these states are sometimes revealed.
I don’t think the problem is that the probability converges to 1. I created a unit test to check this and the code runs fine. On further inspection, the problem seems to be that log_omega takes large values in magnitude. See the comment on the GitHub issue.
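For intuition (this is just standard double-precision arithmetic, not anything specific to the Stan implementation):

$$
\exp(x) = 0 \ \text{(underflow)} \ \text{for}\ x \lesssim -745,
\qquad
\exp(x) = \infty \ \text{(overflow)} \ \text{for}\ x \gtrsim 710,
$$

so if the forward quantities are formed by exponentiating very large-magnitude values of log_omega, the per-step normalization can turn into 0/0 or inf/inf, both of which are NaN.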
(a hidden semi-Markov model is something quite different)
I tend to think of it in terms of supervised, unsupervised, and semi-supervised implementations of HMMs, and I think this is how it’s referred to in the Stan manual as well: 2.6 Hidden Markov models | Stan User’s Guide
I have some vague recollection that hidden semi-Markov models deal with differing dwell times in the states, or something along those lines. Yes, I forgot that the manual has all three cases described.