I am fitting a minimal two-state HMM with Gaussian emission densities. I am using the hmm_marginal() function for the target density and the hmm_hidden_state_prob() function to obtain smoothed posterior state probabilities. The model works well in many cases and agrees with MLE implementations such as R's depmixS4 package.
The problem is that for my data, the state probabilities returned by hmm_hidden_state_prob() converge to, and eventually reach, unity (for one of the states). The subsequent state probabilities are all NaN. There are no warnings or divergent transitions. The model runs quickly.
I’m guessing the convergence to unity is caused by floating-point error / rounding? Is there any way I can avoid this? The filtered state probabilities behave fine and closely match the depmixS4 MLE probabilities, so it seems to be a problem with the implementation of the Viterbi algorithm?
As always, any help is very much appreciated.
I am using CmdStan 2.28.2 with cmdstanr 0.4.0.
The Stan code is below.
data {
  int<lower=1> T;            // number of observations
  int<lower=1> K;            // number of states
  array[T] real y;           // observations
}
parameters {
  simplex[K] phi;            // initial state probabilities
  ordered[K] mu;             // state means
  vector<lower=0>[K] sigma;  // state volatilities
  array[K] simplex[K] A;     // transition probabilities
}
transformed parameters {
  matrix[K, T] log_omega;    // log conditional likelihoods
  matrix[K, K] gamma;        // transition matrix
  // construct gamma matrix
  for (k in 1:K) {
    gamma[k, ] = A[k]';
  }
  // compute log P(y | state)
  for (t in 1:T) {
    for (k in 1:K) {
      log_omega[k, t] = normal_lpdf(y[t] | mu[k], sigma[k]);
    }
  }
}
model {
  // priors
  mu ~ normal(0, 5);
  sigma ~ exponential(1);
  // likelihood
  target += hmm_marginal(log_omega, gamma, phi);
}
generated quantities {
  matrix[T, K] zstar = hmm_hidden_state_prob(log_omega, gamma, phi)';
}
And here is a plot of the posterior mean smoothed state probability (na.rm=TRUE):
Hi Marty,
Thanks for reporting this issue. If I follow you, past a certain date the probability in zstar goes to 1 for one state and 0 for all the other states, correct? Are the NaNs returned by Stan? Do you get NaN for the probability that goes to 1? Can you share your R code for generating the plot? I want to check whether you took posterior means for zstar or something else.
We’re not implementing the Viterbi algorithm, which returns a maximum a posteriori estimator. Rather, we are drawing the hidden state probability distribution from a conditional distribution, and the end result is drawn from the posterior distribution. This allows you to compute expectation values (and quantiles), rather than estimators which maximize a density.
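Concretely (this is just restating the standard definitions): the filtered probability conditions only on the observations up to time $t$, whereas the quantity returned by hmm_hidden_state_prob is the smoothed probability, which conditions on the full series,

$$
p(z_t = k \mid y_{1:t}) \ \text{(filtered)}
\qquad \text{vs.} \qquad
p(z_t = k \mid y_{1:T}) \propto \alpha_t(k)\,\beta_t(k) \ \text{(smoothed)},
$$

where $\alpha_t$ and $\beta_t$ are the usual forward and backward quantities.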
These will only occur if the variables that HMC is sampling misbehave, not for variables computed in generated quantities.
Are the NaNs returned by Stan? Do you get NaN for the probability that goes to 1?
Yes and yes. I also get NaN for the probability that goes to zero. Here is the code for plots:
library(magrittr)  # for the %>% pipe used below
data <- list(T = length(y), K = 2, y = y)
f <- "uni_gauss_hmm.stan"
s <- cmdstanr::cmdstan_model(f)
fit <- s$sample(data, iter_warmup = 500,
iter_sampling = 500, chains = 1)
# to plot posterior mean
z <- fit$draws('zstar') %>%
colMeans(na.rm=TRUE) %>%
matrix(ncol=data$K, byrow=FALSE)
plot(z[, 1], type='l') # state 1 prob
plot(z[, 2], type='l') # state 2 prob
# to plot single draw of posterior
z <- fit$draws('zstar')
dim(z) # (500, 1, 2088)
mean(is.na(z)) # 0.3390805
i = 1
plot(z[i, 1, 1:data$T], type = 'l') # state 1
We’re not implementing the Viterbi algorithm, which returns a maximum a posteriori estimator. Rather, we are drawing the hidden state probability distribution from a conditional distribution, and the end result is drawn from the posterior distribution.
OK, thank you for the clarification! So how does this differ from the forward probability (filtered probability) posterior estimate?
These will only occur if the variables that HMC is sampling misbehave, not for variables computed in generated quantities.
Ah I see, thanks again!
Also, I’m happy to share the data if you would like to try to reproduce, so just let me know.
NaNs may be produced due to numerical instability when calculating hmm_hidden_state_prob.
In the post linked below, I had a similar chat with you. My guess is that hmm_hidden_state_prob may need to employ log_sum_exp to bypass the numerical instability issue. I also encountered NaNs when using hmm_hidden_state_prob, and the problem went away when a custom filtered-probability function that employed log_sum_exp was used instead.
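For reference, here is a minimal sketch of the kind of log-scale filtered-probability computation I have in mind, written as a standalone generated quantities block that reuses the phi, gamma, and log_omega quantities from the model above. It is only an illustration, not the actual custom code from the earlier thread or the hmm_hidden_state_prob implementation.

generated quantities {
  // Filtered probabilities P(z_t = k | y_{1:t}), computed on the log scale
  matrix[T, K] filtered;
  {
    matrix[K, T] log_alpha;  // log forward quantities
    // initialisation: log alpha_1(k) = log phi_k + log p(y_1 | state k)
    for (k in 1:K) {
      log_alpha[k, 1] = log(phi[k]) + log_omega[k, 1];
    }
    // recursion: log alpha_t(k) = log_sum_exp_j(log alpha_{t-1}(j) + log gamma_{jk}) + log p(y_t | state k)
    for (t in 2:T) {
      for (k in 1:K) {
        log_alpha[k, t] = log_sum_exp(col(log_alpha, t - 1) + log(col(gamma, k)))
                          + log_omega[k, t];
      }
    }
    // normalise within each time point; softmax exponentiates and renormalises stably
    for (t in 1:T) {
      filtered[t, ] = softmax(col(log_alpha, t))';
    }
  }
}

Normalizing with softmax at each time point is the same as exponentiating log_alpha and dividing by its sum, but everything stays on the log scale until the very end.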
Thanks for this! I completely forgot you raised this issue in our earlier dialogue; at the time I read it as only being an issue for modifications of hmm_hidden_state_prob(). I have actually been using your filtered probability function and can confirm it does not suffer from the NaN issue.
OK, ignore this question; I just took a look at the forward-backward algorithm. A custom function using Luis Damiano’s and @mshin’s code seems to be doing the job for the smoothed estimates, but of course fixing hmm_hidden_state_prob() would be ideal.
Cheers,
Tom
Hi all,
Thanks for looking into this, and thanks @mshin for the readable code. Opening up the underlying stan-math C++ code, I find that indeed we did not use log_sum_exp in our implementation, nor did we introduce a log-alpha matrix. Rather, we directly compute alpha without putting most terms on the log scale, which could explain the numerical instability.
I think it should be relatively straightforward to rewrite the code following @mshin’s example. We’ll want a simple unit test which would produce NaNs with the current implementation but not with the new one.
Question: do we run into issues computing log_alpha when alpha = 0? Or do we just end up calculating something close to 0?
The HMM implementation in Stan doesn’t use log_sum_exp but rather a continuous renormalization to avoid numerical problems. I experimented with both approaches initially and found that the continuous renormalization was faster and just as stable.
Because of this, numerical issues are likely to be more subtle, so finding a minimal failing example would be helpful.
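For anyone not familiar with the strategy, the idea is roughly the following. This is a sketch only, written as a Stan function with names of my choosing; the actual implementation lives in the stan-math C++.

functions {
  // Rough sketch of a forward pass that renormalizes alpha at every step
  // instead of working on the log scale (illustrative only)
  matrix forward_filter_renormalized(matrix log_omega, matrix gamma, vector phi) {
    int K = rows(log_omega);
    int T = cols(log_omega);
    matrix[T, K] filtered;  // filtered[t, k] = P(z_t = k | y_{1:t})
    vector[K] alpha = phi .* exp(col(log_omega, 1));
    alpha /= sum(alpha);    // renormalize so alpha sums to 1 at every step
    filtered[1, ] = alpha';
    for (t in 2:T) {
      // note: exp() of an extreme emission log density can still over/underflow here
      alpha = (gamma' * alpha) .* exp(col(log_omega, t));
      alpha /= sum(alpha);  // the renormalization keeps alpha itself away from underflow
      filtered[t, ] = alpha';
    }
    return filtered;
  }
}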
Agreed. The first step is to create some unit tests. My availability this month is limited, but I should be able to work on this in the second half of April. I’ve also marked this as a good first issue for someone who might be interested in contributing to Stan’s C++. I can provide guidance, and if no one expresses interest, I’ll take care of it.
@Marty , if you have access to the data that can reproduce this hmm_hidden_state_prob() NaN issue, and you are able to release the data to the public so it can be used in test cases, can you please share it?
I have just uploaded the data that reproduces the issue, along with the Stan code + MWE in R. (The data is just the monthly arithmetic return on the consumer price index (CPI) from 1947-01-31 to 2021-12-31, which is available here using their free API or otherwise.)
Thanks for sharing the details. I’ll continue this conversation on the GitHub issue, feel free to post there. I created a very simple unit test which triggers the problem, and I’ll try and work out why the current normalization strategy fails.
@charlesm93, just to note (and I haven’t figured out whether this is actually a problem in the current implementation or not): some HMMs will have states that are known with certainty, so the probabilities are literal ones or zeros. For example, suppose I have a zero-inflated Poisson model and I model the zero-inflation state using an HMM. Then when the observed count is nonzero, I know with certainty which HMM state I am in. It would be good to make sure these literal boundary cases are handled.
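To make that concrete, here is a sketch of what I mean, with hypothetical names: y is an array of observed counts, lambda is the Poisson rate, state 1 is the structural-zero state, and state 2 is the Poisson state.

transformed parameters {
  matrix[2, T] log_omega;
  for (t in 1:T) {
    if (y[t] == 0) {
      log_omega[1, t] = 0;                      // the structural-zero state emits 0 with probability 1
      log_omega[2, t] = poisson_lpmf(0 | lambda);
    } else {
      log_omega[1, t] = negative_infinity();    // that state cannot have produced a nonzero count
      log_omega[2, t] = poisson_lpmf(y[t] | lambda);
    }
  }
}

With literal negative-infinity entries in log_omega, the corresponding forward quantities are exactly zero, which is the kind of boundary case I’d want handled gracefully.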
Is that called a semi-hidden Markov model? I have use cases for HMMs where this is the case: the hidden state can be “seen” through some direct measurement such as a survey or a conversion event. For example, suppose I want to measure how an ad campaign influences conversions. I may start out with measurements of general site activity (new cookies, devices seen on site) and sales. As the campaign runs, I take surveys of people’s attitudes and intent to purchase (knowing the number of exposures per respondent). These surveys can be thought of as measurements of the hidden state. Final sales can also be thought of as a measurement of the end state: whether someone converted. What is the probability of moving someone from not knowing the brand to knowing it, from not considering it to considering it, to actively searching, and finally to conversion?
Anyway, interested in how to account for when these states are sometimes revealed.
I don’t think the problem is that the probability converges to 1. I created a unit test to check this and the code runs fine. On further inspection, the problem seems to be that log_omega takes large values in magnitude. See the comment on the GitHub issue.
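For intuition (this is just standard double-precision arithmetic, not anything specific to the Stan implementation):

$$
\exp(x) = 0 \ \text{(underflow)} \ \text{for}\ x \lesssim -745,
\qquad
\exp(x) = \infty \ \text{(overflow)} \ \text{for}\ x \gtrsim 710,
$$

so if the forward quantities are formed by exponentiating very large-magnitude values of log_omega, the per-step normalization can turn into 0/0 or inf/inf, both of which are NaN.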
(a hidden semi-Markov model is something quite different)
I tend to think of it in terms of supervised, unsupervised, and semi-supervised implementations of HMMs, and I think this is how it’s referred to in the Stan manual as well: 2.6 Hidden Markov models | Stan User’s Guide
I have some vague recollection that hidden semi-Markov models deal with differing dwell times in the states, or something along those lines. Yes, I forgot that the manual has all three cases described.