Hidden Markov Model with Binomial outcome: How to infer a state-dependent intercept when the outcome is constant within a state

I am using a Hidden Markov Model to model a binary outcome over time (e.g., on a given day, does an individual check the number of Covid-19 infections or not). The model is set up essentially as proposed in this great case study. I ran this model assuming the existence of 3 latent states, reflecting an individual’s current propensity to check the number of infections.

What’s happening is that the state-dependent intercept of state 3 (mu[3]; the state of highest checking propensity) in the Binomial part of the model goes towards positive infinity. The problem the model is facing is that in state 3, the dependent variable always takes the value 1, i.e., the individual always checks the number of infections when she is in state 3.
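To see why this happens: when every observation attributed to a state equals 1, the Bernoulli log-likelihood for that state is strictly increasing in the intercept, so the likelihood alone has no finite maximum. A minimal plain-Python sketch (the sample size n = 50 is arbitrary, chosen just for illustration):

```python
import math

def loglik_all_ones(mu, n=50):
    # log-likelihood of n successes out of n trials under Bernoulli(inv_logit(mu)):
    # n * log(inv_logit(mu)) = -n * log(1 + exp(-mu))
    return -n * math.log1p(math.exp(-mu))

# The log-likelihood keeps rising towards 0 as mu grows; it never peaks.
for mu in [0.0, 2.0, 5.0, 10.0]:
    print(mu, loglik_all_ones(mu))
```

Without a prior (or with one that is effectively absent), the sampler will chase this ever-increasing likelihood towards infinity.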

Since the model cannot infer the intercept when the outcome is constant, how can I prevent the state-dependent intercept from drifting off to infinity?

I am showing the model parts that might be relevant to answer this question (I have simplified it a bit by not accounting for the heterogeneity across individuals):

data {
  int<lower = 1> N;                // number of observations
  int<lower = 0, upper = 1> y[N];  // binary outcome: infection check "yes" (1) vs. "no" (0)
  int<lower = 1> S;                // number of states
}
parameters {
  simplex[S] theta[S];             // rows of the transition matrix
  ordered[S] mu;                   // ordered state-dependent intercepts
}
model {

  // priors
  mu[1] ~ normal(-1, 1);
  mu[2] ~ normal(1, 1);
  mu[3] ~ normal(3, 1);

  // forward algorithm
  real acc[S];
  real gamma[N, S];
  for (k in 1:S)
    gamma[1, k] = bernoulli_logit_lpmf(y[1] | mu[k]);
  for (t in 2:N) {
    for (k in 1:S) {
      for (j in 1:S)
        acc[j] = gamma[t - 1, j] + log(theta[j, k]) + bernoulli_logit_lpmf(y[t] | mu[k]);
      gamma[t, k] = log_sum_exp(acc);
    }
  }
  target += log_sum_exp(gamma[N]);
}

This plot visualises the issue that for state 3, the observed output is always y = 1 and never y = 0.

And here is the posterior output after running my model on my real data (I have validated it using simulated data before and that worked pretty well):


I’m not an expert on hidden Markov models, but it looks to me like something has gone awry here. The issue you are dealing with feels a lot like complete separation in logistic regression. In Bayesian analysis, a common solution is to regularize the coefficient (in this case mu[3]) via the prior.

What feels weird to me is that you have a prior on mu[3], but your output still lets mu[3] fly off towards infinity. This shouldn’t happen. By the time mu[3] reaches 10, the probability of success (i.e., inverse_logit(mu[3])) is within about 5e-05 of one. At this point, unless you have millions of data points, there is no way for the likelihood to suggest with any confidence that mu[3] should be higher than it is. The prior, on the other hand, has some very strong opinions about how high mu[3] can get. For example, it sees mu[3] = 20 as something like 1e-50 times less likely than mu[3] = 10.
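Both numbers in that argument are easy to check numerically; a quick plain-Python sketch (no Stan required):

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# At mu[3] = 10 the success probability is essentially saturated:
gap = 1.0 - inv_logit(10.0)          # distance from 1, roughly 4.5e-05
print(gap)

def normal_lpdf(x, m, s):
    # log density of normal(m, s) at x
    return -0.5 * ((x - m) / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))

# How much less likely is mu[3] = 20 than mu[3] = 10 under normal(3, 1)?
log_ratio = normal_lpdf(20.0, 3.0, 1.0) - normal_lpdf(10.0, 3.0, 1.0)
print(math.exp(log_ratio))           # on the order of 1e-52
```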

So there’s no way that your model should place noteworthy posterior probability at high values of mu[3], where “high” depends on your sample size but probably doesn’t exceed 7 or 8. The fact that your model is seeing values as high as infinity (more or less) suggests that you somehow failed to apply the prior to mu[3] when you ran the code. Is it possible, for example, that you originally coded a two-state model and then updated the script for a three-state model without remembering to add the prior on mu[3]? I realize that there’s a prior on mu[3] in the code you provided above…


Thanks Jacob, this is helpful. I will double check if the prior was applied correctly to mu[3] in this estimation.

Do you have a recommendation for the choice of the prior? Would it make more sense to use something like mu[3] ~ normal(0, 1), mu[3] ~ normal(0, 5), or mu[3] ~ normal(3, 1) in this case, or something else entirely?

I’m afraid I don’t know enough about hidden Markov models (or about your research area and your data) to provide a good suggestion for the prior. I will say that by putting a prior of normal(3, 1) on mu[3] you are probably ensuring that there will be at least one state (state 3) where the probability of success is quite high: on the probability scale, the +/- two-sigma interval of that prior corresponds to [.73, .99]. Whether this is a good idea or not I cannot say.


I agree with @jsocolar on most of the prior recommendations. If unsure, you can always do a prior predictive check (i.e., use the prior to simulate new datasets, as discussed e.g. in the Visualisation paper or the Bayesian workflow paper).
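As a minimal version of such a check for just the intercept, you can push draws from each candidate prior through the inverse logit and look at the checking probabilities they imply. A plain-Python sketch (the three candidate priors come from the question above; the seed and the 10,000 draws are arbitrary):

```python
import math
import random

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(1)

# Candidate priors for mu[3] from the discussion above: (mean, sd)
priors = {"normal(0,1)": (0.0, 1.0),
          "normal(0,5)": (0.0, 5.0),
          "normal(3,1)": (3.0, 1.0)}

for name, (m, s) in priors.items():
    draws = sorted(inv_logit(random.gauss(m, s)) for _ in range(10000))
    lo, hi = draws[250], draws[9750]  # central 95% interval
    print(f"mu[3] ~ {name}: implied P(check) 95% interval ~ [{lo:.2f}, {hi:.2f}]")
```

A wide prior like normal(0, 5) piles most of the implied probabilities near 0 and 1, which is often not what one intends; this kind of check makes such behaviour visible before fitting.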

I would however ask you to also consider other options. The simplest one is to follow the data: maybe you can change the meaning of state 3 to be “always checks” and remove mu[3] altogether? Further, I find it slightly implausible that HMMs are actually a good model for this type of data - why would there be only three separate probabilities for checking the number of cases? Wouldn’t a smooth model (e.g. a state-space model or a spline/GP model) make more sense?
Alternatively, you could also model the process with only two states (always checks / never checks) and allow the transition probabilities to change over time.
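If you go the “state 3 always checks” route, the change to the likelihood is small: that state’s emission probability becomes exactly 1 for y = 1 (log-probability 0) and 0 for y = 0, and mu[3] disappears. A plain-Python sketch of the forward algorithm with such a degenerate state, mirroring the Stan code in the question (flat initial state weights; the function names, transition matrix, and intercept values are all made up for illustration):

```python
import math

def log_sum_exp(xs):
    m = max(xs)
    return m if m == -math.inf else m + math.log(sum(math.exp(x - m) for x in xs))

def bernoulli_logit_lpmf(y, mu):
    # log P(y | inv_logit(mu)), computed stably
    return -math.log1p(math.exp(-mu)) if y == 1 else -math.log1p(math.exp(mu))

def hmm_loglik(y, theta, mu, always_state=None):
    """Forward-algorithm log-likelihood. If always_state is given, that
    state emits y = 1 with probability 1, so it needs no intercept."""
    S = len(theta)
    def emit(y_t, k):
        if k == always_state:
            return 0.0 if y_t == 1 else -math.inf
        return bernoulli_logit_lpmf(y_t, mu[k])
    gamma = [emit(y[0], k) for k in range(S)]
    for t in range(1, len(y)):
        gamma = [log_sum_exp([gamma[j] + math.log(theta[j][k]) for j in range(S)])
                 + emit(y[t], k) for k in range(S)]
    return log_sum_exp(gamma)

# Hypothetical transition matrix; state 3 (index 2) has no intercept at all
theta = [[0.80, 0.15, 0.05],
         [0.10, 0.80, 0.10],
         [0.05, 0.15, 0.80]]
print(hmm_loglik([1, 0, 1, 1, 1], theta, [-1.0, 1.0, None], always_state=2))
```

In Stan the same idea amounts to declaring only S - 1 intercepts and replacing the bernoulli_logit_lpmf term for the top state with `y[t] == 1 ? 0 : negative_infinity()`.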

Best of luck with your analysis!


Thank you so much, that was very helpful!

The suggestion that a smooth model would make more sense is probably right - however, I do not plan to leave the HMM route for now. Your alternative (2 states, time-varying transition probabilities) is what I’m using at the moment - thank you!