Gaussian HMM does not converge and has large Rhat

I’m trying to create a hidden Markov model with Gaussian emissions and fit it with MCMC.

The observation sequence is generated as follows:

p_transition <- matrix(data = c(0.2, 0.4, 0.1, 0.6, 0.2, 0.5, 0.2, 0.4, 0.4), nrow = 3)
p_init <- c(0.24561404, 0.40350877, 0.35087719)

Markov_sequence_length <- 1000

init_state <- which(c(rmultinom(1, 1, p_init)) == 1)

Markov_sequence <- c(init_state)

for(i in 2:Markov_sequence_length){  # init_state is already the first state, so generate N - 1 more
  
  last_state <- tail(Markov_sequence, 1)
  
  state_sample <- which(c(rmultinom(1, 1, p_transition[last_state, ])) == 1)
  
  
  Markov_sequence <- append(Markov_sequence, state_sample)
}

mus <- c(0.5, 0, -2)
sigmas <- c(1, 0.5, 3)

obs <- c()

for(m in 1:Markov_sequence_length){
  
  new_val <- rnorm(1, mean = mus[Markov_sequence[m]], sd = sigmas[Markov_sequence[m]])
  obs <- append(obs, new_val)
  
}
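A quick sanity check on the simulator (my addition, not in the original post): with a long enough chain, the empirical transition frequencies should be close to p_transition.

```r
set.seed(1)
p_transition <- matrix(c(0.2, 0.4, 0.1, 0.6, 0.2, 0.5, 0.2, 0.4, 0.4), nrow = 3)
p_init <- c(0.24561404, 0.40350877, 0.35087719)
n <- 100000

# simulate a long chain with the same mechanism as above
chain <- numeric(n)
chain[1] <- which(rmultinom(1, 1, p_init) == 1)
for (i in 2:n) {
  chain[i] <- which(rmultinom(1, 1, p_transition[chain[i - 1], ]) == 1)
}

# rows: current state, columns: next state, normalised to row frequencies
emp <- prop.table(table(chain[-n], chain[-1]), margin = 1)
max(abs(emp - p_transition))   # should be close to 0
```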

The model in Stan is defined as follows:

data {
  int<lower = 0> N;
  int<lower = 1> Z;
  real y[N];
}

parameters {
  
  real mu[Z];
  real<lower = 0> mu_tau[Z];
  
  real<lower=0> sigma[Z];
  
  simplex[Z] p_transition[Z];
  simplex[Z] p_init; 
}

model {
  
  // priors
  mu_tau ~ scaled_inv_chi_square(1, 0.05);
  mu ~ normal(0, mu_tau);
  
  sigma ~ scaled_inv_chi_square(1, 0.05);
  
  // forward algorithm
  {
    real acc[Z];
    real alpha[N, Z];
    
    for (z in 1:Z){
      
      alpha[1, z] = normal_lpdf(y[1] | mu[z], sigma[z]) + log(p_init[z]);
      
    }
    
    for (t in 2:N){
      
      for (z in 1:Z){       // current state
        
        for (j in 1:Z){     // previous state
          
          acc[j] = alpha[t - 1, j] + log(p_transition[j, z]) + normal_lpdf(y[t] | mu[z], sigma[z]);
          
        }
        
        alpha[t, z] = log_sum_exp(acc);
        
      }
    }
    
    target += log_sum_exp(alpha[N]);
  }
}
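As a numerical check of the forward recursion (my addition, plain base R mirroring the Stan code): for a short sequence the marginal likelihood can also be computed by brute-force summation over all Z^N state paths, and the two log-likelihoods should agree.

```r
log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# forward algorithm, same recursion as in the Stan model block
forward_loglik <- function(y, mu, sigma, P, p_init) {
  N <- length(y); Z <- length(mu)
  alpha <- matrix(NA_real_, N, Z)
  for (z in 1:Z)
    alpha[1, z] <- dnorm(y[1], mu[z], sigma[z], log = TRUE) + log(p_init[z])
  for (t in 2:N) {
    for (z in 1:Z) {
      acc <- alpha[t - 1, ] + log(P[, z]) + dnorm(y[t], mu[z], sigma[z], log = TRUE)
      alpha[t, z] <- log_sum_exp(acc)
    }
  }
  log_sum_exp(alpha[N, ])
}

# brute force: sum the joint density over every possible state path
brute_loglik <- function(y, mu, sigma, P, p_init) {
  N <- length(y); Z <- length(mu)
  paths <- as.matrix(expand.grid(rep(list(1:Z), N)))
  lp <- apply(paths, 1, function(s) {
    l <- log(p_init[s[1]]) + dnorm(y[1], mu[s[1]], sigma[s[1]], log = TRUE)
    for (t in 2:N)
      l <- l + log(P[s[t - 1], s[t]]) + dnorm(y[t], mu[s[t]], sigma[s[t]], log = TRUE)
    l
  })
  log_sum_exp(lp)
}

P <- matrix(c(0.2, 0.4, 0.1, 0.6, 0.2, 0.5, 0.2, 0.4, 0.4), nrow = 3)
p_init <- c(0.24561404, 0.40350877, 0.35087719)
mus <- c(0.5, 0, -2); sigmas <- c(1, 0.5, 3)
y <- c(0.3, -1.8, 0.1, 2.0)    # tiny test sequence so 3^4 paths is feasible
all.equal(forward_loglik(y, mus, sigmas, P, p_init),
          brute_loglik(y, mus, sigmas, P, p_init))
```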

Then, the model is fitted in R:

stan_data <- list(N = length(obs),
                  Z = 3,
                  y = obs)

fit <- stan("Wired_HMM.stan",
            data = stan_data,
            iter = 10000,
            control = list(adapt_delta = 0.9),
            chains = 4)

It gives the results:

       mean se_mean   sd  2.5% 97.5% n_eff Rhat
mu[1] -0.37    0.58 0.85 -2.09  0.57     2 4.02
mu[2] -0.75    0.73 1.07 -2.26  0.64     2 3.85
mu[3] -0.36    0.57 0.82 -2.05  0.57     2 5.26

The 95% intervals do include my predefined mus (0.5, 0, -2), but they are very wide and the Rhats are far from 1. Did I mess up the HMM somewhere? How can I fix the model?

Since n_eff is estimated to be 2, this hints that the posterior has multiple modes and the chains are ending up in different ones. HMMs are likely to have multiple modes. Check the traceplots; see, e.g., mcmc_trace in bayesplot.
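A common remedy for the label switching behind such multimodal chains (my suggestion, not from the reply above) is to break the symmetry between the states, e.g. by declaring the means as ordered:

```stan
parameters {
  ordered[Z] mu;                 // mu[1] < mu[2] < ... removes the label-switching symmetry
  real<lower = 0> mu_tau[Z];
  real<lower = 0> sigma[Z];
  simplex[Z] p_transition[Z];
  simplex[Z] p_init;
}
```

The rest of the model can stay as is; the ordering just pins each state to one mode of the posterior.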

Thanks! By the way, does my forward algorithm look correct?

I’ve not done HMMs myself, so I can’t say. I also recommend checking the built-in HMM functions (introduced two years ago):

Oops, forgot to paste the link: HMM Example
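For reference, a sketch of what using those built-ins would look like (based on my reading of the Stan 2.24+ function reference, untested here; the hand-written forward loop in the model block is replaced by hmm_marginal):

```stan
model {
  // ... same priors as before ...
  matrix[Z, N] log_omega;   // log emission densities: state z, observation n
  matrix[Z, Z] Gamma;       // Gamma[i, j] = P(state j at t | state i at t - 1)
  for (n in 1:N)
    for (z in 1:Z)
      log_omega[z, n] = normal_lpdf(y[n] | mu[z], sigma[z]);
  for (z in 1:Z)
    Gamma[z] = p_transition[z]';
  target += hmm_marginal(log_omega, Gamma, p_init);
}
```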

Thanks