Tips for efficiency using a multivariate normal on a hidden Markov model

I would like to fit the following model using a multivariate normal. It is a hidden Markov model where we know the latent states. We have the following observations:

x_{1,s(1)}, \ldots, x_{n,s(n)}

where x_{i,s(i)} corresponds to observation i assigned to latent state s(i) \in \{1, \ldots,K\}.

We will model the x_{i,s(i)} as a multivariate normal centered on zero, defined by the following (I don’t write out the entire expression so as not to detract from the main issue):

\langle x_{i,s(i)} x_{j,s(j)} \rangle = \begin{cases} f(\sigma_{s(i)}) & \text{if } i=j \\ g(\sigma_{s(i)}) & \text{if } |i-j|=1 \text{ and } s(i)=s(j) \\ 0 & \text{otherwise} \end{cases}

and

\sigma_{s(i)} \sim \text{Gamma}(3, 0.2)

where f and g are functions of \sigma_{s(i)} which I leave unspecified on purpose. In other words, consecutive measurements of the same state are correlated. We can also express this as

x \sim \text{MultiNormal}(0, \Sigma)

where \Sigma_{i,j} = \langle x_{i,s(i)} x_{j,s(j)} \rangle.
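For example, with N = 4 and states s = (1, 1, 2, 2), this definition gives the block-diagonal matrix

\Sigma = \begin{pmatrix} f(\sigma_1) & g(\sigma_1) & 0 & 0 \\ g(\sigma_1) & f(\sigma_1) & 0 & 0 \\ 0 & 0 & f(\sigma_2) & g(\sigma_2) \\ 0 & 0 & g(\sigma_2) & f(\sigma_2) \end{pmatrix}

with one tridiagonal block per run of consecutive identical states.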

I have been trying to implement this but it’s extremely inefficient and I would like to see if it can be improved. Here is a simplified version of my current model (note that the functions f and g are placeholders for a more complicated expression in the Stan model):


data {
  int<lower=0> N;                  // number of observations
  int<lower=0> K;                  // number of latent states
  int<lower=1,upper=K> state[N];   // known state assignment s(i)
  vector[N] x;                     // observations
}

parameters {
  real<lower=0> sigma[K];          // lower bound to match the gamma prior
}

transformed parameters {
  cov_matrix[N] cov;

  // f() and g() are placeholders for the real expressions
  for (i in 1:N) {
    for (j in 1:N) {
      if (i == j) {
        cov[i,j] = f(sigma[state[i]]);
      } else if (abs(i-j) == 1 && state[i] == state[j]) {
        cov[i,j] = g(sigma[state[i]]);
      } else {
        cov[i,j] = 0;
      }
    }
  }
}

model {
  sigma ~ gamma(3, 0.2);
  x ~ multi_normal(rep_vector(0, N), cov);
}

Any tips appreciated!

We have some built-in functions for HMMs, if that helps: 29.1 Stan functions | Stan Functions Reference


If the latent states are known it’s a regular Markov model, not a hidden one.

Anyway, if N is large the slowest part is the Cholesky decomposition, which scales as O\left(N^3\right).
The covariance matrix is block diagonal, so you could speed things up by breaking it into blocks:

...
transformed data {
  // count maximal runs of consecutive identical states
  int n_blocks = 1;
  for (k in 2:N) {
    if (state[k] != state[k-1]) {
      n_blocks += 1;
    }
  }
  // run lengths, initialized to 1
  int block_sizes[n_blocks] = rep_array(1, n_blocks);
  {
    int b = 1;
    for (k in 2:N) {
      if (state[k] == state[k-1]) {
        block_sizes[b] += 1;
      } else {
        b += 1;
      }
    }
  }
}
...
model {
  sigma ~ gamma(3, 0.2);

  int k = 1;
  for (b in 1:n_blocks) {
    int s = block_sizes[b];
    real f_s = f(sigma[state[k]]);
    real g_s = g(sigma[state[k]]);
    vector[s] xb = x[k : k + s - 1];
    k += s;
    // tridiagonal Toeplitz covariance for this run
    matrix[s,s] cov;
    for (i in 1:s) {
      for (j in 1:s) {
        if (i == j) {
          cov[i,j] = f_s;
        } else if (abs(i-j) == 1) {
          cov[i,j] = g_s;
        } else {
          cov[i,j] = 0.0;
        }
      }
    }
    xb ~ multi_normal(rep_vector(0.0, s), cov);
  }
}

But that’s not all. The blocks are tridiagonal Toeplitz matrices, so you can probably find an efficient Cholesky decomposition and use multi_normal_cholesky.
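For a tridiagonal Toeplitz matrix the Cholesky factor is lower bidiagonal, so it can be filled in with an O(s) recurrence instead of the generic O\left(s^3\right) algorithm. A minimal sketch, assuming f_s > 2|g_s| (a sufficient condition for each block to be positive definite); tridiag_toeplitz_cholesky is a hypothetical helper name, not a built-in:

functions {
  // Cholesky factor of the s x s tridiagonal Toeplitz matrix with
  // diagonal fd and first off-diagonal gd. Only the diagonal and
  // first subdiagonal of L are nonzero.
  matrix tridiag_toeplitz_cholesky(int s, real fd, real gd) {
    matrix[s, s] L = rep_matrix(0.0, s, s);
    L[1, 1] = sqrt(fd);
    for (i in 2:s) {
      L[i, i - 1] = gd / L[i - 1, i - 1];        // subdiagonal entry
      L[i, i] = sqrt(fd - square(L[i, i - 1]));  // remaining diagonal mass
    }
    return L;
  }
}

Each block’s sampling statement in the model then becomes

xb ~ multi_normal_cholesky(rep_vector(0.0, s), tridiag_toeplitz_cholesky(s, f_s, g_s));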


Thank you, I will look into this.

Thank you @nhuurre, this is very helpful and provides the path forward. I just need to learn a bit more about Cholesky decomposition!!