# Tips for efficiency using a multivariate normal on a hidden Markov model

I would like to fit the following model using a multivariate normal. It is a hidden Markov model in which we know the latent states. We have the following observations:

x_{1,s(1)}, \ldots, x_{n,s(n)}

where x_{i,s(i)} corresponds to observation i assigned to latent state s(i) \in \{1, \ldots,K\}.

We will model the x_{i,s(i)} as a multivariate normal centered at zero, with covariance defined by the following (I don’t write out the entire expression so as not to detract from the main issue):

\langle x_{i,s(i)} x_{j,s(j)} \rangle = \begin{cases} f(\sigma_{s(i)}) & \text{if } i=j \\ g(\sigma_{s(i)}) & \text{if } |i-j|=1 \text{ and } s(i)=s(j) \\ 0 & \text{otherwise} \end{cases}

and

\sigma_{s(i)} \sim Gamma(3,0.2)

where f and g are functions of \sigma_{s(i)} that I intentionally leave unspecified. In other words, consecutive measurements of the same state are correlated. We can also express this as

x \sim \text{MultiNormal}(0, \Sigma)

where \Sigma_{i,j} = \langle x_{i,s(i)} x_{j,s(j)} \rangle.

I have been trying to implement this, but it’s extremely inefficient, and I would like to see if it can be improved. Here is a simplified version of my current model (note that the functions f and g are placeholders for a more complicated expression in the Stan model):


data {
  int<lower=0> N;
  int<lower=0> K;
  int<lower=1,upper=K> state[N];
  vector[N] x;
}

parameters {
  real<lower=0> sigma[K];  // the gamma prior requires positive support
}

transformed parameters {
  cov_matrix[N] cov;

  for (i in 1:N) {
    for (j in 1:N) {
      if (i == j) {
        cov[i,j] = f(sigma[state[i]]);
      } else if (abs(i-j) == 1 && state[i] == state[j]) {
        cov[i,j] = g(sigma[state[i]]);
      } else {
        cov[i,j] = 0;
      }
    }
  }
}

model {
  sigma ~ gamma(3, 0.2);
  x ~ multi_normal(rep_vector(0, N), cov);
}


Any tips appreciated!

We have some built-in functions for HMMs, if that helps: 29.1 Stan functions | Stan Functions Reference


If the latent states are known, it’s a regular Markov model, not a hidden one.

Anyway, if N is large, the slowest part is the Cholesky decomposition, which scales as O\left(N^3\right).
The covariance matrix is block diagonal, so you can speed things up by breaking it into blocks:

...
transformed data {
  int n_blocks = 1;
  for (k in 2:N) {
    if (state[k] != state[k-1]) {
      n_blocks += 1;
    }
  }
  int block_sizes[n_blocks] = rep_array(1, n_blocks);  // rep_array, not rep_vector, for an int array
  {
    int b = 1;
    for (k in 2:N) {
      if (state[k] == state[k-1]) {
        block_sizes[b] += 1;
      } else {
        b += 1;
      }
    }
  }
}
...
model {
  sigma ~ gamma(3, 0.2);

  int k = 1;
  for (b in 1:n_blocks) {
    int s = block_sizes[b];
    real f_s = f(sigma[state[k]]);
    real g_s = g(sigma[state[k]]);
    vector[s] xb = x[k : k + s - 1];
    k += s;
    matrix[s, s] cov;
    for (i in 1:s) {
      for (j in 1:s) {
        if (i == j) {
          cov[i, j] = f_s;
        } else if (abs(i - j) == 1) {
          cov[i, j] = g_s;
        } else {
          cov[i, j] = 0.0;
        }
      }
    }
    xb ~ multi_normal(rep_vector(0.0, s), cov);
  }
}
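To see why splitting by block is exact rather than an approximation: for a block-diagonal covariance, the joint multivariate-normal log-density factorizes into the sum of the per-block log-densities. Here is a quick numerical sketch in Python (the state sequence and the f and g below are made-up placeholders, just like in the original post):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
state = np.array([1, 1, 1, 2, 2, 1, 1])   # example state sequence
sigma = {1: 0.8, 2: 1.5}                  # per-state scale
f = lambda s: s**2 + 0.5                  # placeholder diagonal entry
g = lambda s: 0.3 * s**2                  # placeholder off-diagonal entry

# Build the full block-diagonal covariance exactly as in the Stan model.
N = len(state)
cov = np.zeros((N, N))
for i in range(N):
    cov[i, i] = f(sigma[state[i]])
    if i + 1 < N and state[i] == state[i + 1]:
        cov[i, i + 1] = cov[i + 1, i] = g(sigma[state[i]])

x = rng.normal(size=N)
full = multivariate_normal(mean=np.zeros(N), cov=cov).logpdf(x)

# Split into maximal runs of constant state and sum the block log-densities.
blocks, start = [], 0
for k in range(1, N):
    if state[k] != state[k - 1]:
        blocks.append((start, k))
        start = k
blocks.append((start, N))

split = sum(
    multivariate_normal(mean=np.zeros(b - a), cov=cov[a:b, a:b]).logpdf(x[a:b])
    for a, b in blocks
)
print(np.isclose(full, split))  # → True: the two log-densities agree
```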


But that’s not all. The blocks are tridiagonal Toeplitz matrices, so you can probably find an efficient Cholesky decomposition and use multi_normal_cholesky.
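For what it’s worth, the Cholesky factor of a symmetric tridiagonal matrix is lower bidiagonal, and in the Toeplitz case (constant diagonal a = f(\sigma), constant off-diagonal b = g(\sigma)) the two bands follow a simple O(s) recursion. A Python sketch, checked against a dense Cholesky (the values of a and b are arbitrary placeholders):

```python
import numpy as np

def tridiag_toeplitz_cholesky(a, b, s):
    """Diagonal d and subdiagonal e of the lower-bidiagonal Cholesky factor
    of the s x s tridiagonal Toeplitz matrix with diagonal a, off-diagonal b."""
    d = np.empty(s)
    e = np.empty(s - 1)
    d[0] = np.sqrt(a)
    for i in range(1, s):
        e[i - 1] = b / d[i - 1]            # L[i, i-1]
        d[i] = np.sqrt(a - e[i - 1] ** 2)  # L[i, i]
    return d, e

a, b, s = 2.0, 0.7, 6
d, e = tridiag_toeplitz_cholesky(a, b, s)

# Rebuild the dense factor and compare with numpy's Cholesky.
L = np.diag(d) + np.diag(e, -1)
M = a * np.eye(s) + b * (np.eye(s, k=1) + np.eye(s, k=-1))
print(np.allclose(L, np.linalg.cholesky(M)))  # → True
```

In the Stan model you could fill an s × s matrix with just these two bands and pass it to multi_normal_cholesky, skipping the O(s^3) factorization that multi_normal does internally.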


Thank you, I will look into this.

Thank you @nhuurre , this is very helpful and provides the path forward. I just need to learn a bit more about Cholesky decomposition!!