Speeding up inference on ARHMM


I am playing around with an autoregressive HMM with continuous emissions and incomplete data. Unfortunately the simple code I have, mainly adapted from the HMM example in the docs, is hopelessly slow.

I would appreciate any feedback on how I might boost the performance. Any hints about useful parameter and data transformations would be great. I am new to Stan and to MCMC in general, so I lack expert intuition.

I am including a pystan script of the model and data generation.

pystan_arhmm_ge.py (2.5 KB)


I’m pasting the model in here (with auto-formatting from our emacs mode and reduced vertical space):

data {
  int<lower=1> T;            // num steps
  int<lower=1> K;            // num categories
  int<lower=1> D;            // dim of obs
  matrix[T, D] x;            // obs
  vector<lower=0>[K] alpha;  // init category prior
  vector[D] lambda;          // init emit prior
  cov_matrix[D] rho;
  vector<lower=0>[K] beta;   // transit prior
  int nu;                    // d.o.f. of prior over covariances
  cov_matrix[D] kappa;
}
parameters {
  simplex[K] pi;             // init category
  vector[D] imu[K];          // init emission
  cov_matrix[D] isigma[K];
  simplex[K] trans[K];       // transit probs
  matrix[D, D] mat[K];       // emit prob matrix
  cov_matrix[D] sigma[K];
}
model {
  pi ~ dirichlet(alpha);
  for (k in 1:K) {
    imu[k] ~ multi_normal(lambda, rho);
    isigma[k] ~ inv_wishart(nu, kappa);
  }
  for (k in 1:K)
    trans[k] ~ dirichlet(beta);
  for (k in 1:K)
    sigma[k] ~ inv_wishart(nu, kappa);
  {
    // forward algorithm computes log p(x | ...)
    real aux[K];
    real gamma[T, K];
    vector[D] mu;

    for (k in 1:K)
      gamma[1, k] = multi_normal_lpdf(x[1] | imu[k], isigma[k]) + log(pi[k]);
    for (t in 2:T) {
      for (k in 1:K) {
        for (j in 1:K)
          aux[j] = gamma[t - 1, j] + log(trans[j, k]);
        mu = mat[k] * x[t - 1]';
        gamma[t, k] = log_sum_exp(aux) + multi_normal_lpdf(x[t] | mu, sigma[k]);
      }
    }
    target += log_sum_exp(gamma[T]);
  }
}
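For reference, here is the same forward recursion in log space sketched in plain numpy (illustrative only, not part of the Stan model; the helper names `mvn_logpdf` and `logsumexp` are mine):

```python
import numpy as np

def mvn_logpdf(y, mu, cov):
    """Log density of a multivariate normal via a Cholesky solve."""
    d = len(mu)
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, y - mu)
    return (-0.5 * d * np.log(2 * np.pi)
            - np.log(np.diag(L)).sum()
            - 0.5 * z @ z)

def logsumexp(a):
    """Numerically stable log(sum(exp(a)))."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def forward_loglik(x, pi, trans, imu, isigma, mat, sigma):
    """log p(x | ...) for the AR-HMM via the forward algorithm."""
    T, K = len(x), len(pi)
    log_gamma = np.empty((T, K))
    for k in range(K):
        log_gamma[0, k] = np.log(pi[k]) + mvn_logpdf(x[0], imu[k], isigma[k])
    for t in range(1, T):
        for k in range(K):
            # gamma[t-1, j] + log trans[j, k], marginalized over j
            aux = log_gamma[t - 1] + np.log(trans[:, k])
            mu = mat[k] @ x[t - 1]  # autoregressive emission mean
            log_gamma[t, k] = logsumexp(aux) + mvn_logpdf(x[t], mu, sigma[k])
    return logsumexp(log_gamma[-1])
```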

This will never be lightning fast given the multivariate normal emissions.

What you can do to speed things up is precompute and store things like Cholesky factors of the covariance matrices for the multivariate normals. For example, each sigma[k] should be Cholesky factored once into an L_sigma[k] that can then be reused for all T time steps. That’ll go a long way toward speeding things up.
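A sketch of the idea inside the model block (`L_sigma` and `L_isigma` are just illustrative names; the rest of the model stays as before):

```stan
model {
  // ... priors as before ...
  {
    matrix[D, D] L_sigma[K];
    matrix[D, D] L_isigma[K];
    for (k in 1:K) {
      L_sigma[k] = cholesky_decompose(sigma[k]);
      L_isigma[k] = cholesky_decompose(isigma[k]);
    }
    // then in the forward recursion use
    //   multi_normal_cholesky_lpdf(x[t] | mu, L_sigma[k])
    // so the factorization is done K times instead of T * K times
  }
}
```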

I’d also get rid of the inverse Wishart and use our decomposition into Cholesky factors of correlation matrices and scales (described in the manual chapter on regression).
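For one of the covariance matrices, that decomposition might look like the following sketch (the LKJ shape 2 and half-Cauchy scale priors are just common defaults, not a recommendation specific to this model):

```stan
parameters {
  cholesky_factor_corr[D] L_Omega[K];  // correlation Cholesky factors
  vector<lower=0>[D] tau[K];           // per-dimension scales
}
model {
  for (k in 1:K) {
    L_Omega[k] ~ lkj_corr_cholesky(2);
    tau[k] ~ cauchy(0, 2.5);
  }
  // diag_pre_multiply(tau[k], L_Omega[k]) is the Cholesky factor of
  // the implied covariance; pass it to multi_normal_cholesky_lpdf
}
```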

Then you want to work on vectorizing things like this:

for (k in 1:K)
  imu[k] ~ multi_normal(lambda, rho);

which only requires you to replace the loop with

imu ~ multi_normal(lambda, rho);

You’re probably then going to need the multivariate non-centered parameterization here to get decent mixing. The manual also describes that.
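For imu, a non-centered sketch might look like this (assuming the rho from the data block; `imu_raw` and `L_rho` are illustrative names):

```stan
transformed data {
  matrix[D, D] L_rho = cholesky_decompose(rho);  // data, so factor once
}
parameters {
  vector[D] imu_raw[K];  // standard-normal innovations
}
transformed parameters {
  vector[D] imu[K];
  for (k in 1:K)
    imu[k] = lambda + L_rho * imu_raw[k];
}
model {
  for (k in 1:K)
    imu_raw[k] ~ normal(0, 1);  // implies imu[k] ~ multi_normal(lambda, rho)
}
```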

This is a complicated model to speed up!