Identifiability issues when scaling up animal movement HMM to multiple individuals

I’m attempting to model animal movement based on a regular time series of GPS and accelerometer measurements from GPS-collared terrestrial animals. I have a total of 21 tracks (a track being a regular, gap-free time series representing a single deployment on a single animal). I’m basing a lot of my code on @vianeylb and Theo Michelot’s “Intro to Animal Movement Modelling…”, with some differences. For example, at this stage I am modelling each track with stationary parameters: each track has a single constant transition probability matrix, and the emission probabilities of each state have a single constant parameter set for each distribution of the observed quantities. My observed quantities are 1) an “activity” measurement, a summary of the accelerometer data that is bounded between 0 and 1 and modeled as Beta distributed; and 2) step length, the distance traveled in meters, which is bounded to be positive and modeled as Gamma distributed. I am only modelling two states.

So this is a pretty simple model at this point. For each track I am estimating 10 parameters: the constant probability of remaining in state 1 or in state 2 (2 parameters), the logit of the mean of each state’s Beta distribution (2 parameters), the log of the precision of each state’s Beta distribution (2 parameters) (note: I’m following Cribari-Neto and Zeileis for the Beta specification), the log of the Gamma mean for each state (2 parameters), and the log of the Gamma standard deviation for each state (2 parameters). However, I am anticipating a hierarchical model for the 21 tracks. For example, I will eventually have a hyper distribution for the Beta logit-mean, and each individual track’s Beta logit-mean would be drawn from this hyper distribution.
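For concreteness, here is the mapping from these unconstrained parameters to the usual shape parameterizations; it is exactly what the transformed parameters blocks below compute. For each state k, writing \mu_k = \mathrm{logit}^{-1}(beta_mu_logit[k]) and \phi_k = \exp(beta_phi_log[k]),

\alpha_k = \mu_k \phi_k, \quad \beta_k = (1 - \mu_k)\,\phi_k,

so the Beta mean is \mu_k and the precision is \phi_k. Likewise, writing m_k = \exp(gamma_mu_log[k]) and s_k = \exp(gamma_sigma_log[k]),

\mathrm{shape}_k = m_k^2 / s_k^2, \quad \mathrm{rate}_k = m_k / s_k^2,

which moment-matches a Gamma with mean m_k and standard deviation s_k.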

I have model code that is running well for fitting each track on its own. By virtue of using an ordered type for the Beta logit mean, and the fact that my data are nicely behaved and seem pretty easy to separate, I’m not really running into any identifiability issues as outlined in @betanalpha’s Identifying Mixture Models vignette. Only the Beta logit mean is of the ordered type, for two reasons: 1) for almost every animal there is a pretty strong signal of two modes in the activity measurement, and 2) while shorter step lengths tend to be associated with smaller activity values, because of GPS error I do not have a step-length observation for every point in the time series, and I will sometimes have large values of step length even with small activity values.

However, when I attempt to fit what is essentially the same model to multiple individuals all in one shot, I run into identifiability issues where one or more chains get stuck on a different solution. For example, I’ve attached the traceplot from a run where I fit the model to four individual tracks in one go using 3 chains.

For track 1 (beta_mu_logit[1,1] and beta_mu_logit[1,2]), chains 2 and 3 converge on the same solution, and this solution agrees with the estimates I get when I run my single-track model on this track. Chain 1, on the other hand, gets stuck on a solution that finds basically no difference in this parameter between the two states. Weirdly, it does not seem to be respecting the ordered type for this chain: the c_summary for chain 1 gives the following means:

parameter            mean
beta_mu_logit[1,1]   -1.5542648
beta_mu_logit[1,2]   -1.5529320

But for track 2 (beta_mu_logit[2,1] and beta_mu_logit[2,2]), only chain 3 finds a solution that agrees with my estimates from the single-track model; chains 1 and 2 once again home in on a solution that fits essentially no difference between the states. For track 3 it’s chain 2 that is off, while chains 1 and 3 are correct; and finally, for track 4 it’s chain 3 that is off. In all cases the misbehaving chain essentially fits the same constant value for both states.

I’m wondering if I’m running into an issue by using an array of ordered type in my multiple-individual model. I’m confident that the log-likelihood math for each individual track is identical between the two models. At this point I’m fitting everything separately, so no information should be shared between tracks (I’d like to go there, but I need to solve this issue first!). What can I try here?


Below is model code comparing the “single-track” model to the multiple-individual model. Unfortunately, I cannot share the data at this point.

data block
Single Track:

data {
  int<lower=0>                        T; // length of complete time series
  int<lower=0, upper = T>             T_obs_stp; // number of observed step lengths
  int<lower=1, upper = T>             index_obs_stp[T_obs_stp];
  vector<lower = 0>[T_obs_stp]        obs_steps;
  vector<lower = 0, upper = 1>[T]     activity;
}

Multiple Track:

data {
  int<lower = 1>                      M; //number of individuals
  int<lower=0>                        T; // length of complete time series
  int<lower = 1, upper = M>           ID[T]; //track identifier
  int<lower=0, upper = T>             T_obs_stp; // number of observed step lengths
  int<lower=1, upper = T>             index_obs_stp[T_obs_stp];
  vector<lower = 0>[T_obs_stp]        obs_steps;
  vector<lower = 0, upper = 1>[T]     activity;
}

parameters block
Single Track:

parameters {
  ordered[2]                       beta_mu_logit; // logit mean of beta dist
  vector[2]                        beta_phi_log; // log precision of beta dist
  ordered[2]                       gamma_mu_log; // log mean of gamma
  ordered[2]                       gamma_sigma_log; // log SD of gamma
  vector<lower = 0, upper = 1>[2]  maintain; // probability of remaining in current state
}

Multiple Track:

parameters {
  ordered[2]                       beta_mu_logit[M]; // logit mean of beta dist
  vector[2]                        beta_phi_log[M]; // log precision of beta dist
  ordered[2]                       gamma_mu_log[M]; // log mean of gamma
  ordered[2]                       gamma_sigma_log[M]; // log SD of gamma
  vector<lower = 0, upper = 1>[2]  maintain[M]; // probability of remaining in current state
}

transformed parameters block
Single Track:

transformed parameters {
  vector<lower = 0>[2]                      alpha;
  vector<lower = 0>[2]                      beta; 
  vector<lower = 0>[2]                      shape;
  vector<lower = 0>[2]                      rate;
  matrix[2, 2]                              trans_mat;
  simplex[2]                                statdist; //stationary distribution
  
  for(k in 1:2){
    real beta_mean;
    real beta_phi;
    real gamma_mu;
    real gamma_sigma;
    
    beta_mean   = inv_logit(beta_mu_logit[k]);
    beta_phi    = exp(beta_phi_log[k]);
    gamma_mu    = exp(gamma_mu_log[k]);
    gamma_sigma = exp(gamma_sigma_log[k]);   
    
    alpha[k]  = beta_mean * beta_phi;
    beta[k]   = (1 - beta_mean) * beta_phi;
    
    shape[k] = square(gamma_mu)/square(gamma_sigma);
    rate[k]  = gamma_mu/square(gamma_sigma);
  }

  trans_mat[1, 1] = maintain[1];
  trans_mat[1, 2] = 1 - maintain[1];
  trans_mat[2, 1] = 1 - maintain[2];
  trans_mat[2, 2] = maintain[2];
    
  statdist = to_vector(rep_row_vector(1, 2) / 
             (diag_matrix(rep_vector(1, 2)) - 
             trans_mat + rep_matrix(1, 2, 2)));
}
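(The statdist line is the standard linear-system trick for the stationary distribution of a Markov chain: \delta satisfies \delta \Gamma = \delta together with \sum_k \delta_k = 1, and the two conditions combine into the single system

\delta (I - \Gamma + U) = \mathbf{1},

where U is a matrix of ones and \mathbf{1} is a row vector of ones. Stan’s row-vector division operator / solves exactly this system, i.e. it computes \mathbf{1} (I - \Gamma + U)^{-1}.)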

Multiple Track:

transformed parameters {
  vector<lower = 0>[2]                 alpha[M]; 
  vector<lower = 0>[2]                 beta[M]; 
  vector<lower = 0>[2]                 shape[M];
  vector<lower = 0>[2]                 rate[M];
  matrix[2, 2]                         trans_mat[M]; //matrix of row simplex gammas
  simplex[2]                           statdist[M]; //stationary distribution

  for (m in 1:M){
    for(k in 1:2){
      real beta_mean;
      real beta_phi;
      real gamma_mu;
      real gamma_sigma;
      
      beta_mean        = inv_logit(beta_mu_logit[m, k]);
      beta_phi         = exp(beta_phi_log[m, k]);
      alpha[m, k]      = beta_mean * beta_phi;
      beta[m, k]       = (1 - beta_mean) * beta_phi;
      
      gamma_mu         = exp(gamma_mu_log[m, k]);
      gamma_sigma      = exp(gamma_sigma_log[m, k]);
      shape[m, k]      = square(gamma_mu)/square(gamma_sigma);
      rate[m, k]       = gamma_mu/square(gamma_sigma);
    }
  }

  for (m in 1:M){
    trans_mat[m, 1, 1] = maintain[m, 1];
    trans_mat[m, 1, 2] = 1 - maintain[m, 1];
    trans_mat[m, 2, 1] = 1 - maintain[m, 2];
    trans_mat[m, 2, 2] = maintain[m, 2];
  }
  
  for (m in 1:M)  
    statdist[m] = to_vector(rep_row_vector(1, 2) / 
               (diag_matrix(rep_vector(1, 2)) - 
               trans_mat[m] + rep_matrix(1, 2, 2)));

}

model block
Single Track:

model {
  vector[2] logp;
  vector[2] logp_1;
  vector[2] log_trans_mat_tr[2];
  int       obs_index_now;

  //priors (maintain gets its implicit uniform(0, 1) prior from the declared bounds)
  beta_mu_logit[1]   ~ normal(-3, 0.5);
  beta_mu_logit[2]   ~ normal(0, 1);
  beta_phi_log[1]    ~ normal(1, 1);
  beta_phi_log[2]    ~ normal(0, 1);
  gamma_mu_log[1]    ~ normal(3, 1);
  gamma_mu_log[2]    ~ normal(5, 1);
  gamma_sigma_log[1] ~ normal(3, 1);
  gamma_sigma_log[2] ~ normal(5, 1);

  for(i in 1:2)
    for(j in 1:2)
      log_trans_mat_tr[j, i] = log(trans_mat[i, j]);
  
  if (index_obs_stp[1] == 1) {
    for (k in 1:2)
      logp[k] = log(statdist[k]) + gamma_lpdf(obs_steps[1] | shape[k], rate[k]) +
                beta_lpdf(activity[1] | alpha[k], beta[k]);
    
    obs_index_now = 2;
  } else {
    for (k in 1:2)
      logp[k] = log(statdist[k]) + beta_lpdf(activity[1] | alpha[k], beta[k]);
  
    obs_index_now = 1;
  }

  for (t in 2:T){
    // guard so we never index past the end of index_obs_stp once all steps are consumed
    if (obs_index_now <= T_obs_stp && index_obs_stp[obs_index_now] == t){
      for (k in 1:2)
        logp_1[k] = log_sum_exp(log_trans_mat_tr[k] + logp) + 
                    gamma_lpdf(obs_steps[obs_index_now] | shape[k], rate[k])  +
                    beta_lpdf(activity[t] | alpha[k], beta[k]);
      
      obs_index_now += 1;
    } else {
      for (k in 1:2)
        logp_1[k] = log_sum_exp(log_trans_mat_tr[k] + logp) + 
                    beta_lpdf(activity[t] | alpha[k], beta[k]);
    }
    logp = logp_1;
  }
  target += log_sum_exp(logp);
}
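For reference, the model block above is just the forward algorithm in log space. Writing \alpha_t(k) = p(y_{1:t}, S_t = k) for the forward variable, the recursion being implemented is

\alpha_1(k) = \delta_k \, p(y_1 \mid S_1 = k), \qquad \alpha_t(k) = \Big[ \sum_j \alpha_{t-1}(j) \, \Gamma_{jk} \Big] \, p(y_t \mid S_t = k),

where \delta is statdist and \Gamma is trans_mat. The code stores logp[k] = \log \alpha_t(k) and uses log_sum_exp for the inner sum; the Gamma step-length density enters p(y_t \mid S_t = k) only at time points with an observed step. The final target += log_sum_exp(logp) adds \log \sum_k \alpha_T(k) = \log p(y_{1:T}).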

Multiple Track:

model {
  vector[2]    logp;
  vector[2]    logp_temp;
  matrix[2, 2] log_trans_mat_transpose[M];
  int          obs_index_now = 1;

  for (m in 1:M){
    beta_mu_logit[m, 1] ~ normal(-3, 0.5);
    beta_mu_logit[m, 2] ~ normal(0, 1);
    beta_phi_log[m, 1]  ~ normal(1, 1);
    beta_phi_log[m, 2]  ~ normal(0, 1);
    gamma_mu_log[m, 1] ~ normal(3, 1);
    gamma_mu_log[m, 2] ~ normal(5, 1);
    gamma_sigma_log[m, 1]  ~ normal(3, 1);
    gamma_sigma_log[m, 2]  ~ normal(5, 1);
  }


  for (m in 1:M)
    for (i in 1:2)
      for (j in 1:2)
        log_trans_mat_transpose[m, j, i] = log(trans_mat[m, i, j]);
  
  for(t in 1:T){
    //initialise forward variable if first obs per track
    if (t == 1 || ID[t] != ID[t - 1]){
      for (k in 1:2)
        logp[k] = log(statdist[ID[t], k]) + beta_lpdf(activity[t] | alpha[ID[t], k], beta[ID[t], k]);
        
      if (obs_index_now <= T_obs_stp && index_obs_stp[obs_index_now] == t) {
        for (k in 1:2)
          logp[k] += gamma_lpdf(obs_steps[obs_index_now] | shape[ID[t], k], rate[ID[t], k]);
      
        obs_index_now += 1;
      }
    } else {
      for (k in 1:2)
        logp_temp[k] = log_sum_exp(log_trans_mat_transpose[ID[t], k]' + logp) + 
                       beta_lpdf(activity[t] | alpha[ID[t], k], beta[ID[t], k]); 

      if (obs_index_now <= T_obs_stp && index_obs_stp[obs_index_now] == t) {
        for (k in 1:2)
          logp_temp[k] += gamma_lpdf(obs_steps[obs_index_now] | shape[ID[t], k], rate[ID[t], k]);

        obs_index_now += 1;
      }

      logp = logp_temp; 
    }
    
    if (t == T || ID[t + 1] != ID[t])
      target += log_sum_exp(logp);
  }
}

Never mind. Error in the initial conditions that eluded me for too long.


FYI: There’s a ‘mistake’ in Théo’s and my write-up, at the very end in the local state decoding part. The stateProbs won’t be between 0 and 1, because we forgot that Stan drops the normalizing constants. You’ll need to scale the stateProbs matrix so that its rows sum to 1.
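In symbols (my notation here, not the write-up’s): the local state probabilities are

\Pr(S_t = k \mid y_{1:T}) = \frac{\alpha_t(k) \, \beta_t(k)}{\sum_j \alpha_t(j) \, \beta_t(j)},

where \alpha_t and \beta_t are the forward and backward variables, so dividing each row of stateProbs by its row sum recovers proper probabilities.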

@LuisDamiano has some great HMM code as well here: https://github.com/luisdamiano/gsoc17-hhmm
He did the local state decoding correctly!


Ha! I thought that might be the case, as I just discovered that a couple of days ago. I figured it had something to do with the normalization. I’m using the Viterbi algorithm instead. Mostly, I’m using this HMM to extract the state assignments to then pass into a linear SSM to handle the observation-error (GPS) end of things. I would love it if I could handle the state switching and the observation error all at once in a single model, but from what I’ve found in the literature, such switching state-space models don’t have a closed-form likelihood and so can’t be done in Stan. Nimble or JAGS might be able to handle it conceptually, but not computationally.
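For reference, here is a minimal sketch of Viterbi decoding as a generated quantities block for the two-state single-track model above. It follows the standard Stan manual pattern and, for brevity, uses only the activity stream (the step-length terms would be added exactly as in the model block); it is an illustration, not the poster’s actual code:

generated quantities {
  int<lower = 1, upper = 2> state_seq[T]; // most likely (Viterbi) state sequence
  {
    int  back_ptr[T, 2];   // best previous state for each (t, k)
    real best_logp[T, 2];  // log prob of the best path ending in state k at time t

    for (k in 1:2)
      best_logp[1, k] = log(statdist[k]) +
                        beta_lpdf(activity[1] | alpha[k], beta[k]);

    for (t in 2:T) {
      for (k in 1:2) {
        best_logp[t, k] = negative_infinity();
        for (j in 1:2) {
          real lp = best_logp[t - 1, j] + log(trans_mat[j, k]) +
                    beta_lpdf(activity[t] | alpha[k], beta[k]);
          if (lp > best_logp[t, k]) {
            back_ptr[t, k]  = j;
            best_logp[t, k] = lp;
          }
        }
      }
    }

    // backtrack from the more probable final state
    state_seq[T] = best_logp[T, 1] > best_logp[T, 2] ? 1 : 2;
    for (i in 1:(T - 1))
      state_seq[T - i] = back_ptr[T - i + 1, state_seq[T - i + 1]];
  }
}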

Ha! I thought that might be the case, as I just discovered that a couple of days ago. I figured it had something to do with the normalization.

Exactly!

The HMM likelihood is also not available in closed form, but can be programmed into Stan. What’s the error structure like?

One resource I’ve looked at is this paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.639.5702&rep=rep1&type=pdf

But to describe the simplest form of the problem more briefly, consider observations y_t, where:

y_t \mid S_t = s: \quad y_t = \alpha_t + \epsilon_t, \quad \epsilon_t \sim N(0, H_s), where H_s is the observation error variance associated with state s
\alpha_{t+1} \mid S_t = s: \quad \alpha_{t+1} = \alpha_t + \eta_t, \quad \eta_t \sim N(0, Q_s), where Q_s is the process error variance associated with state s
\Pr(S_{t+1} = j \mid S_t = i) = \gamma_{i,j}

If there is only one state s, then y and \alpha form a simple linear state-space model, which can be handled with the Kalman filter; if there is no observation error (the \epsilon's), then \alpha and S form a simple hidden Markov model (because we can just work with the differences \eta).
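
To spell out why the combined model is hard (my notation): the marginal likelihood is

L(\theta) = \sum_{s_{1:T}} \Pr(s_{1:T}) \int p(y_{1:T} \mid \alpha_{1:T}, s_{1:T}) \, p(\alpha_{1:T} \mid s_{1:T}) \, d\alpha_{1:T}.

For any fixed state path s_{1:T} the inner integral is Gaussian and computable with the Kalman filter, but there are 2^T paths, and the HMM forward recursion doesn't collapse the sum, because the filtered distribution of \alpha_t given y_{1:t} is a Gaussian mixture whose number of components grows with the number of state histories. That's why each special case is tractable on its own while the exact switching-SSM likelihood is not.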