Multi-scale/Staggered Entry Occupancy model as HMM

Hello,

I’m interested in fitting some amalgamation of a multi-scale/staggered entry/dynamic occupancy models as a hidden Markov Model (following a previous suggestion in the thread here). For the time being at least, the general structure I’m pursuing has a complete data likelihood that looks a little something like this, where i indexes a site, t indexes a temporal occasions, and c indexes a replicate detectors operating during i and t:

z_i~Bernoulli(\psi_i)
logit(\psi_i) = s(space) +\bf\beta X_i
a_{it}~Bernoulli(\theta_{it})
logit(\theta_{it}) = \bf\delta s(time) X_i+\bf\lambda X_{it}+\gamma a_{it-1}
y_{itc}~Bernoulli(z_ia_{it}p_c)
logit(p_c) = \bf\alpha X_c

So, the general idea is that y can = 1 if z_i and a_it =1, and that a_it is a first-order (hidden) Markov model. The smoothers are implemented in the name of efficiency/dimension reduction (although tbh, the program as I’ve written is slow almost certainly because I’ve just hacked away at getting it running). To the extent that I’ve been able to interrogate the problem, treating availability for detection as a first order markov process seems to be more computationally efficient (perhaps because it constrains the state transitions) than a model in which transitions in z_it are of interest.

At any rate, the marginal approach to HMM and the forward algorithm is an exciting but very new way for me to think about these problems. While there are some excellent resources on HMM and dynamic in the manual, in the BPA translations on github, and vignettes focusing on HMM for animal movement models, I’m a little curious about a few issues/questions I haven’t seen fully addressed elsewhere. My interpretation of the transition and emission sort of matrices looks a little like so:

for (c in 1:ndetectors){
           //state, detector # , observation
        po[1, c, 1]=1;     
        po[1, c, 2]=0;
        po[2, c, 1]=1;
        po[2, c, 2]=0;
        po[3, c, 1]=1-p[c];
        po[3, c, 2]=p[c];
      }
      
       for (i in 1:ncells){
         
// ps1[i, 1]=1-psi[i];    
          //ps1[i, 1]=psi[i]*(1-inv_logit(logittheta[i, 1]));
          //ps1[3, i]=psi[i]*inv_logit(logittheta[i, 1]);
            for (t in 1:364){
              ps[1, i, t, 1]=1; //unoccupied remains unoccupied
              ps[1, i, t, 2]=0;
              ps[1, i, t, 3]=0;
              ps[2, i, t, 1]=0;
              ps[2, i, t, 2]=1-inv_logit(logittheta[i, t+1]); //occupied
              ps[2, i, t, 3]=inv_logit(logittheta[i, t+1]);
              ps[3, i, t, 1]=0;
              ps[3, i, t, 2]=1-inv_logit(logittheta[i, t+1]+a_AL);
              ps[3, i, t, 3]=inv_logit(logittheta[i, t+1]+a_AL);
            }
          }
} 

instead of

model{
vector[N] mu = alpha+beta*x;
y~normal(mu,sigma);
}


1 Like

Sorry, the tab/space-bar combo here just really wants to send things through. I’m a creature of google-groups.

The initial state distribution should look like what was commented out below…

for (i in 1:ncells){
         
          ps1[i, 1]=1-psi[i];    
          ps1[i, 1]=psi[i]*(1-theta[i, 1]);
          ps1[3, i]=psi[i]*theta[i, 1];

Anyway, the issue I’m specifically curious about relates to ragged sampling effort, such that in many cases (i by t combinations), there may be no active detectors, or a variable (for simplicity, 1 or 2) number of active detectors. The way I’ve approached this is as so…:


 //likelihood...
      for (i in 1:ncells) {
    // initial state
      if (activedetectors[i, 1]==0){
       //if no observations, just the marginal for the states...
        gam[1, 1] = 1-psi[i];
        gam[1, 2] = psi[i]*(1-theta[i, 1]);
        gam[1, 3] = psi[i]*theta[i, 1];
      }     

      else{
      \\\at the beginning, only 1 detector ever active
      gam[1, 1] =1-psi[i]* po[1, T_idx[i, 1, 1], y[i, 1, 1]];
      gam[1, 2] =psi[i]*(1-theta[i, 1])* po[2, T_idx[i, 1, 1], y[i, 1, 1]];
      gam[1, 3] =psi[i]*theta[i, 1]* po[3, T_idx[i, 1, 1], y[i, 1, 1]];
      }

    for (t in 2:T) {//T= #occasions 
      for (k in 1:3) { //nstates
        for (j in 1:3){ //also nstates
          if (activedetectors[i, t]==0){
                  //if no observed data, just the marginal transition probs
                 acc[j] = gam[t - 1, j] * ps[j, i, t - 1, k];
            }
          else if (activedetectors[i, t]==1){
                 //if observed at one replicate, transition*emission
                acc[j] = gam[t - 1, j] *ps[j, i, t - 1, k]*po[k, T_idx[i, t, 1], y[i, t, 1]];
          }
          else {
            //if observed  at >1 replicate, transition*emission_replicate1*emission_replicate2 ...
            //T_idx denotes which of the c detectors are active at i during t
            acc[j] = gam[t - 1, j] *ps[j, i, t - 1, k]*po[k, T_idx[i, t, 1], y[i, t, 1]]*po[k, T_idx[i, t, 2], y[i, 
                                   t, 2]];
          }
          gam[t, k] = sum(acc);
        }
      }
    }
    target += log(sum(gam[T]));

This is a pretty embarrassingly simple question, but given my very poor understanding of the marginalization here: a) if there are no sampling replicates that could produce observations, \Gamma is just the marginal probability of states i=1…nstates? b) When there are say one or two or more replicate detectors or observers, the forward algorithm proceeds as transition times emission_rep1 times emission_rep2 times…?

Sorry if this is quite confusing! Happy to elaborate on anything here.

Best,

John

1 Like

Eh, I think I’ve solved this, but probably easier to describe and leave as an example using a simpler model structure, like here (https://github.com/stan-dev/example-models/blob/master/BPA/Ch.13/Dynocc2.stan). That’s a two state model (‘occupied’, ‘not occupied’), in which the ‘not occupied’ state is perfectly observed (i.e., a deterministic absence), and in which the occupied state is observed correctly with probability p and incorrectly with probability 1-p. In the code, ps describes the state transitions (the ‘transition matrix’), and po describes the observation probabilities given the state (the emission matrix, as sometimes called).

I’ll just focus on the likelihood, which is so…:

for (i in 1:nsite) {
    real acc[2];
    vector[2] gam[nyear];

    // First year, initial state distribution (occupied =1, unoccupied =2)
    gam[1, 1] = psi1 * po[1, 1, y[i, 1] + 1];
    gam[1, 2] = (1 - psi1) * po[2, 1, y[i, 1] + 1];

    for (t in 2:nyear) {
      for (k in 1:2) {
        for (j in 1:2)
          acc[j] = gam[t - 1, j] * ps[j, t - 1, k] * po[k, t, y[i, t] + 1];
        gam[t, k] = sum(acc);
      }
    }
    target += log(sum(gam[nyear]));
  }
}

Really, I think the large blob of text in my previous posts can be reduced to two questions, that I think I’ve answered with a little reading.

First, the situation in which there is not an observation for a specific time. My understanding is that this
the pr(obs|state), or “po[k, t, y[i, t] + 1]” in this snippet…

or (t in 2:nyear) {
      for (k in 1:2) {
        for (j in 1:2)
          acc[j] = gam[t - 1, j] * ps[j, t - 1, k] * po[k, t, y[i, t] + 1];
        gam[t, k] = sum(acc);
      }
    }

simply needs to be removed–i.e., acc[j] = gam[t - 1, j] * ps[j, t - 1, k]–following eq. 9 in MacKenzie’s 2003 paper.

My second question related to the situation in which there might be two independent observations of the the state, i.e., say, y[i, t, 1], and y[i, t, 2] (this is an essential part of the multi-scale model described earlier). That question is silly in hindsight. In the example code linked to and described here, y[i, t] is a transformation of a binomial count, and so:

  acc[j] = gam[t - 1, j] * ps[j, t - 1, k] * po[k, t, y[i, t] + 1];

could just as well have been written

  acc[j] = gam[t - 1, j] * ps[j, t - 1, k] * po[k, t, y[i, t, 1]]*po[k, t, y[i, t, 2]]*po[k, t, y[i, t, 3]]; //etc.

Where y[i, t, x] is just a single observation. WRT my original question, because the emission or observation probs within a given i and t are completely distinct, this statement here

//if observed  at >1 replicate, transition*emission_replicate1*emission_replicate2 ...
            //T_idx denotes which of the c detectors are active at i during t
            acc[j] = gam[t - 1, j] *ps[j, i, t - 1, k]*po[k, T_idx[i, t, 1], y[i, t, 1]]*po[k, T_idx[i, t, 2], y[i, 
                                   t, 2]];

seems to work fine.

FWIW here’s what I was thinking about a HMM formulation for this multi-scale occupancy model - consider three states (not indexing by site for clarity here, but let t represent sequential sampling occasions):

  • not occupied (s_{t} = 1)
  • occupied not available (s_{t} = 2)
  • occupied available (s_{t} = 3)

The state transition probability matrix ps is 3 \times 3, where the entries contain the probability of transitioning from row i to column j, \text{Pr}(s_{t+1} = j \mid s_t = i):

\Gamma = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1- \alpha & \alpha \\ 0 & \beta & 1 - \beta \end{bmatrix},

e.g., \alpha is the probability of transitioning from not available to available, and \beta is the probability of transitioning from available to not available. It’s not possible to transition from occupied to not or vice versa.

The initial state probability vector would be:

(1 - \psi \quad \psi (1 - \theta) \quad \psi \theta ),

where \psi is the occupancy probability and \theta is the probability of availability in the first sampling ocassion.

Then, the potential observations are:

  • not detected (x_t=1)
  • detected (x_t=2)

Then ps the emission matrix \Omega contains \text{Pr}(x_t = j \mid s_t = i) in row i, column j:

\Omega = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 - p & p \end{bmatrix},

where p is the probability of detection, given availability.

In Stan this would be:


data {
  int<lower=0> nsite;
  int<lower=0> ntime;
  int<lower=1,upper=2> y[nsite, ntime];
}

parameters {
  real<lower = 0, upper = 1> psi;     // occupancy prob
  real<lower = 0, upper = 1> p;       // detection prob
  real<lower = 0, upper = 1> theta;   // initial availability prob | occupancy
  real<lower = 0, upper = 1> alpha;   // prob transition to available
  real<lower = 0, upper = 1> beta;    // prob transition to not available
}

transformed parameters {
  simplex[3] ps[3];
  simplex[2] po[3];

  // States (s):
  // 1 not occupied
  // 2 occupied not available
  // 3 occupied available
  // ps is the transition probability matrix
  // ps[i, j] is Pr(s_{t+1}=j | s_t = i)
  ps[1, 1] = 1;
  ps[1, 2] = 0;
  ps[1, 3] = 0;
  ps[2, 1] = 0;
  ps[2, 2] = 1 - alpha;
  ps[2, 3] = alpha;
  ps[3, 1] = 0;
  ps[3, 2] = beta;
  ps[3, 3] = 1 - beta;

  // Observations (o):
  // 1 not detected
  // 2 detected
  // po is the emission matrix
  // po[i, j] is Pr(o_t = j | s_t = i)
  po[1, 1] = 1;
  po[1, 2] = 0;
  po[2, 1] = 1;
  po[2, 2] = 0;
  po[3, 1] = 1 - p;
  po[3, 2] = p;
}

model {
  real acc[3];
  vector[3] gamma[ntime];

  for (i in 1:nsite) {
    // first timestep probabilities
    gamma[1, 1] = (1 - psi) * po[1, y[i, 1]];
    gamma[1, 2] = psi * (1 - theta) * po[2, y[i, 1]];
    gamma[1, 3] = psi * theta * po[3, y[i, 1]];
    
    for (t in 2:ntime) {
      for (k in 1:3) {
        for (j in 1:3) {
          acc[j] = gamma[t - 1, j] * ps[j, k] * po[k, y[i, t]];
        }
        gamma[t, k] = sum(acc);
      }
    }
    target += log(sum(gamma[ntime]));
  }
}

And here’s some R code to simulate data, fit the model, and look at the output:

library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

nsite <- 100
ntime <- 20

psi <- .8
theta <- .5
alpha <- .5
beta <- .2
p <- .3


# Simulate states ---------------------------------------------------------

z <- rbinom(nsite, 1, psi)
availability <- matrix(nrow = nsite, ncol = ntime)
availability[, 1] <- rbinom(nsite, 1, z * theta)
for (t in 2:ntime) {
  pr_avail <- z * (availability[, t-1] * (1 - beta) + 
                    (1 - availability[, t-1]) * alpha)
  availability[, t] <- rbinom(nsite, 1, pr_avail)
}
availability


# Simulate observations ---------------------------------------------------

y <- matrix(rbinom(nsite * ntime, 1, availability * p), 
            nrow = nsite)
y


# Fit model ---------------------------------------------------------------

m_init <- stan_model("multilevel-occ-hmm.stan")
m_fit <- sampling(m_init, 
                  data = list(nsite = nsite, 
                              ntime = ntime, 
                              y = y + 1))
pars <- c("psi", "theta", "p", "alpha", "beta")
print(m_fit, pars = pars)
pairs(m_fit, pars = pars)
traceplot(m_fit, pars = pars)

The posterior geometry seems…interesting:

The biggest difference that stands out to me between this HMM formulation and the Stan code you posted originally is that ps (the state transition probability matrix) contains the probabilities of transitioning among states, which is not necessarily the same as \theta (the probability of availability conditional on occupancy) that you outlined above.

As far as dealing with different numbers of active detectors (simultaneous/replicate observations in a period where the state is not changing), what you’ve done seems right.

3 Likes

Good deal, thanks! Pry masked by the messiness of the posting, but I think the models are about the same. Here:

              ps[1, i, t, 1]=1; //unoccupied remains unoccupied
              ps[1, i, t, 2]=0;
              ps[1, i, t, 3]=0;
              ps[2, i, t, 1]=0;
              ps[2, i, t, 2]=1-inv_logit(logittheta[i, t+1]); //not available to not available
              ps[2, i, t, 3]=inv_logit(logittheta[i, t+1]); //not available to available
              ps[3, i, t, 1]=0;
              ps[3, i, t, 2]=1-inv_logit(logittheta[i, t+1]+a_AL); //available to not available
              ps[3, i, t, 3]=inv_logit(logittheta[i, t+1]+a_AL);//available to available

‘a_AL’ is meant to account for the autologistic effect \gamma a_{it-1} referred to in my first post. What I’ve coded as ‘inv_logit(logittheta)’ is equivalent to what you’ve called \alpha (or what I termed \delta s(time)X_i+\lambda X_{it}: the probability of becoming available if a_{it-1}= 0. and ‘1-inv_logit(logittheta+a_AL)’ is equivalent to \beta (or what I originally callled (or 1 - what I termed \delta s(time)X_i+\lambda X_{it} + \gamma a_{it-1}): the probability of becoming unavailable given a_{it-1}= 1. \alpha and \beta is far more elegant :).

edit–I’m treating the initial state as related to alpha and psi, but I think it’s reasonable that the state of every location initiates as unavailable [given immediately previously unavailable] for sampling purposes–this is a hibernating species on the first day of the year in an area with severe winters.

Regarding the geometry…yikes…I guess the replicate observations within i,t is necessary for identifiability here. I’m also treating availability as space and time varying, while p varies across space but not time…which…egh…maybe makes the two processes more separable.

2 Likes

Ah thanks for clarifying what a_AL is - I see now what you did there. I agree the models are the same except for the parameterization for the state transition probabilities.

Much better than enumerating over all possible availability sequences! Good luck with building out your full model.

1 Like