Marginalizing latent state in a mark-recapture model

I’m coding up some hidden Markov mark-recapture models. To keep my code as clear as possible for didactic purposes, I am not using the forward algorithm or any convenience functions - I’m marginalizing over all possible states of the mark-recapture process. However, I am not recovering the simulated parameter values, and can’t see why.

In the example I am working on, each individual animal can be in two possible states: dead or alive, given by a 2 \times 2 transmission matrix, where the probability of surviving from time points t-1 to t is \phi_t, and the probability of dying is 1- \phi_t. No zombies are allowed, so individuals stay dead if dead. The data are classic binary capture-recapture histories, where individuals are given a 1 if seen at each time point, or given 0 if they are not seen. For those individuals who are not seen, there are two possibilities: they are dead or they are alive, but just not seen. If individuals are alive, they are seen with probability \delta_t and not seen with probability 1 - \delta_t.

For instance, a capture history of 1 0 0 means that the animal is alive and seen at t = 1, and that the animal could be alive at time t = 2 to T but just wasn’t detected, or that the animal was alive at t=2 and dead at t=3, or that the animal could have been dead from t=2 to T. In Stan code (not the prettiest, purely for didactic purposes), my intuition is to write this as:

vector[2] lpt2;
vector[3] lpt3;
// time 1
target += bernoulli_lpmf(1 | phi) + bernoulli_lpmf(1 | delta);
// time 2
// could be alive but undetected
lpt2[1] =  bernoulli_lpmf(1 | phi) + bernoulli_lpmf(0 | delta);
// could be dead and unobserved with prob = 1
lpt2[2] = bernoulli_lpmf(1 | 1 - phi) + bernoulli_lpmf(0 | 1);
target += log_sum_exp(lpt2);
// time 3
// could have been dead at t-1 and still dead
lpt3[1] = bernoulli_lpmf(1 | 1) + bernoulli_lpmf(0 | 1);
// could have been alive at t-1 and now dead
lpt3[2] = bernoulli_lpmf(1 | 1 - phi) + bernoulli_lpmf(0 | 1);
// could have been alive at t-1 and is still alive, but undetected
lpt3[3] = bernoulli_lpmf(1 | phi) + bernoulli_lpmf(0 | delta);
target += log_sum_exp(lpt3);

First, for this simple example, is this marginalization correct? From what I have been testing, I don’t think it is.

In the attached R script, I simulate capture-recapture histories for 100 individuals from the above generative model. At t=1, I assume individuals were alive at t-1 = 0. I try to marginalize over all possible states similar to the above.

The main loop in the model block is below, where TM and EM are the transmission and emission matrices. Again, this is not the prettiest or most efficient code, but is being used to show each step as clearly as possible.

/* marginalize over the different possible paths for the data */
  // for every individual
  for(m in 1:n_m){
    // p = 1 (assumed alive at p-1)
    // animal was observed
    if(y[m,1] == 1){
      // alive -> alive
      target += bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[2,2]);
    }
    // if not observed (must be seen later) -> conditional on first capture
    if(y[m,1] == 0){
      // alive->alive but not detected
      target += bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[1,2]);
    }

    //for the remaining time periods
    for(p in 2:n_p){
      // animal observed at p
      if(y[m,p] == 1){
        // alive->alive
        target += bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[2,2]);
      }
      // animal not observed at p
      if(y[m,p] == 0){
        // if not observed at p - 1
        if(y[m,p-1] == 0){
          // if seen later (i.e. p != n_p)
          if(sum(y[m,p:n_p]) > 0){ // can't be dead now
            // alive->alive
            target += bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[1,2]);
          }
          // if not seen later (i.e. p == n_p)
          if(sum(y[m,p:n_p]) == 0){ //  might be dead now, or at p - 1
            vector[3] lp;
            // dead->dead
            lp[1] = bernoulli_lpmf(1 | TM[1,1]) + bernoulli_lpmf(1 | EM[1,1]);
            // alive->dead
            lp[2] = bernoulli_lpmf(1 | TM[2,1]) + bernoulli_lpmf(1 | EM[1,1]);
            // alive->alive
            lp[3] = bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[1,2]);
            target += log_sum_exp(lp);
          }
        }
        // observed at p - 1
        if(y[m,p-1] == 1){
          // if seen later
          if(sum(y[m,p:n_p]) > 0){ // can't be dead now
            // alive->alive
            target += bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[1,2]);
          }
          // if not seen later
          if(sum(y[m,p:n_p]) == 0){ //  might be dead now, but alive at p -1
            vector[2] lp;
            // alive->dead
            lp[1] = bernoulli_lpmf(1 | TM[2,1]) + bernoulli_lpmf(1 | EM[1,1]);
            // alive->alive
            lp[2] = bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[1,2]);
            target += log_sum_exp(lp);
          }
        }//seen at p-1
      }// not seen at p
    }//p
  }//m

Unfortunately, the simulated parameters are not recovered when running this model. Does anyone know where I am going wrong?

Thanks!

mrc-two-state-sim.R (1.6 KB)
mrc-two-state.stan (3.1 KB)

2 Likes

Ah, I think I see the problem. I am adding in too many state probabilities (if the animal is dead at t=1, then there is no need to add in the probability of it being dead again at t=2).

EDIT: In fact, this still doesn’t fix the problem. Anyone have any ideas?

1 Like

Just a quick glance + I am not an expert, but if I get it correctly, the unobserved variables you need to margnialize over are the states after the last “capture” time point (let’s call this time l). If I understand your code properly, you are either: 1) independently marginalizing at each step l +1, ..., p over the whole history up to this time (the simple example) or 2) taking each time step l +1, ..., p and marginalizing over the two possibilities for that state independently. (the second example)

But IMHO you should be marginalizing over the p - l + 1 possible times the animal could have died (just after the last capture to after all captures) and for each case add together the log-likelihood for all observations and then “average” via log_sum_exp.

Finally, and possibly most importantly, when you marginalize discrete parameters explicitly, you need parameters in the model that correspond to the probabilities of the individual values unobserved discrete variables. So in the simple example, you would have something like:

parameters {
  simplex[3] time_of_death;
}

model {
 vector[3] lp_individual;
 target += //the log probability of being observed in the first step
 lp[1] = log(time_of_death[1]) + ... //+ log prob of being dead before second observation
 lp[2] = log(time_of_death[2]) + ... // +log prob of being alive for first observation but dead before third observation
 lp[3] = log(time_of_death[3]) + ... // + log prob of being alive for both observations
 target += log_sum_exp(lp);
}

Now implementing this generally would be tricky as for each of the individuals you need a simplex parameter of different size, so you would need to 1) perform the constraining transform to simplex yourself and 2) use transformed data to compute the number of unconstrained parameters you need, declare a single vector of the correct size and then unpack the parameters for each individual from this vector. Certainly not impossible, but unlikely to be clear, didactic code.

I hope I am making sense - please double check my reasoning, I am not an expert on this.

Stan User Guide has a chapter on mark-recapture models. That would be a good place to start.

In principle, marginalizing out discrete parameters isn’t any more complicated than sampling continuous parameters.

When sampling continuous parameters: the sampler assigns proposed values to the parameters and your task in the model block is to compute the joint probability density of that proposal and data.

When marginalizing out a discrete parameter, it’s the same thing: you get a proposed value and compute its probability. The only difference is that

  1. the proposals don’t come from the sampler; you simply loop over every possible value, and
  2. instead of adding the log prob directly to target you store it in an array

The full array is then log_sum_exp()ed and the result is added to target.

Independent parameters can have separate loops. Mutually dependent parameters need nested loops. That’s a bit more involved so an example may clarify.

Let’s say you have discrete parameters x ~ X(a) and y ~ Y(x) (wheremodel1.stan (1.1 KB) a is some hyperparameter) and data d1 ~ D(x) and d2 ~ D(y). Assuming x and y are both in the range 1..S we can marginalize over their joint distribution like so

matrix[S,S] lp;
for (sx in 1:S) {
  for (sy in 1:S) {
    lp[sx,sy] = 
        X_lpmf(sx| a) // prior for x
      + D_lpdf(d1| sx) // data for x
      + Y_lpmf(sy| sx) // prior for y (given x)
      + D_lpdf(d2| sy); // data for y
  }
}
target += log_sum_exp(lp);

There’s some redundancy here. Grouping the terms logically can ensure each is computed only once.

vector[S] lp_x;
vector[S] lp_y;
for (sx in 1:S) {
  lp_x[sx] =
      X_lpmf(sx| a) // prior for x
    + D_lpdf(d1| sx);// data for x
}
for (sy in 1:S) {
  vector[S] lp_yx;
  for (sx in 1:S) {
    lp_yx[sx] =
        lp_x[sx] // posterior probability of x
      + Y_lpmf(sy| sx); // prior for y (given x)
  }
  lp_y[sy] =
      log_sum_exp(lp_yx) // prior for y (with x marginalized out)
    + D_lpdf(d2| sy); // data for y
  }
}
target += log_sum_exp(lp_y);

This is the forward algorithm. If there’s a third variable z that depends on y one can compute its probabilities lp_z using lp_y in exactly the same way one computes above lp_y using lp_x. Each of these lp vectors accumulates all the information in the previous ones; only the last one is added to target.

Now, back to mark-recapture models. It’s possible to represent it as a Markov process and use the forward algorithm but (as Martin mentioned above) a simpler parameterization is a single time-of-death number for each animal.

I wrote a simple model based on this idea (attached model1.stan). However, that model doesn’t account for the censoring that happens when an animal is not seen at all. I also made a model (model2.stan) that should be equivalent to the “individual CJS model” in the User Guide.

model1.stan (1.1 KB) model2.stan (1.3 KB)

2 Likes

@cgoold Great to see some CMR examples on the forum! Hopefully the following is helpful.

Minor thing - here and elsewhere where bernoulli_lpmf(0|1) is written, do you need bernoulli_lpmf(0 | 0) which evaluates to 0 (on a log scale/a probability of 1), rather than bernoulli_lpmf(0 | 1) which evaluates to -Inf (on a log scale/a probability of 0)?

More generally, it looks like the Stan code sketch use for the capture history of 1 0 0 doesn’t quite match my understanding of CMR likelihoods. As you mention, for the capture history 1 0 0 there are three possibilities:

a. The animal is alive the whole time (at t=1, 2, 3),
b. The animal dies at the third timestep (alive/alive/dead), or
c. The animal dies at the second timestep (alive/dead/dead).

Labeling these three possibilities, the likelihood for the capture history 1 0 0 can be written as:

\text{Pr}(1, 0, 0 \mid \phi, \delta) =

\underbrace{\phi \delta}_{\text{Survived and seen t=1}} \overbrace{\times}^{\text{and}} \Big( \underbrace{\phi (1-\delta) \phi (1-\delta)}_{\text{a: survived t=2,3}} \overbrace{+}^{\text{or}} \underbrace{\phi (1-\delta) (1 - \phi)}_{\text{b: survived t=2, died t=3}} \overbrace{+}^{\text{or}} \underbrace{1-\phi}_{\text{c: died t=2}} \Big).

But, in the Stan/pseudocode block at the top of the post, it looks like the target is being incremented by something different. Summing up the increments of the target, and leaving out any terms that evaluate to a probability of one/log probability of zero, I think you’ll get:

target += bernoulli_lpmf(1 | phi) + bernoulli_lpmf(1 | delta) +
  log_sum_exp(
    (
      bernoulli_lpmf(1 | phi) + bernoulli_lpmf(0 | delta), 
      bernoulli_lpmf(1 | 1 - phi)
    ) + 
  log_sum_exp(
    bernoulli_lpmf(1 | 1 - phi),
    bernoulli_lpmf(1 | phi) + bernoulli_lpmf(0 | delta)
  )
)

which would evaluate to:

\log(\phi) + \log(\delta) + \log \big( \phi (1 - \delta) + 1 - \phi \big) + \log \big( 1 - \phi + \phi (1 - \delta) \big).

Exponentiating to get back to probability space gives:

\phi \delta \times \big( \phi (1 - \delta) + 1 - \phi) \big) \times \big( 1 - \phi + \phi(1 - \delta) \big).

Working out the logic of this expression:

\underbrace{\phi \delta}_{\text{Survived and seen t=1}} \overbrace{\times}^{\text{and}} \big( \underbrace{\phi (1 - \delta)}_{\text{Survived t=2}} \overbrace{+}^{\text{or}} \underbrace{1 - \phi}_{\text{Died t=2}}) \big) \overbrace{\times}^{\text{and}} \big( \underbrace{1 - \phi}_{\text{Died t=3}} \overbrace{+}^{\text{or}} \underbrace{\phi(1 - \delta)}_{\text{Survived t=3}} \big),

In other words, this expression seems to imply that one could have died twice (in t=2 and t=3), or have died (in t=2) then survived (in t=3).

If you did want to represent the log probability of the capture history 1 0 0 manually in Stan I think it would be something more like:

// time 1: survived and observed
target += bernoulli_lpmf(1 | phi) + bernoulli_lpmf(1 | delta);

// times 2 and 3
vector[3] lp;
// there are three possibilities:
// 1. alive at t=2, 3 but unobserved
lp[1] = bernoulli_lpmf(1 | phi) + bernoulli_lpmf(0 | delta) +bernoulli_lpmf(1 | phi) + bernoulli_lpmf(0 | delta);
// 2. alive at t=2 and unobserved, but dead at t=3
lp[2] = bernoulli_lpmf(1 | phi) + bernoulli_lpmf(0 | delta) + bernoulli_lpmf(0 | phi);
// 3. dead at t=2 (and still dead at t=3)
lp[3] = bernoulli_lpmf(0 | phi);

target += log_sum_exp(lp);

Note that in the expressions for lp[2] and lp[3] I could have included a term related to the probability of observing a zero, but this probability evaluates to one (or zero on a log scale) so I left it out.

As a practical matter, I’d (like others) consider using the forward algorithm instead, since it is easier to implement and doesn’t require explicit manual consideration of every possible state sequence that could have generated a sequence of observations. Finally, there’s some great Stan CMR examples here too: https://github.com/stan-dev/example-models/tree/master/BPA

3 Likes

@martinmodrak @nhuurre @mbjoseph
Thank you for your replies - they were all really helpful! Hopefully, this thread can be a useful discussion for people approaching marginalizing discrete variables from their models.

I should have said earlier that I have explored the manual section of mark-recapture models, as well as the Github pages, but I was still running into problems. Hence my first post.

@nhuurre
Thanks a lot for spending the time writing out your code example - it’s hugely appreciated. Running your code on my simulated data (that I’ve spent time checking to ensure it is the data generating model I think it is), I am finding that the \phi and \delta parameters are not quite right. Even with > 2000 capture-recapture histories, the estimates are systematically over-estimated (although not by much in the case of \phi). For instance, a run with 2300 individuals, with data generating parameters \phi = 0.6 and \delta = 0.6 results in:

I am only generating 3 data points per individual, however, so it could be that I need more data points.

@nhuurre Sorry, the above was actually wrong. You model2 does recover the true values.

@mbjoseph
Thanks for the super-clear description! It was very helpful, and I was obviously wasn’t marginalizing with respect to the whole observation histories.

I also forgot that I was subsetting the simulated data for cases with at least one observation, and doing so was over-estimating \phi. Changing the code to reflect this constraint means that it works and recovers the parameter values.

Below is the code I used to estimate \phi and \delta from a set of capture-recapture histories which include all zero values, which is useful when implementing the data augmentation approaches used in mark recapture modelling.

for(m in 1:n_m){
    // define last capture; last() gets the last capture point
    int lc = last(y[m,]);

    // if not observed
    if(lc == 0){
      //P(0,0,0) = phi^3(1-delta)^3 + phi^2(1-delta)^2(1-phi) + phi(1-delta)(1-phi) + (1-phi)
      vector[4] lp0;
      lp0[1] = log(phi) + log(1-delta) + log(phi) + log(1-delta) + log(phi) + log(1-delta);
      lp0[2] = log(phi) + log(1-delta) + log(phi) + log(1-delta) + log(1-phi);
      lp0[3] = log(phi) + log(1-delta) + log(1-phi);
      lp0[4] = log(1-phi);
      target += log_sum_exp(lp0);
    }
    else{ 
      // until last capture
      for(i in 1:lc){
        target += log(phi) + bernoulli_lpmf(y[m,i] | delta);
      }
      // after last capture
      if(lc == 1){ // last capture is at time point 1 (1,0,0)
        vector[3] lp1; // three possibilities
        // survived the whole time but went undetected
        lp1[1] = log(phi) + log(1 - delta) + // t2
                 log(phi) + log(1 - delta);  // t3
        // survived to time point 2, dead at t = 3
        lp1[2] = log(phi) + log(1 - delta) + // t2
                  log(1 - phi);              // t3
        // dead at t=2
        lp1[3] = log(1- phi);
        target += log_sum_exp(lp1);
       }
       if(lc == 2){ //last capture at time point 2 (1, 1, 0)
         vector[2] lp2; // two possibilities
         // alive at t=3
         lp2[1] = log(phi) + log(1 - delta);
         // dead at t=3
         lp2[2] = log(1 - phi);
         target += log_sum_exp(lp2);
       }
     }
   }
3 Likes