I’m coding up some hidden Markov mark-recapture models. To keep my code as clear as possible for didactic purposes, I am not using the forward algorithm or any convenience functions - I’m marginalizing over all possible states of the mark-recapture process. However, I am not recovering the simulated parameter values, and can’t see why.

In the example I am working on, each individual animal can be in two possible states: dead or alive, given by a 2 \times 2 transmission matrix, where the probability of surviving from time points t-1 to t is \phi_t, and the probability of dying is 1- \phi_t. No zombies are allowed, so individuals stay dead if dead. The data are classic binary capture-recapture histories, where individuals are given a 1 if seen at each time point, or given 0 if they are not seen. For those individuals who are not seen, there are two possibilities: they are dead or they are alive, but just not seen. If individuals are alive, they are seen with probability \delta_t and not seen with probability 1 - \delta_t.

For instance, a capture history of `1 0 0`

means that the animal is alive and seen at t = 1, and that the animal could be alive at time t = 2 to T but just wasn’t detected, or that the animal was alive at t=2 and dead at t=3, or that the animal could have been dead from t=2 to T. In Stan code (not the prettiest, purely for didactic purposes), my intuition is to write this as:

```
vector[2] lpt2;
vector[3] lpt3;
// time 1
target += bernoulli_lpmf(1 | phi) + bernoulli_lpmf(1 | delta);
// time 2
// could be alive but undetected
lpt2[1] = bernoulli_lpmf(1 | phi) + bernoulli_lpmf(0 | delta);
// could be dead and unobserved with prob = 1
lpt2[2] = bernoulli_lpmf(1 | 1 - phi) + bernoulli_lpmf(0 | 1);
target += log_sum_exp(lpt2);
// time 3
// could have been dead at t-1 and still dead
lpt3[1] = bernoulli_lpmf(1 | 1) + bernoulli_lpmf(0 | 1);
// could have been alive at t-1 and now dead
lpt3[2] = bernoulli_lpmf(1 | 1 - phi) + bernoulli_lpmf(0 | 1);
// could have been alive at t-1 and is still alive, but undetected
lpt3[3] = bernoulli_lpmf(1 | phi) + bernoulli_lpmf(0 | delta);
target += log_sum_exp(lpt3);
```

First, for this simple example, is this marginalization correct? From what I have been testing, I don’t think it is.

In the attached R script, I simulate capture-recapture histories for 100 individuals from the above generative model. At t=1, I assume individuals were alive at t-1 = 0. I try to marginalize over all possible states similar to the above.

The main loop in the `model`

block is below, where `TM`

and `EM`

are the transmission and emission matrices. Again, this is not the prettiest or most efficient code, but is being used to show each step as clearly as possible.

```
/* marginalize over the different possible paths for the data */
// for every individual
for(m in 1:n_m){
// p = 1 (assumed alive at p-1)
// animal was observed
if(y[m,1] == 1){
// alive -> alive
target += bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[2,2]);
}
// if not observed (must be seen later) -> conditional on first capture
if(y[m,1] == 0){
// alive->alive but not detected
target += bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[1,2]);
}
//for the remaining time periods
for(p in 2:n_p){
// animal observed at p
if(y[m,p] == 1){
// alive->alive
target += bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[2,2]);
}
// animal not observed at p
if(y[m,p] == 0){
// if not observed at p - 1
if(y[m,p-1] == 0){
// if seen later (i.e. p != n_p)
if(sum(y[m,p:n_p]) > 0){ // can't be dead now
// alive->alive
target += bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[1,2]);
}
// if not seen later (i.e. p == n_p)
if(sum(y[m,p:n_p]) == 0){ // might be dead now, or at p - 1
vector[3] lp;
// dead->dead
lp[1] = bernoulli_lpmf(1 | TM[1,1]) + bernoulli_lpmf(1 | EM[1,1]);
// alive->dead
lp[2] = bernoulli_lpmf(1 | TM[2,1]) + bernoulli_lpmf(1 | EM[1,1]);
// alive->alive
lp[3] = bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[1,2]);
target += log_sum_exp(lp);
}
}
// observed at p - 1
if(y[m,p-1] == 1){
// if seen later
if(sum(y[m,p:n_p]) > 0){ // can't be dead now
// alive->alive
target += bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[1,2]);
}
// if not seen later
if(sum(y[m,p:n_p]) == 0){ // might be dead now, but alive at p -1
vector[2] lp;
// alive->dead
lp[1] = bernoulli_lpmf(1 | TM[2,1]) + bernoulli_lpmf(1 | EM[1,1]);
// alive->alive
lp[2] = bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[1,2]);
target += log_sum_exp(lp);
}
}//seen at p-1
}// not seen at p
}//p
}//m
```

Unfortunately, the simulated parameters are not recovered when running this model. Does anyone know where I am going wrong?

Thanks!

mrc-two-state-sim.R (1.6 KB)

mrc-two-state.stan (3.1 KB)