I’m coding up some hidden Markov mark-recapture models. To keep my code as clear as possible for didactic purposes, I am not using the forward algorithm or any convenience functions - I’m marginalizing over all possible states of the mark-recapture process. However, I am not recovering the simulated parameter values, and can’t see why.
In the example I am working on, each individual animal can be in two possible states: dead or alive, given by a 2 \times 2 transmission matrix, where the probability of surviving from time points t-1 to t is \phi_t, and the probability of dying is 1- \phi_t. No zombies are allowed, so individuals stay dead if dead. The data are classic binary capture-recapture histories, where individuals are given a 1 if seen at each time point, or given 0 if they are not seen. For those individuals who are not seen, there are two possibilities: they are dead or they are alive, but just not seen. If individuals are alive, they are seen with probability \delta_t and not seen with probability 1 - \delta_t.
For instance, a capture history of 1 0 0
means that the animal is alive and seen at t = 1, and that the animal could be alive at time t = 2 to T but just wasn’t detected, or that the animal was alive at t=2 and dead at t=3, or that the animal could have been dead from t=2 to T. In Stan code (not the prettiest, purely for didactic purposes), my intuition is to write this as:
vector[2] lpt2;
vector[3] lpt3;
// time 1
target += bernoulli_lpmf(1 | phi) + bernoulli_lpmf(1 | delta);
// time 2
// could be alive but undetected
lpt2[1] = bernoulli_lpmf(1 | phi) + bernoulli_lpmf(0 | delta);
// could be dead and unobserved with prob = 1
lpt2[2] = bernoulli_lpmf(1 | 1 - phi) + bernoulli_lpmf(0 | 1);
target += log_sum_exp(lpt2);
// time 3
// could have been dead at t-1 and still dead
lpt3[1] = bernoulli_lpmf(1 | 1) + bernoulli_lpmf(0 | 1);
// could have been alive at t-1 and now dead
lpt3[2] = bernoulli_lpmf(1 | 1 - phi) + bernoulli_lpmf(0 | 1);
// could have been alive at t-1 and is still alive, but undetected
lpt3[3] = bernoulli_lpmf(1 | phi) + bernoulli_lpmf(0 | delta);
target += log_sum_exp(lpt3);
First, for this simple example, is this marginalization correct? From what I have been testing, I don’t think it is.
In the attached R script, I simulate capture-recapture histories for 100 individuals from the above generative model. At t=1, I assume individuals were alive at t-1 = 0. I try to marginalize over all possible states similar to the above.
The main loop in the model
block is below, where TM
and EM
are the transmission and emission matrices. Again, this is not the prettiest or most efficient code, but is being used to show each step as clearly as possible.
/* marginalize over the different possible paths for the data */
// for every individual
for(m in 1:n_m){
// p = 1 (assumed alive at p-1)
// animal was observed
if(y[m,1] == 1){
// alive -> alive
target += bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[2,2]);
}
// if not observed (must be seen later) -> conditional on first capture
if(y[m,1] == 0){
// alive->alive but not detected
target += bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[1,2]);
}
//for the remaining time periods
for(p in 2:n_p){
// animal observed at p
if(y[m,p] == 1){
// alive->alive
target += bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[2,2]);
}
// animal not observed at p
if(y[m,p] == 0){
// if not observed at p - 1
if(y[m,p-1] == 0){
// if seen later (i.e. p != n_p)
if(sum(y[m,p:n_p]) > 0){ // can't be dead now
// alive->alive
target += bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[1,2]);
}
// if not seen later (i.e. p == n_p)
if(sum(y[m,p:n_p]) == 0){ // might be dead now, or at p - 1
vector[3] lp;
// dead->dead
lp[1] = bernoulli_lpmf(1 | TM[1,1]) + bernoulli_lpmf(1 | EM[1,1]);
// alive->dead
lp[2] = bernoulli_lpmf(1 | TM[2,1]) + bernoulli_lpmf(1 | EM[1,1]);
// alive->alive
lp[3] = bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[1,2]);
target += log_sum_exp(lp);
}
}
// observed at p - 1
if(y[m,p-1] == 1){
// if seen later
if(sum(y[m,p:n_p]) > 0){ // can't be dead now
// alive->alive
target += bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[1,2]);
}
// if not seen later
if(sum(y[m,p:n_p]) == 0){ // might be dead now, but alive at p -1
vector[2] lp;
// alive->dead
lp[1] = bernoulli_lpmf(1 | TM[2,1]) + bernoulli_lpmf(1 | EM[1,1]);
// alive->alive
lp[2] = bernoulli_lpmf(1 | TM[2,2]) + bernoulli_lpmf(1 | EM[1,2]);
target += log_sum_exp(lp);
}
}//seen at p-1
}// not seen at p
}//p
}//m
Unfortunately, the simulated parameters are not recovered when running this model. Does anyone know where I am going wrong?
Thanks!
mrc-two-state-sim.R (1.6 KB)
mrc-two-state.stan (3.1 KB)