# Modeling covariates in mark-recapture HMM

Hi,

I am attempting to add continuous covariates to the mark-recapture multi-state model like the one in Chapter 9 of Marc Kery’s BPA. The code that I have copied below can be found here [HMM code].

// -------------------------------------------------
// States (S):
// 1 alive at A
// 2 alive at B
// 3 alive at C
// Observations (O):
// 1 seen at A
// 2 seen at B
// 3 seen at C
// 4 not seen
// -------------------------------------------------

functions {
/**
* Return an integer value denoting occasion of first capture.
* This function is derived from Stan Modeling Language
* User's Guide and Reference Manual.
*
* @param y         Observed values
* @return Occasion of first capture
*/
int first_capture(array[] int y_i) {
for (k in 1 : size(y_i)) {
if (y_i[k] != 4) {
return k;
}
}
return 0;
}

/**
* Return a simplex such as follows (thanks to Bob Carpenter):
* p[1] <- exp(lp[1]) / (1.0 + exp(lp[1]) + exp(lp[2]));
* p[2] <- exp(lp[2]) / (1.0 + exp(lp[1]) + exp(lp[2]));
* p[3] <- 1.0 - p[1] - p[2];
*
* @param lp   N-dimension vector
* @return (N+1)-simplex of given vector and 0
*/
vector softmax_0(vector lp) {
vector[num_elements(lp) + 1] lp_temp;

lp_temp[1 : num_elements(lp)] = lp;
lp_temp[num_elements(lp) + 1] = 0;
return softmax(lp_temp);
}
}
data {
int<lower=0> nind;
int<lower=0> n_occasions;
array[nind, n_occasions] int<lower=1, upper=4> y;
}
transformed data {
int n_occ_minus_1 = n_occasions - 1;
array[nind] int<lower=0, upper=n_occasions> first;

for (i in 1 : nind) {
first[i] = first_capture(y[i]);
}
}
parameters {
real<lower=0, upper=1> phiA; // Survival probability at site A
real<lower=0, upper=1> phiB; // Survival probability at site B
real<lower=0, upper=1> phiC; // Survival probability at site C
real<lower=0, upper=1> pA; // Recapture probability at site A
real<lower=0, upper=1> pB; // Recapture probability at site B
real<lower=0, upper=1> pC; // Recapture probability at site C
vector[2] lpsiA; // Logit of movement probability from site A
vector[2] lpsiB; // Logit of movement probability from site B
vector[2] lpsiC; // Logit of movement probability from site C
}
transformed parameters {
simplex[3] psiA; // Movement probability from site A
simplex[3] psiB; // Movement probability from site B
simplex[3] psiC; // Movement probability from site C
array[4, nind, n_occ_minus_1] simplex[4] ps;
array[4, nind, n_occ_minus_1] simplex[4] po;

// Constrain the transitions such that their sum is < 1
psiA = softmax_0(lpsiA);
psiB = softmax_0(lpsiB);
psiC = softmax_0(lpsiC);

// Define state-transition and observation matrices
for (i in 1 : nind) {
// Define probabilities of state S(t+1) given S(t)
for (t in 1 : n_occ_minus_1) {
ps[1, i, t, 1] = phiA * psiA[1];
ps[1, i, t, 2] = phiA * psiA[2];
ps[1, i, t, 3] = phiA * psiA[3];
ps[1, i, t, 4] = 1.0 - phiA;
ps[2, i, t, 1] = phiB * psiB[1];
ps[2, i, t, 2] = phiB * psiB[2];
ps[2, i, t, 3] = phiB * psiB[3];
ps[2, i, t, 4] = 1.0 - phiB;
ps[3, i, t, 1] = phiC * psiC[1];
ps[3, i, t, 2] = phiC * psiC[2];
ps[3, i, t, 3] = phiC * psiC[3];
ps[3, i, t, 4] = 1.0 - phiC;
ps[4, i, t, 1] = 0.0;
ps[4, i, t, 2] = 0.0;
ps[4, i, t, 3] = 0.0;
ps[4, i, t, 4] = 1.0;

// Define probabilities of O(t) given S(t)
po[1, i, t, 1] = pA;
po[1, i, t, 2] = 0.0;
po[1, i, t, 3] = 0.0;
po[1, i, t, 4] = 1.0 - pA;
po[2, i, t, 1] = 0.0;
po[2, i, t, 2] = pB;
po[2, i, t, 3] = 0.0;
po[2, i, t, 4] = 1.0 - pB;
po[3, i, t, 1] = 0.0;
po[3, i, t, 2] = 0.0;
po[3, i, t, 3] = pC;
po[3, i, t, 4] = 1.0 - pC;
po[4, i, t, 1] = 0.0;
po[4, i, t, 2] = 0.0;
po[4, i, t, 3] = 0.0;
po[4, i, t, 4] = 1.0;
}
}
}
model {
array[4] real acc;
array[n_occasions] vector[4] gamma;

// Priors
// Survival and recapture: uniform
// Uniform priors are implicitly defined.
//  phiA ~ uniform(0, 1);
//  phiB ~ uniform(0, 1);
//  phiC ~ uniform(0, 1);
//  pA ~ uniform(0, 1);
//  pB ~ uniform(0, 1);
//  pC ~ uniform(0, 1);

// Normal priors on logit of all but one transition probs
lpsiA ~ normal(0, sqrt(1000));
lpsiB ~ normal(0, sqrt(1000));
lpsiC ~ normal(0, sqrt(1000));

// Likelihood
// Forward algorithm derived from Stan Modeling Language
// User's Guide and Reference Manual
for (i in 1 : nind) {
if (first[i] > 0) {
for (k in 1 : 4) {
gamma[first[i], k] = y[i, first[i]] == k;
}

for (t in (first[i] + 1) : n_occasions) {
for (k in 1 : 4) {
for (j in 1 : 4) {
acc[j] = gamma[t - 1, j] * ps[j, i, t - 1, k]
* po[k, i, t - 1, y[i, t]];
}
gamma[t, k] = sum(acc);
}
}
target += log(sum(gamma[n_occasions]));
}
}
}


I am new to STAN and Bayesian modeling in general so I am not quite sure how to proceed. I want to start by modeling movement with a covariate (e.g., length), however, I have struggled to implement this. I have tried something like this:

 for (i in 1:nind) {
// Update psi vectors based on length  for individual i
psiA = softmax_0(lpsiA + b1 * lengths[i] );
psiB = softmax_0(lpsiB + b1 * lengths[i] );
psiC = softmax_0(lpsiC + b1 * lengths[i] );
}


The initial value was rejected in this case. Any guidance on where to start would be massively appreciated.

Thanks,

Alex

It’s a bit trickier to relate the movement probabilities to covariates than it is to relate the survival and detection parameters to covariates because survival and detection are simple probabilities, so it’s easy to simply use a logit link. E.g.

phi[i] = inv_logit(b0 + b1 * x[i]);


But with movement probabilities you’ve got a simplex, so you’re looking at multi-logit approaches. Looking at your code though, it looks like you’ve got that covered with the softmax_0 function.

I don’t see priors or parameter declarations for b1 in what you posted, but just looking at your code my suspicion is that this is what’s causing you trouble:

  // Normal priors on logit of all but one transition probs
lpsiA ~ normal(0, sqrt(1000));
lpsiB ~ normal(0, sqrt(1000));
lpsiC ~ normal(0, sqrt(1000));


Those are exceptionally diffuse priors on the multi-logit scale and I suspect they’re not playing nicely with your softmax_0 function. That could be why you’re having trouble with initial values. Try this in R see what kind of values come up:

  x = c(0, rnorm(n, mean = 0, sd = 100))
exp(x -  max(x) + log(sum(exp(x - max(x)))))


Those priors are almost always going to result in one psi term being 1 and the others being 0.

3 Likes

@Dalton Thanks for the response! The code I posted is what I found from here:

Sorry I thought I hyperlinked it. My question was not clear enough, but generally I am wondering where I would specify the covariates? I know I’ll need to add parameters and priors for the coefficients but I am wondering if the code snippet I proposed for including a covariate within softmax is appropriate. I was kind of guessing with that specification.

My question was not clear enough, but generally I am wondering where I would specify the covariates?

Forgive me if I’m being too literal. You’ll need to add to the data, parameters, and transformed parameters block. So following the logic of you examples the data block would get something like: vector[nind] lengths. But how you handle the parameters and transformed parameters depends on your goals and hypotheses. The way you sketched it out you basically have two “intercepts” (lpsi) and a single slope term that is shared between them. I think you’d need to clean up your array declarations and indexing (the psi simplexes look like they should be length nind arrays of simplexes). After that, this should work, but it might not be what you intend.

With that specification you’re saying that the additive log-ratio increases or decreases at the same rate with increasing lengths. That may or may not be realistic, especially since the reference level is constant. You can think of your lpsi parameters as extensions of log-odds where instead of \mathrm{log}(\frac{p}{1-p}). With your softmax_0 function always appending a zero to the end you’ve got them set up as effectively: \{\mathrm{log}(\frac{\psi_{X,A}}{\psi_{X,C}}),\mathrm{log}(\frac{\psi_{X,B}}{\psi_{X,C}}) \} where \psi_{X,C} is the probability of moving from the current state C. So you might see where that could cause issues, if for example, staying in the current state is more likely than moving at all for all three states.

A different way to consider specifying covariates for the transition matrices is to use a series of conditional logits. This can work well and also be bit more intuitive. For example with 3 states {A,B,C} you might setup your parameters and transformed parameters with something like:

parameters{
...
array[3] real b_0_stay; // or simply real b_0_stay for same probability of staying in each state
array[3] real b_1_stay; // or simply real b_1_stay
array[3] real b_0_move; // or simply real b_0_stay for same probability of staying in each state
array[3] real b_1_move; // or simply real b_1_stay
...}

transformed parameters{
...
array[nind, 3] real<lower =0, upper =1> p_stay;
array[nind, 3] real<lower =0, upper =1> p_move;
// for me it's more straightforward to omit the psi simplexes in this setup
array[4, nind, n_occ_minus_1] simplex[4] ps;
...

for (i in 1 : nind) {
for (k in 1:3){
p_stay[i, k] = inv_logit(b_0_stay[k] + b_1_stay[k] * x[i]);
p_move[i, k] = inv_logit(b_0_move[k] + b_1_move[k] * x[i]);

}

// Define probabilities of state S(t+1) given S(t)
for (t in 1 : n_occ_minus_1) {
ps[1, i, t, 1] = phiA * p_stay[1];
ps[1, i, t, 2] = phiA * (1 - p_stay[1]) * p_move[1]; //probability of moving one state
ps[1, i, t, 3] = phiA * (1 - p_stay[1]) * (1 - p_move[1]); // probability of moving two states
ps[1, i, t, 4] = 1.0 - phiA;
ps[2, i, t, 1] = phiB * (1 - p_stay[2]) * p_move[2];
ps[2, i, t, 2] = phiB * p_stay[2];
ps[2, i, t, 3] = phiB * (1 - p_stay[2]) * (1 - p_move[2]);
ps[2, i, t, 4] = 1.0 - phiB;
ps[3, i, t, 1] = phiC * (1 - p_stay[3]) * (1 - p_move[3]);
ps[3, i, t, 2] = phiC *  (1 - p_stay[3]) * p_move[3];
ps[3, i, t, 3] = phiC * p_stay[3];
ps[3, i, t, 4] = 1.0 - phiC;
ps[4, i, t, 1] = 0.0;
ps[4, i, t, 2] = 0.0;
ps[4, i, t, 3] = 0.0;
ps[4, i, t, 4] = 1.0;
...
}



But like I said, it depends on your goals and hypotheses. Consider the number of states and what they represent. How should length increase or decrease movement, etc.

1 Like

Oh, okay. This is helpful for me to think about. I should mention that the model converges with my data without covariates. I am interested in a few different covariates and I think they should determine whether these fish move from one state to the next (the states represent different rivers) or stay in the current state. I will try this implementation and post the code here.

Just going to leave this here for anyone that needs a starting point in the future. Thanks to @Dalton I was able to modify my code and confront the model with some data. This model converges without errors as far as I can tell.
// -------------------------------------------------
// States (S):
// 1 alive at A
// 2 alive at B
// 3 alive at C
// Observations (O):
// 1 seen at A
// 2 seen at B
// 3 seen at C
// 4 not seen
// -------------------------------------------------

functions {
/**

• Return an integer value denoting occasion of first capture.
• This function is derived from Stan Modeling Language
• User’s Guide and Reference Manual.
• @param y Observed values
• @return Occasion of first capture
*/
int first_capture(array int y_i) {
for (k in 1 : size(y_i)) {
if (y_i[k] != 4) {
return k;
}
}
return 0;
}

/**

• Return a simplex such as follows (thanks to Bob Carpenter):
• p[1] ← exp(lp[1]) / (1.0 + exp(lp[1]) + exp(lp[2]));
• p[2] ← exp(lp[2]) / (1.0 + exp(lp[1]) + exp(lp[2]));
• p[3] ← 1.0 - p[1] - p[2];
• @param lp N-dimension vector
• @return (N+1)-simplex of given vector and 0
*/
vector softmax_0(vector lp) {
vector[num_elements(lp) + 1] lp_temp;
lp_temp[1 : num_elements(lp)] = lp;
lp_temp[num_elements(lp) + 1] = 0;
return softmax(lp_temp);


}
}
data {
int<lower=0> nind;
int<lower=0> n_occasions;
array[nind, n_occasions] int<lower=1, upper=4> y;

// Additional covariate inputs


array[n_occasions] real Y;
array[nind] real X;

}
transformed data {
int n_occ_minus_1 = n_occasions - 1;
array[nind] int<lower=0, upper=n_occasions> first;

for (i in 1 : nind) {
first[i] = first_capture(y[i]);
}
}
parameters {
real<lower=0, upper=1> phiA; // Survival probability at site A
real<lower=0, upper=1> phiB; // Survival probability at site B
real<lower=0, upper=1> phiC; // Survival probability at site C
real<lower=0, upper=1> pA; // Recapture probability at site A
real<lower=0, upper=1> pB; // Recapture probability at site B
real<lower=0, upper=1> pC; // Recapture probability at site C
vector[2] lpsiA; // Logit of movement probability from site A
vector[2] lpsiB; // Logit of movement probability from site B
vector[2] lpsiC; // Logit of movement probability from site C

// Parameters for covariates


array[3] real<lower=0, upper=1> beta0; // intercept for each state
array[3] real<lower=0, upper=1> beta1; // coefficient for length for each state
array[3] real<lower=0, upper=1> beta2; // coefficient for temperature for each state
}
transformed parameters {
real psiA_stay; // Probability of staying at state A
real psiA_B; // Probability of transitioning from A to B
real psiB_A; // Probability of transitioning from B to A
real psiB_stay; // Probability of staying at state B
real psiC_A; // Probability of transitioning from C to A
real psiC_stay; // Probability of staying at state C to B
real psiAA;
real psiAB;
real psiAC;
real psiBB;
real psiBA;
real psiBC;
real psiCC;
real psiCA;
real psiCB;
simplex[3] psiA; // Movement probability from site A
simplex[3] psiB; // Movement probability from site B
simplex[3] psiC; // Movement probability from site C
array[4, nind, n_occ_minus_1] simplex[4] ps;
array[4, nind, n_occ_minus_1] simplex[4] po;

// Constrain the transitions such that their sum is < 1
psiA = softmax_0(lpsiA);
psiB = softmax_0(lpsiB);
psiC = softmax_0(lpsiC);

// Define state-transition and observation matrices
for (i in 1 : nind) {
// Define probabilities of state S(t+1) given S(t)
for (t in 1 : n_occ_minus_1) {

// Calculate movement probabilities with covariates
psiA_stay = inv_logit(beta0[1] + beta1[1] * X[i] + beta2[1] * Y[t]);
psiA_B = inv_logit(beta0[1] + beta1[1] * X[i] + beta2[1] * Y[t]);
psiB_A = inv_logit(beta0[2] + beta1[2] * X[i] + beta2[2] * Y[t]);
psiB_stay = inv_logit(beta0[2] + beta1[2] * X[i] + beta2[2] * Y[t]);
psiC_A = inv_logit(beta0[3] + beta1[3] * X[i] + beta2[3] * Y[t]);
psiC_stay = inv_logit(beta0[3] + beta1[3] * X[i] + beta2[3] * Y[t]);

psiAA = psiA_stay;
psiAB = (1 - psiA_stay) * psiA_B;
psiAC = (1 - psiA_stay) * (1 – psiA_B);
psiBB = psiB_stay;
psiBA = (1 - psiA_stay) * psiB_A;
psiBC = (1 - psiB_stay) * (1 – psiB_A);
psiCC = psiC_stay;
psiCA = (1 - psiC_stay) * psiC_A;
psiCB = (1 - psiC_stay) * (1 – psiC_A);

  ps[1, i, t, 1] = phiA * psiA[1];
ps[1, i, t, 2] = phiA * psiA[2];
ps[1, i, t, 3] = phiA * psiA[3];
ps[1, i, t, 4] = 1.0 - phiA;
ps[2, i, t, 1] = phiB * psiB[1];
ps[2, i, t, 2] = phiB * psiB[2];
ps[2, i, t, 3] = phiB * psiB[3];
ps[2, i, t, 4] = 1.0 - phiB;
ps[3, i, t, 1] = phiC * psiC[1];
ps[3, i, t, 2] = phiC * psiC[2];
ps[3, i, t, 3] = phiC * psiC[3];
ps[3, i, t, 4] = 1.0 - phiC;
ps[4, i, t, 1] = 0.0;
ps[4, i, t, 2] = 0.0;
ps[4, i, t, 3] = 0.0;
ps[4, i, t, 4] = 1.0;

// Define probabilities of O(t) given S(t)
po[1, i, t, 1] = pA;
po[1, i, t, 2] = 0.0;
po[1, i, t, 3] = 0.0;
po[1, i, t, 4] = 1.0 - pA;
po[2, i, t, 1] = 0.0;
po[2, i, t, 2] = pB;
po[2, i, t, 3] = 0.0;
po[2, i, t, 4] = 1.0 - pB;
po[3, i, t, 1] = 0.0;
po[3, i, t, 2] = 0.0;
po[3, i, t, 3] = pC;
po[3, i, t, 4] = 1.0 - pC;
po[4, i, t, 1] = 0.0;
po[4, i, t, 2] = 0.0;
po[4, i, t, 3] = 0.0;
po[4, i, t, 4] = 1.0;
}


}
}
model {

array[4] real acc;
array[n_occasions] vector[4] gamma;

// Priors

// Survival and recapture: uniform
// Uniform priors are implicitly defined.
// phiA ~ uniform(0, 1);
// phiB ~ uniform(0, 1);
// phiC ~ uniform(0, 1);
// pA ~ uniform(0, 1);
// pB ~ uniform(0, 1);
// pC ~ uniform(0, 1);
// psiA_stay ~ uniform(0, 1);
// psiA_B ~ uniform(0, 1);
// psiB_A ~ uniform(0, 1);
// psiB_stay ~ uniform(0, 1);
// psiC_A ~ uniform(0, 1);
// psiC_stay ~ uniform(0, 1);
// psiAA ~ uniform(0, 1);
// psiAB ~ uniform(0, 1);
// psiAC ~ uniform(0, 1);
// psiBB ~ uniform(0, 1);
// psiBA ~ uniform(0, 1);
// psiBC ~ uniform(0, 1);
// psiCC ~ uniform(0, 1);
// psiCA ~ uniform(0, 1);
// psiCB ~ uniform(0, 1);
// beta0 ~ uniform(0, 1);
// beta1 ~ uniform(0, 1);
// beta2 ~ uniform(0, 1);

// Normal priors on logit of all but one transition probs
lpsiA ~ normal(0, sqrt(1000));
lpsiB ~ normal(0, sqrt(1000));
lpsiC ~ normal(0, sqrt(1000));

// Likelihood
// Forward algorithm derived from Stan Modeling Language
// User’s Guide and Reference Manual
for (i in 1 : nind) {
if (first[i] > 0) {
for (k in 1 : 4) {
gamma[first[i], k] = y[i, first[i]] == k;
}

  for (t in (first[i] + 1) : n_occasions) {
for (k in 1 : 4) {
for (j in 1 : 4) {
acc[j] = gamma[t - 1, j] * ps[j, i, t - 1, k]
* po[k, i, t - 1, y[i, t]];
}
gamma[t, k] = sum(acc);
}
}
target += log(sum(gamma[n_occasions]));
}


}
}