Help structuring custom, discrete-time hazard model

Hi folks,

I could use some help with a complex modeling framework. I have some experience with Stan, but I’m relatively new to survival models—especially in discrete time. I think it’s an interesting problem and would really appreciate input on structuring the model appropriately and making sure it is correctly implemented in Stan.

Overview

I’m trying to model a discrete-time survival process to estimate relative discard mortality in a recreational fishery using tag-recapture data. Over 7,000 fish were tagged and released, and observations occur over 13 discrete time intervals (quarters) and 17 discrete spatial locations. We want to estimate the effect of fish condition at release (fair or poor vs. good [baseline]) on recapture probability (proxy for survival), while accounting for:

  • Spatiotemporal variation in loss rates,
  • Effort-dependent re-encounter probability

We assume condition only affects immediate post-release survival, not long-term vulnerability (i.e., recovered fish are equally catchable, regardless of post-release condition). Based on the species’ biology and recapture data, we assume movement among locations is negligible.

Recaptures are only observable if fish are caught by a specific monitored fleet. If fish are caught by unmonitored fleets (e.g., private anglers), die (natural or post-release mortality), or shed their tags, they are permanently lost from the system. Importantly, we know effort in each location and time step for the monitored fleet.

Although we have limited data on unmonitored fishing activity (a source of loss), for now, we assume each location-time combination has its own latent loss rate that accounts for all unobserved removal processes. We are not necessarily interested in the mechanisms for loss over time, we only want to account for them. We have tens to hundreds of fish at risk in each location-time interval, which should facilitate estimation of the discrete loss rates across time and space.

Tagged fish enter the risk set at different times (i.e., staggered entry). I’ve converted the data to person-period format, where each fish has one row per time period they are at risk. The response variable R = 0 while the fish is still at risk; at the terminal time step, R = 1 if recaptured or R = 0 if censored.

Model structure

I currently express the model as:

Pr(R_{i} \mid \text{not yet recaptured}) = \underbrace{\psi_i}_{\text{survived post-release}} \cdot \underbrace{\prod^{t-1}_{s=t_{0,i}} \lambda_{i[l],s}}_{\text{still at risk at time }t} \cdot \underbrace{(1- \exp(-q \cdot E_{i[l,t]}))}_{\text{re-encountered at time } t}

Survival post-release for fish i depends on condition (dummy syntax) X_i with associated coefficients \beta modeled using a standard logit link.

\psi_i= \frac{\exp(X_i^\top \beta)}{1+\exp(X_i^\top \beta)}

We do not know if a fish died or lost it’s tag after release unless it was recaptured. Therefore, remaining at risk until time t is modeled as the product of retention probabilities up to t

\prod^{t-1}_{s=t_{0,i}} \lambda_{i[l],s} = \exp\bigg(\sum^{t-1}_{s=t_{0,i}} \ln (\lambda_{i[l,s]})\bigg)

Each \lambda_{l,t} is a location-time-specific probability of remaining at risk (i.e., not being lost due to unobserved capture, tag shedding, or natural mortality), on the logit scale:

\text{logit}(\lambda_{l,t}) \sim N(\mu, \sigma_\mu)

This is essentially modeled as a random effect varying across time and space. It is intentionally left simple for now. I realize this can be improved, and I am exploring more reasonable spatiotemporal random effects formulations such as an ST-CAR framework if I can get the sampler to run efficiently.

Finally, re-encounter probability at t is a function of local effort E_{l,t}:

1- \exp(-q \cdot E_{i[l],t}).

where q is a global catchability parameter and constrained to be positive so that the function stays bounded between 0 and 1 and is 0 when effort at a given location and time interval is 0.

Therefore, my first attempt at a model was this:

\begin{align} R_i \mid \text{not yet recaptured} &\sim Bernoulli(\pi_i) \\ &\pi_i = \psi_i \cdot \exp\bigg(\sum^{t-1}_{s=t_{0,i}} \ln (\lambda_{i[l],s})\bigg) \cdot (1- \exp(-q \cdot E_{i[l,t]}))\\ &\qquad \psi_i= \frac{\exp(X_i^\top \beta)}{1+\exp(X_i^\top \beta)}\\ &\qquad \ln(q) \sim N(-1,1)\\ &\qquad \beta \sim N(0,1)\\ &\qquad \text{logit} (\lambda_{l,t}) \sim N(\mu, \sigma_{\mu})\\ &\qquad \qquad\mu \sim N(-3, 1)\\ &\qquad \qquad \ln (\sigma_\mu) \sim N(0,1) \end{align}

This model compiles and runs in Stan without issue, and diagnostics/fit look ok. However, I have never constructed a framework like this before, and wonder if it could be simplified or otherwise improved. Here are some questions I have, in order of importance:

  1. Does this model structure appropriately reflect a discrete-time survival framework with staggered entry and covariate-dependent post-release survival?
  2. Is my approach to modeling cumulative loss risk via spatiotemporal \lambda_{l,t} reasonable, or are there more efficient (or identifiable) approaches to reduce parameter complexity?
  3. Would a more standard (and simple) framework incorporating post-release survival, cumulative loss, and re-encounter effects into a single linear predictor and related to the response through a complementary log-log link function be a better approach?

I’d really appreciate feedback on the conceptual structure, implementation strategy, or alternatives I should consider.

Thanks in advance. Here is the corresponding stan code for the model:

data {
  int<lower=1> N;                   // number of data rows
  int<lower=1> K;                   // number of post-release predictors (condition)
  int<lower=1> L;                   // number of locations
  int<lower=1> T;                   // number of time steps
  int<lower=1> Time[N];             // time interval (e.g., 1, 2, 3,...)
  int<lower=1> Start[N];            // time of tagging
  vector[N] Effort;                 // effort for individual i in location l at time t
  matrix[N,K] X;                    // condition variables (dummy syntax)
  int<lower=0> R[N];                // Recapture (0 or 1)
  int<lower=1,upper=L> loc[N];      // location ID
}

parameters {
  vector[K] beta;                   // logit post-release survival per condition
  real mu;                          // log baseline loss rate
  real log_q;                       // log global catchability
  matrix[L,T] z_mu;                 // random effect (non-centering)
  real<lower = 0> sigma_mu;         // sd of random effects
}

transformed parameters {
  real q = exp(log_q);              // global catchability
  matrix[L,T] u = sigma_mu * z_mu;  // random effects
  vector[N] psi = inv_logit(X*beta);// predictor for post-release survival
  vector[N] cum_log_loss;           // cumulative natural loss for each observation
  for (i in 1:N) {
    real sum_log_loss = 0;
    int l = loc[i];
    for (s in Start[i]:Time[i]) {
      sum_log_loss += log_inv_logit(mu + u[l, s]); // note that lambda = inv_logit(mu + u[l, s])
    }
    cum_log_loss[i] = sum_log_loss;
  }
}

model {
  beta ~ normal(0,1);
  mu ~ normal(-3,1);
  log_q ~ normal(-1,1);
  to_vector(z_mu) ~ normal(0, 1);
  sigma_mu ~ exponential(10);

  for (i in 1:N) {
    R[i] ~ bernoulli(psi[i]*exp(cum_log_loss[i])*(1-exp(-q*Effort[i])));
  }
}

generated quantities{
  vector[N] R_sim;
  for (i in 1:N) {
    R_sim[i] = bernoulli_rng(psi[i]*exp(cum_log_loss[i])*(1-exp(-q*Effort[i])));
  } 
}
3 Likes

Hi @Challen_Hyman,

First off, I’d just like to complement the quality of your question. It’s a lot of effort to describe a problem like this in a way that others can understand (and become interested!).

The problem you’re describing is similar in many ways to the kind of work my colleagues and I have been doing with release-recapture data from migratory fish. So I’m approaching your question from that point-of-view. I’m aware that this means I might have my own set of biases I’m bringing to this question which admittedly differs in important ways from traditional mark-recapture.

I think that the answer to this question is probably no because it seems to me like the thing you’re missing is a term for the probability of not being recaptured during a time-period given that an individual is still at risk. If I understand your problem description, it sounds like for each individual you have information on the time-period of first entry (t_{0,i}), whether or not the individual was recaptured, R_i, and given that an individual was recaptured the time period of recapture (t). It seems like you’re trying to wrap up the probability of not being recapture into the \lambda terms, but I don’t think that will work, particularly because it seems like an individual who not recaptured after release will have a term for the catchability parameter of the last capture event in their likelihood.

I think you’re going to have to go with a CJS-type likelihood where the catchability enters into the likelihood for each time-step. Some questions for that: Is this a terminal fishery or is there catch and release? Are individuals recaptured at most once? If so, it’s possible you’re going to have trouble getting reasonably precises estimates without either informative priors on catchability or auxiliary data (probably not in the budget to acoustic tag some individuals so you could get an independent estimate of the number of fish at-risk of recapture that were not, huh?). That said, if the fish are relatively well-distributed among the staggered entries and if the data set isn’t dominated by fish released and never seen again, I think also possible that you’d could probably get resolution on the relevant parameters even in the circumstance where recaptures are fatal.

I think it’s reasonable particularly in a CJS framework where the logit link is quite common. That said there’s I think there’s a strong argument to be made for using a clog-log and it’s increasingly becoming my preferred approach.

The more standard framework would be a CJS and for that you’d maintain the separate parameters for survival and re-encounter probabilities. Depending on the sparsity of recaptures and relative balance among entry times/locations the relevant parameters may or may not be decently estimable. For this problem, I do like the link function for the catchability parameter given that it is zero when effort is zero, so you might as well keep that. A logit-link for post-release survival also seems reasonable. The cumulative loss terms might be better off a log-log link, but logit isn’t a deal-breaker.

Thanks again @Dalton—I really appreciate this thoughtful reply, and apologies for the delayed response. I hope what follows clarifies my underlying logic.

If I understand your problem description, it sounds like for each individual you have information on the time-period of first entry (t_{0,i}), whether or not the individual was recaptured, R_i, and given that an individual was recaptured the time period of recapture (t).

Yes, that’s consistent with my case, with the added detail that I also include environmental and fishing conditions surrounding the fish at the time of initial capture and release to estimate \psi_i.


Catchability and loss modeling

It seems like you’re trying to wrap up the probability of not being recapture into the λ terms, but I don’t think that will work, particularly because it seems like an individual who not recaptured after release will have a term for the catchability parameter of the last capture event in their likelihood.

Could you elaborate on this? If you mean that different fish in the same location and time interval should have different catchabilities (e.g., due to size), then I completely agree. To address this, I’ve re-parameterized the model so that the probability of loss is modeled directly from three sources:

  1. A time-invariant natural mortality/tag shedding rate \gamma_0,
  2. (Unmonitored) private recreational fishing effort at time t, denoted \text{Rec}_t, and
  3. Size-selectivity s(L_i), based on the distribution of hook sizes used and previous gear selectivity studies.

The probability of loss (i.e., not surviving) for fish i at time t, given survival to t-1, is now:

\lambda_{it} = 1 - \exp\!\left( -\exp\!\big(\gamma_0 + \gamma_1 \,\text{Rec}_{l[i]t} + \log s(L_i)\big) \right).

This uses the cloglog link you recommended (thank you!).


Cumulative retention

Cumulative retention from release to time t is:

\rho(l[i]t) = \prod_{s=t_0}^{t} \big(1 - \lambda_{l[i]s}\big) = \exp\!\left( \sum_{s=t_0}^{t} -\exp\!\big(\gamma_0 + \gamma_1 \,\text{Rec}_{l[i]s} + \log s(L_i)\big) \right)

Recapture probability

Conditioned on survival up to t, the probability of re-encounter at time (t) is modeled as a function of monitored fishing effort (\text{Mon}) with the same selectivity term:

\eta_{it} = 1 - \exp\!\left( -\exp\!\big(\alpha_0 + \alpha_1 \,\text{Mon}_{l[i]t} + \log s(L_i)\big) \right)

Here, I assume the same selectivity curve applies to both monitored and unmonitored fleets, which seems reasonable given what we know about them.


Final model

The final likelihood for recapture at time t, conditional not being recaptured before t, is:

\Pr(R_{it} \mid \text{not yet recaptured}) \;=\; \underbrace{\psi_i}_{\text{post-release survival}}\\ \times \underbrace{\exp\!\left( \sum_{s=t_{0i}}^{t} -\exp\!\big(\gamma_0 + \gamma_1 \,\text{Rec}_{l[i]s} + \log s(L_i)\big) \right)}_{\text{Cumulative retention}} \\ \times \underbrace{\Bigg( 1 - \exp\Big(-\exp(\alpha_0 + \alpha_1 \,\text{Mon}_{l[i]t} + \log s(L_i))\Big) \Bigg)}_{\text{Re-encounter}}

where \psi_i is modeled as a fucntion of covariates at release X_i and associated coefficients \beta.


Outstanding questions

  1. Do these modifications—incorporating catchability/size-selectivity at each time step—address the catchability concerns you raised?
  2. Can a CJS framework still be applied here? I know CJS assumes no tag loss and instantaneous sampling, but since I’m using discrete time, are these assumptions relaxed enough?
  3. With very low recapture rates (probability of recapture by the monitored fleet is small), is survival still estimable under CJS? Or would informative priors/auxiliary data be essential?

Additional clarifications

Some questions for that: Is this a terminal fishery or is there catch and release?

Yes, this is a catch-and-release fishery, but for estimation purposes I treat individuals captured by the unmonitored fleet as permanently lost. Happy to elaborate here.

Are individuals recaptured at most once?

Individuals are typically recaptured at most once (though a handful are recaptured up to three times).

If so, it’s possible you’re going to have trouble getting reasonably precises estimates without either informative priors on catchability or auxiliary data (probably not in the budget to acoustic tag some individuals so you could get an independent estimate of the number of fish at-risk of recapture that were not, huh?).

Unformtunately, auxiliary data (e.g., acoustic telemetry) isn’t currently in the budget, though we do plan on using some high-reward tagging with reference individuals in the future, depending on funding.

Really appreciate your help on this. Let me know if you need additional clarifications!

Hey Challen,

My response was kind of rushed, but I’m glad that is was at least somewhat helpful to you. If you’d like to maybe try to jump on a call and discuss, I could find some time, but I’m also happy to continue exchanging on the forum especially since the dialogue may be helpful for others down the line.

What I meant by this (based on my interpretation of your original model) is that for individuals with R_i = 0, you would have
Pr(R_i = 0) = 1 - (\pi_i * \mathrm{exp}(\sum_{s = t_{0,i}}^{T-1}\mathrm{ln}(\lambda_{i[l],s}) * (1 - \mathrm{exp}(-q * E_{i[l,t]}))

which is basically saying the probability of not recapturing a fish is 1 minus the probability it survived to and was captured at the final occasion. So the probability of an individual remaining alive, but uncaptured is in your model, but it only applies to the final capture occasion. And I think that there-in lies a potential pitfall. Ignore \lambda for the moment (or imagine it is one), then in your likelihood the term for the probability of recapture at time t, p_{i,t} = (1 - \mathrm{exp}(-q * E_{i[l,t]})), would only have successes recorded for occasions t < T but a lot of failures for t = T. Maybe I read your model wrong, but my gut feeling was that the math laid out here would not capture the data generating process you’re envisioning.

Based on your further responses, particularly the fact that some individuals may be capture more than once, even if it is very few (which you would expect if capture probability is quite low), I really do think that a CJS-type likelihood is appropriate here. CJS is already a discrete time model (except in migratory species like the ones I work with where we take advantage of the unidirectional nature of migration and use discrete sampling in space). Tag loss can be accommodated with appropriate terms (especially if you have auxiliary data like a subset of double-tagged individuals). The instantaneous sampling assumption is almost always fudged in practice, so I wouldn’t worry about it too much, at least not as you’re developing your model. As long as individuals aren’t recaptured more than once during a sampling occasion and you’re okay with assuming that individuals have the same probability of survival (conditional on covariates) whether they were captured at the beginning or end of the sampling occasion (or not captured), then you’re fine.

You might have trouble with getting reliable estimates of survival when recapture rates are very low, but that’s nature of fish. You wind up with a lot of individuals who are never seen again after release and those terms (which are the probability of either dying or surviving and remaining uncaptured) tend to dominate the likelihood. Informative priors seem like a reasonable thing to use here to help mitigate that - though you might want to do some prior predictive checks to help you tune them given that you’ve got some transformations.

A CJS-type model isn’t very far off from what you have. It’d be something like:

Pr(R_{i,s,t} = 1) = \psi_i \times \phi_{i,s} \times \prod_{v=s + 1}^{t-1} \left(\phi_{i,v} (1 - p_{i, v}) \right) \times p_{i,t} where s is the occasion of an individuals capture (or for the few individuals recaptured more than once, there last recapture) and t is the occasion of an individuals next recapture, \phi is equivalent to your 1 - \lambda, and p is your \eta.

Then for individuals not recaptured after capture at occasion s you would just have Pr(R_{i,s} = 0) = 1 - \psi_i \times \sum_{u = s}^T \left( \prod_{v=s + 1}^{u-1} \left(\phi_{i,v} (1 - p_{i, v}) \right) \times p_{i,u} \right). My indexing might be off a bit with the \phi terms, but basically it’s one minus the probability you capture the individual at any of the subsequent capture occasions.

P.S. Might want to add the ecology tag to this thread.

Thanks, @Dalton — I really appreciate that clarification. I’m following you now.

I also realized there was an error in my original post: I forgot to include a subscript for t. The data are in a person (well, fish)–period format, where I have an R_{it} for each fish. If a fish is not captured at any timestep, the model evaluates R_{it} = 0 for all time steps from t_{0,i} to T. This (hopefully) makes sure the model accounts for recaptures across all occasions when each fish is still at risk.

For example, going back to your case: if a fish is released at t = 1 and recaptured at T = 2, the first entry for that fish would be R_{i,1} = 0 and the second would be R_{i,2} = 1.

I agree, the CJS likelihood you wrote down looks a lot like my current likelihood. I’m working with hazards rather than discrete-time probabilities directly, but the overall structure feels very similar. I think the main difference is that in your formulation, the loss term is applied in the current time step, which makes more sense biologically than what I was doing.

Thanks again for taking the time to go through this in detail. I’ll also see if I can figure out how to edit my post to add an ecology tag.

Thanks for that clarification. However, I’m still not sure that this is going to give you what you want (that or I’m completely misunderstanding your model). You have a Bernoulli likelihood for each capture occasion for each individual and the probability that you laid out can be simplified to terms accounting for the survival probability from capture occasion t-1 to t and recapture probability for time period t. The survival probability enter into the terms as cumulative probabilities while the recapture probability are singletons.

I’m going to simplify your likelihood as laid out under the Final Model heading above, I’ll use \phi and p for these respectively following the mark-recapture conventions and ignore the post-release survival.
It seems to me you you have:

Pr(R_{i,t} = 1) = \prod_{u = t_0}^t \phi_u \times p_t
Pr(R_{i,t} = 0) = 1 - \prod_{u = t_0}^t \phi_u \times p_t

But there are two related problems with this: first, is that you don’t really have a way of accounting for multiple missed re-encounters for surviving fish and, second, is that the model is saying whenever R_{i,t} = 0 it’s because an individual either died sometime before time t or was not recaptured at time t.

So in this example you would have:

Pr(R_{i,1} = 0) = 1 - \phi_1p_1
Pr(R_{i,2} = 1) = \phi_1 \phi_2 p_2

But by virtue of the individual being recaptured at T= 2, you know that the individual was alive at t = 1 so Pr(R_{i,1} = 0 | R_{i,2} = 1) = \phi_1(1- p_1). You also wind up with a duplicate term for \phi_1 because the model doesn’t appear to be conditional on period 1 already being accounted for, you should have something like Pr(R_{i,2} = 1 | R_{i,1} = 0) = \phi_2 p_2.

In a CJS formulation you’d have this likelihood for the example capture history after the last term: \phi_1(1- p_1)\phi_2 p_2. If I understand your model correctly, you wind up with: (1 - \phi_1p_1)\times \phi_1 \phi_2 p_2.

Basically the issue is that you’re not accounting for the cumulative probability of not being re-encountered for fish that remain alive. 1-p needs to enter the likelihood.

The CJS model has more connections to traditional survival modeling with hazards than I think is widely taught or acknowledged. It’s basically just a survival model with a fancier censoring process (and accounting for that censoring process is what i think you’re missing). You can (and I think should) express the model entirely in terms of cumulative hazards.

2 Likes

Thanks Dalton,

I appreciate you sticking with me on this. I think I’m starting to see what you mean. Let me know if I’ve got the framework right. First, each fish has only one entry in the model with a column R denoting the timestep of recapture (R_i=t) or censoring (0). Second, if the fish i was recaptured at time t, the likelihood contribution is:

\begin{aligned} Pr(R_i = t) &= \psi_i \times \phi_{l[i],t} \times \prod_{s=t_{0,i}}^{t-1}\Big(\phi_{l[i],s}(1-p_{l[i],s})\Big) \times p_{l[i],t}\\ &p_{l[i],t} = \underbrace{1-\exp\bigg(-\exp\Big(\alpha_0 + \alpha_1\text{Mon}_{l[i],t} + \log S(L_i)\Big)\bigg)}_{\text{Probability of re-encounter}}\\ &\phi_{l[i],t} = \underbrace{1-\exp\bigg(-\exp\Big(\gamma_0 + \gamma_1 \text{Rec}_{l[i],t} + \log S(L_i)\Big)\bigg)}_{\text{Probability of survival}} \end{aligned}

What I’m still a little unsure about is how to write down the censoring likelihood. Here’s my current attempt (assuming survival occurs before re-encounter):

\begin{aligned} Pr(R_i = 0) &= 1-\sum_{t = t_{0,i}}^T Pr(R_i = t)\\ &= \underbrace{1-\psi_i}_{\text{Dies immediately}} \\ &+ \underbrace{\psi_i\times \sum_{t= t_{0,i}}^T\Bigg[\bigg(\prod_{s=t_{0,i}}^{t-1}\phi_{l[i],s}(1-p_{l[i],s})\bigg)\Big(1-\phi_{l[i],t}\Big)\Bigg]}_{\text{Survived to t-1 and was never re-encountered but died at time t}}\\ &+ \underbrace{\psi_i\times \Bigg(\prod_{s=t_{0,i}}^{T}\Big(\phi_{l[i],s}(1-p_{l[i],s})\Big)\Bigg)}_{\text{Survived to t but was never re-encountered}} \end{aligned}

Does that look like a correct way of structuring it?

If you’ve got fish recaptured more than once, I really do recommend including a process for that in this model. That’s going to do a lot to inform your capture probability estimates. Even if you don’t have multiple recaptures but you’re willing to assume that the process of recapture has no impact on survival and that fish released after capture could theoretically be captured again, you really should include a component for this process.

The way mark recapture models work is that you get all of your information on mortality from individuals that aren’t re-encountered again after their last observations. While all of your information on capture probability comes from individuals that are re-encountered at least once.

So for each individual you should have at least two pieces of information, the occasion of their initial capture (call this x_i) and the occasion of their last capture (call this z_i). If z_i \ne x_i, then that means an individual is recaptured at least once. For those individuals you might go ahead and set R_i = 1 and add in a a contribution to the likelihood that is something like:

Pr(R_i = 1) = \psi_i \times \prod_{u=s_i + 1}^{t_i - 1} \left( \phi_{i,u_i} (1 - p_{i,u_i}) \right)\phi_{i, t_i}p_{i,t_i}

The way a CJS model typically works is that data for each individual is written down as a capture-history: y_{i,t}, which consists of a Boolean indicator of whether individual i was captured at time t. This would a size T integer array. It’s convenient to define x_i as the index of individuals first capture and z_i as the index of an individuals last capture with x_i, z_i \in (1, ..., T). In your case it sounds like these arrays are pretty sparse, such that for most individuals you have either a single capture (x_i = z_i) or only 1 recapture. So ignoring those multiple recaptures for the moment (and so also ignoring the whole idea of a capture history) your likelihood could be written as:

Pr(z_{i,k}| x_{i,k}) = \left[\psi_i\prod_{k = x_i + 1}^{z_i - 1}\left(\phi_{i,k}(1 - p_{i,k}) \right) \phi_{i,z_i} p_{i,z_i} \right] \chi_{i,z_i}

The terms in the bracket are basically what you have for Pr(R_i = 1), but it sounds like you’re leaving off the \chi_{i,z_i} terms in this case (which you would only do in traditional mark-recapture if you had what’s called “loss on recapture”, e.g. something like lethal sampling). I think you probably want to include it because the recapture individuals also provide information on mortality by virtue of being released and then censored until the end of the study.

But what about \chi_{i,z_i}? This is what you’re getting at with your Pr(R_i = 0) term you laid out above. The term \chi_{i,z_i} is the probability of never being observed again after recapture which can be either because an individual died at some point after last capture or because the individual remains alive, but uncaptured. In most of the literature this term is usually defined and expressed recursively both in theory and in computation. Setting \chi_{i,T} = 1, the preceding terms through \chi_{i,1} are usually written (see the Users Guide for example) as:

\chi_{i,t} = (1 - \phi_{i,t + 1}) + \phi_{i,t + 1}(1-p_{i,t + 1})\chi_{i,t + 1}

This sets up a recursion which can be a little annoying to write out in Stan. The intuition behind the recursion is that for each capture occasion we have to account for two possibilities: (1) the individual did not survive to capture occasion t + 1 with probability 1 - \phi_{t + 1} or (2) the individual did survive to the next capture occasion with probability \phi_{t + 1} but was not captured (with probability 1 - p_{t + 1}) and was also not captured again after being alive at capture occasion t + 1 probability \chi_{t + 1}.

But if you do a little algebra, you get something a little closer to how your wrote the Pr(R_i = 0) terms.

\chi_{i,t} = 1 - \sum_{u = t + 1}^{T}\left[\left( \prod_{s = t + 1}^u \phi_{i,s}(1-p_{i,s })\right)\frac{p_{i,u}}{1-p_{i,u}} \right]

All this is is saying is that probability of not seeing an individual again after time t is the 1 minus the sum of the mutually-exclusive probabilities of seeing them at least one more time. You don’t need to keep track of any (1 - \phi) terms. In fact within the summand you pretty much calculate the same general structure as you do for recaptured individuals.

So including your \psi_i parameter you would calculate:

\chi_{i,t} = 1 - \psi_i\times\sum_{u = t + 1}^{T}\left[\left( \prod_{s = t + 1}^u \phi_{i,s}(1-p_{i,s })\right)\frac{p_{i,u}}{1-p_{i,u}} \right].


To put this a bit more concretely, suppose you have T = 4 capture occasions.

For a fish released at t = 1 and recaptured at t = 3, the contribution to the likelihood is:

\psi_i \times \phi_{i,2}(1-p_{i,2})\phi_{i,3}p_{i,3} \times \chi_{i,3}
here, \chi_{i,3} = (1 - \phi_{i,4}p_{i,4})

For a fish released at t = 1 but never recaptured the contribution to the likelihood is:

\chi_{i,1} = 1 - \psi_i\left(\phi_{i,2}p_{i,2} + \phi_{i,2}(1-p_{i,2})\phi_{i,3}p_{i,3} + \phi_{i,2}(1-p_{i,2})\phi_{i,3}(1 - p_{i,3})\phi_{i,4}p_{i,4} \right)


Finally, if you have at least a few individuals with more than one recapture, you can take a more traditional capture history perspective where you’re including those recaptures that happen in between the initial and final capture. That’d be something like:

Pr(y_{i,k}) = \left( \prod_{k = x_i + 1}^{z_i - 1}\left[\phi_{i,k}(1 - p_{i,k}) \left( \frac{p_{i,k}}{1-p_{i,k}}\right)^{y_{i,k}}\right] \phi_{i,z_i}p_{i,z_i} \right)\chi_{i,z_i}

Note that y_{i, k} is a Boolean indicator, so that the \frac{p_{i,k}}{1-p_{i,k}} is only trigger on recaptures. This is my preferred way to write the CJS likelihood because it makes it pretty convenient to use a logit-link for p and a log-log for \phi.

2 Likes

I have the feeling here that the notation risks obscuring the intuition here, so I’ll just chime in quickly in case it’s helpful.

Each fish makes an (independent) contribution to the likelihood that we can decompose into two parts. Part 1 deals with the capture history up and until the final recapture; Part 2 deals with the capture history following the final recapture. Note that either of these parts may be of length zero, in which case they don’t contribute to the likelihood.

When dealing with the likelihood for Part 1, we can use the fact that we know that the individual survived through the end of the period (because it was observed alive at the end of the period). If part 1 is of length N, the likelihood for part 1 is the the product of the first N survival probabilities, times the Bernoulli likelihood of the first N elements of the recapture history, conditional on the fitted capture probabilities.

When dealing with the likelihood for Part 2 we note that, having already dealt with the likelihood for Part 1, what we need is the Part 2 likelihood conditional on survival through the end of Part 1, which we know is true, and which we have already incorporated into the likelihood for Part 1. If the Part 2 capture history (which is all zeros) is of length M, then we need to consider M+1 mutually exclusive possibilities for this likelihood (which we need to sum up, since they’re exclusive). The M+1 possibilities are that the fish died immediately prior to each timestep, as well as the possibility that it never dies. Each of these possibilities is the product of the probability of survival through the prior timestep, times the probability of non-detection over all of those timesteps, times the probability of death immediately prior to the timestep.

2 Likes

Thank you both for putting the time into explaining this, this recursive depiction was really helpful. It took a moment to work through the algebra, but this makes sense to me and feels intuitive. I appreciate both the mathematical and intuitive explanations.