Hi folks,
I could use some help with a complex modeling framework. I have some experience with Stan, but I'm relatively new to survival models, especially in discrete time. I think it's an interesting problem and would really appreciate input on structuring the model appropriately and making sure it is correctly implemented in Stan.
Overview
I’m trying to model a discrete-time survival process to estimate relative discard mortality in a recreational fishery using tag-recapture data. Over 7,000 fish were tagged and released, and observations occur over 13 discrete time intervals (quarters) and 17 discrete spatial locations. We want to estimate the effect of fish condition at release (fair or poor vs. good [baseline]) on recapture probability (proxy for survival), while accounting for:
- Spatiotemporal variation in loss rates, and
- Effort-dependent re-encounter probability.
We assume condition only affects immediate post-release survival, not long-term vulnerability (i.e., recovered fish are equally catchable, regardless of post-release condition). Based on the species’ biology and recapture data, we assume movement among locations is negligible.
Recaptures are only observable if fish are caught by a specific monitored fleet. If fish are caught by unmonitored fleets (e.g., private anglers), die (natural or post-release mortality), or shed their tags, they are permanently lost from the system. Importantly, we know effort in each location and time step for the monitored fleet.
Although we have limited data on unmonitored fishing activity (a source of loss), for now we assume each location-time combination has its own latent loss rate that accounts for all unobserved removal processes. We are not necessarily interested in the mechanisms of loss over time; we only want to account for them. We have tens to hundreds of fish at risk in each location-time interval, which should facilitate estimation of the discrete loss rates across time and space.
Tagged fish enter the risk set at different times (i.e., staggered entry). I’ve converted the data to person-period format, where each fish has one row per time period they are at risk. The response variable R = 0 while the fish is still at risk; at the terminal time step, R = 1 if recaptured or R = 0 if censored.
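For concreteness, here is a minimal sketch of the person-period expansion (Python, with hypothetical field names of my own, not my actual data schema), one row per fish per quarter at risk, with R = 1 only on the terminal row and only if the fish was recaptured:

```python
# Minimal person-period expansion (hypothetical field names).
# Each fish contributes one row per quarter at risk; R = 1 only on the
# final row, and only if the fish was recaptured in that quarter.

def person_period_rows(fish_id, t_release, t_last, recaptured, loc):
    """Expand one tagging record into person-period rows.

    t_release : quarter the fish entered the risk set (staggered entry)
    t_last    : last quarter observed (recapture quarter, or end of study)
    recaptured: True if the fish was recaptured at t_last
    loc       : release location (fish are assumed not to move)
    """
    rows = []
    for t in range(t_release, t_last + 1):
        R = 1 if (recaptured and t == t_last) else 0
        rows.append({"fish": fish_id, "time": t, "loc": loc, "R": R})
    return rows

# A fish tagged in quarter 3 and recaptured in quarter 5 yields three rows
# with R = 0, 0, 1:
rows = person_period_rows(fish_id=1, t_release=3, t_last=5, recaptured=True, loc=7)
```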
Model structure
I currently express the model as:
Pr(R_{i} \mid \text{not yet recaptured}) = \underbrace{\psi_i}_{\text{survived post-release}} \cdot \underbrace{\prod^{t-1}_{s=t_{0,i}} \lambda_{i[l],s}}_{\text{still at risk at time }t} \cdot \underbrace{(1- \exp(-q \cdot E_{i[l],t}))}_{\text{re-encountered at time } t}
Post-release survival for fish i depends on condition (dummy-coded) X_i with associated coefficients \beta, modeled using a standard logit link.
\psi_i= \frac{\exp(X_i^\top \beta)}{1+\exp(X_i^\top \beta)}
We do not know whether a fish died or lost its tag after release unless it was recaptured. Therefore, remaining at risk until time t is modeled as the product of retention probabilities up to t:
\prod^{t-1}_{s=t_{0,i}} \lambda_{i[l],s} = \exp\bigg(\sum^{t-1}_{s=t_{0,i}} \ln (\lambda_{i[l],s})\bigg)
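I accumulate this on the log scale exactly as in the identity above; a quick numerical sanity check (plain Python, illustrative retention values only):

```python
import math

# Quarterly retention probabilities lambda_{l,s} for one fish's at-risk
# window (illustrative values, not estimates).
lams = [0.9, 0.85, 0.95]

# Direct product vs. exp of the summed logs: identical up to float error.
direct = math.prod(lams)
log_space = math.exp(sum(math.log(l) for l in lams))
assert abs(direct - log_space) < 1e-12
```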
Each \lambda_{l,t} is a location-time-specific probability of remaining at risk (i.e., not being lost due to unobserved capture, tag shedding, or natural mortality), on the logit scale:
\text{logit}(\lambda_{l,t}) \sim N(\mu, \sigma_\mu)
This is essentially modeled as a random effect varying across time and space. It is intentionally left simple for now. I realize this can be improved, and I am exploring more reasonable spatiotemporal random effects formulations such as an ST-CAR framework if I can get the sampler to run efficiently.
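The generative version of this random effect, written non-centered as in the Stan code below, looks like the following sketch (Python, with arbitrary hyperparameter values for illustration):

```python
import math
import random

random.seed(1)
mu, sigma = 2.0, 0.5   # illustrative hyperparameters, not estimates
L, T = 17, 13          # locations x quarters, as in the study

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Non-centered parameterization: draw z ~ N(0, 1), then shift and scale,
# so logit(lambda_{l,t}) = mu + sigma * z_{l,t}.
lam = [[inv_logit(mu + sigma * random.gauss(0.0, 1.0)) for _ in range(T)]
       for _ in range(L)]
```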
Finally, re-encounter probability at t is a function of local effort E_{l,t}:
1- \exp(-q \cdot E_{i[l],t}).
where q is a global catchability parameter, constrained to be positive so that the re-encounter probability stays bounded between 0 and 1 and equals 0 when effort at a given location and time interval is 0.
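Putting the three pieces together, the per-row recapture probability can be sanity-checked outside Stan (Python, illustrative parameter values only):

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def row_prob(beta_x, lams, q, effort):
    """Pr(recapture at t | not yet recaptured) for one person-period row.

    beta_x : linear predictor X_i' beta for post-release survival
    lams   : retention probabilities lambda_{l,s} for s = t0, ..., t-1
    q      : global catchability (> 0)
    effort : monitored-fleet effort E_{l,t} in this row's location/quarter
    """
    psi = inv_logit(beta_x)                              # survived release
    retained = math.exp(sum(math.log(l) for l in lams))  # still at risk
    p_enc = 1.0 - math.exp(-q * effort)                  # re-encountered
    return psi * retained * p_enc

p = row_prob(beta_x=1.2, lams=[0.9, 0.85], q=0.05, effort=30.0)
assert 0.0 < p < 1.0
# Zero effort implies zero recapture probability, as intended:
assert row_prob(1.2, [0.9, 0.85], 0.05, 0.0) == 0.0
```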
This led to my first attempt at implementing the model (full Stan code at the end of this post).
This model compiles and runs in Stan without issue, and diagnostics/fit look ok. However, I have never constructed a framework like this before, and wonder if it could be simplified or otherwise improved. Here are some questions I have, in order of importance:
- Does this model structure appropriately reflect a discrete-time survival framework with staggered entry and covariate-dependent post-release survival?
- Is my approach to modeling cumulative loss risk via spatiotemporal \lambda_{l,t} reasonable, or are there more efficient (or identifiable) approaches to reduce parameter complexity?
- Would a more standard (and simple) framework incorporating post-release survival, cumulative loss, and re-encounter effects into a single linear predictor and related to the response through a complementary log-log link function be a better approach?
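To make that last question concrete, here is a rough sketch (Python, my own hypothetical parameterization, not something I have fitted) of one appealing property of the cloglog link: with log(effort) as an offset, the cloglog hazard exactly reproduces the 1 - exp(-q*E) re-encounter function, so effort and catchability fold naturally into the linear predictor.

```python
import math

# Hypothetical cloglog formulation: the hazard p = 1 - exp(-exp(eta))
# with eta = log(q) + log(effort) recovers 1 - exp(-q * effort), since
# exp(log(q) + log(effort)) = q * effort. Condition and cumulative-loss
# terms could then be added to eta as ordinary regression terms.
def cloglog_hazard(log_q, effort):
    eta = log_q + math.log(effort)
    return 1.0 - math.exp(-math.exp(eta))

q, effort = 0.05, 30.0
assert abs(cloglog_hazard(math.log(q), effort)
           - (1.0 - math.exp(-q * effort))) < 1e-12
```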
I’d really appreciate feedback on the conceptual structure, implementation strategy, or alternatives I should consider.
Thanks in advance. Here is the corresponding Stan code for the model:
data {
  int<lower=1> N;                       // number of person-period rows
  int<lower=1> K;                       // number of post-release predictors (condition dummies)
  int<lower=1> L;                       // number of locations
  int<lower=1> T;                       // number of time steps (quarters)
  array[N] int<lower=1, upper=T> Time;  // current time interval for this row
  array[N] int<lower=1, upper=T> Start; // time of tagging (entry into risk set)
  vector<lower=0>[N] Effort;            // monitored-fleet effort in row i's location-time
  matrix[N, K] X;                       // condition covariates (dummy-coded)
  array[N] int<lower=0, upper=1> R;     // recapture indicator (0 or 1)
  array[N] int<lower=1, upper=L> loc;   // location ID
}
parameters {
  vector[K] beta;          // logit-scale post-release survival effects
  real mu;                 // logit-scale mean retention probability
  real log_q;              // log global catchability
  matrix[L, T] z_mu;       // standardized random effects (non-centered)
  real<lower=0> sigma_mu;  // sd of location-time random effects
}
transformed parameters {
  real q = exp(log_q);                  // global catchability
  matrix[L, T] u = sigma_mu * z_mu;     // location-time random effects
  vector[N] psi = inv_logit(X * beta);  // post-release survival probability
  vector[N] cum_log_loss;               // log cumulative retention per row
  vector[N] p;                          // per-row recapture probability
  for (i in 1:N) {
    real sum_log_loss = 0;
    int l = loc[i];
    // Product over s = t0, ..., t-1, matching the model equation
    // (empty, i.e. 0 on the log scale, when Start[i] == Time[i]);
    // lambda_{l,s} = inv_logit(mu + u[l, s]).
    for (s in Start[i]:(Time[i] - 1)) {
      sum_log_loss += log_inv_logit(mu + u[l, s]);
    }
    cum_log_loss[i] = sum_log_loss;
    p[i] = psi[i] * exp(cum_log_loss[i]) * (1 - exp(-q * Effort[i]));
  }
}
model {
  beta ~ normal(0, 1);
  mu ~ normal(-3, 1);
  log_q ~ normal(-1, 1);
  to_vector(z_mu) ~ normal(0, 1);
  sigma_mu ~ exponential(10);
  R ~ bernoulli(p);
}
generated quantities {
  array[N] int R_sim;  // posterior predictive recapture indicators
  for (i in 1:N) {
    R_sim[i] = bernoulli_rng(p[i]);
  }
}
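For completeness, a small pure-Python forward simulation mirroring the generated quantities block (all parameter values illustrative, not estimates), which I use to eyeball whether simulated recapture fractions are plausible:

```python
import math
import random

random.seed(42)

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative parameter values, not estimates.
psi = inv_logit(1.0)     # post-release survival for one condition class
lam = 0.9                # per-quarter retention probability
q, effort = 0.05, 30.0   # catchability and monitored effort

# Simulate recapture indicators for N fish, each at risk for 2 quarters
# before the focal quarter (mirrors bernoulli_rng in generated quantities).
N = 10_000
p = psi * lam**2 * (1.0 - math.exp(-q * effort))
R_sim = [1 if random.random() < p else 0 for _ in range(N)]
frac = sum(R_sim) / N  # should be close to p
```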