Capture-Recapture Prior & Posterior Predictive Checks?

Hiya folks. Does anyone have a good workflow/any examples of doing prior and posterior predictive checks for capture-recapture models?

I’m trying to improve my larger workflow for Bayesian analysis (a la Gelman et al. 2020) and am considering how to apply this to some of my capture-recapture analyses. I’ve performed prior and posterior predictive checks in contexts where the predicted quantity is some real or integer value, and it makes sense that I can then make inference about my prior selection or the general performance of my model by examining the distribution of these predictions. However, I’m less certain of how I would implement these checks with capture-recapture models where the predicted values would be capture histories. I suppose I could examine, for each individual, the categorical distribution of predicted capture history frequencies. I guess I’m not entirely sure how I would make inference from such a figure or table. It’s very likely I’m missing something obvious, so I was wondering if others could share their approach to doing prior & posterior predictive checks with these sorts of models.

If you want to do model selection or comparison you could generate the individual-level log_lik in the generated quantities and use the loo package. For more general predictive checks, there’s different stats you could compute as you mentioned. For instance you could compute the number of individuals detected per survey, or the total number of detections per individual across the survey period.
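A minimal Python sketch of those two summary statistics, assuming each capture history is stored as a list of 0/1 values per individual. The function names and the toy data are made up for illustration; the same calculation would be applied to every replicated dataset from the prior or posterior predictive distribution.

```python
from collections import Counter


def detections_per_survey(histories):
    """Number of individuals detected on each survey occasion."""
    n_occasions = len(histories[0])
    return [sum(h[t] for h in histories) for t in range(n_occasions)]


def detections_per_individual(histories):
    """Frequency table mapping total detections to number of individuals."""
    return Counter(sum(h) for h in histories)


# Toy capture histories (rows = individuals, columns = surveys); made-up data.
histories = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
per_survey = detections_per_survey(histories)          # [2, 2, 2, 1]
per_individual = detections_per_individual(histories)  # two animals seen twice, one seen three times
```

Plotting the observed values of these statistics over the histogram of replicated values is usually enough to spot gross misfit.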


This is a helpful start! I guess my motivation is less so model comparison (though I have been taking the approach that you recommend for model comparison down the line). Prior to model comparison, I wanna make sure that I have a set of candidate models that I feel good about. In other applications, checking to make sure that my priors don’t lead to biased predictions, or that my posterior predictive distributions reasonably resemble my data have been ways to build confidence that my models are doing what I think they should be doing.

I can imagine how the computed statistics that you suggest could aid in this goal. For example, I’d expect that given sufficiently uninformative priors, there would be a lot of variation in any given year in the prior predicted number of individuals detected. I’d also expect that the actual number of individuals detected in a given year should generally fall nicely within that year’s posterior predictive distribution of number of detections. I’ll give this a try.
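Here is a rough sketch of what that prior predictive check could look like in Python. It assumes a very simple survival-and-detection setup (everyone released on the first occasion, constant survival `phi` and detection `p`, both with Uniform(0, 1) priors). Those choices and all the names are assumptions for illustration, not anyone's actual model.

```python
import random
import statistics


def simulate_histories(n_ind, n_occ, phi, p, rng):
    """Simulate 0/1 capture histories for animals all released on occasion 0."""
    histories = []
    for _ in range(n_ind):
        alive = True
        history = [1]  # marked and released on the first occasion
        for _ in range(1, n_occ):
            alive = alive and rng.random() < phi  # survive the interval
            history.append(1 if alive and rng.random() < p else 0)
        histories.append(history)
    return histories


def prior_predictive_counts(n_draws, n_ind, n_occ, seed=1):
    """For each prior draw, return the number of individuals detected per occasion."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        phi = rng.random()  # assumed Uniform(0, 1) prior on survival
        p = rng.random()    # assumed Uniform(0, 1) prior on detection
        hist = simulate_histories(n_ind, n_occ, phi, p, rng)
        draws.append([sum(h[t] for h in hist) for t in range(n_occ)])
    return draws


def predictive_interval(values):
    """Approximate central 90% interval (5th and 95th percentiles)."""
    cuts = statistics.quantiles(values, n=20)
    return cuts[0], cuts[-1]


draws = prior_predictive_counts(n_draws=200, n_ind=50, n_occ=5)
for t in range(1, 5):
    lo, hi = predictive_interval([d[t] for d in draws])
    print(f"occasion {t}: 90% prior predictive interval for number detected = ({lo:.1f}, {hi:.1f})")
```

The same `predictive_interval` helper could be applied to posterior replicates to see whether each year's observed count falls inside that year's interval.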

Nice. Have you seen the new priorsense package? That one’s good for testing prior sensitivity. Also, are you developing new flavours of models? If so, you could use the SBC package to see how well things are going with parameter recovery.

Hi @jacollings.

There are other quantities that can be helpful for developing predictive checks besides the capture histories themselves. It helps to have a specific model assumption that you’re targeting. For example, suppose you’re considering a Jolly-Seber model (so individuals are either in the state ‘alive’ or ‘dead’), but you’re concerned that there was some temporary emigration from the study site (thus violating the assumption that all individuals alive at a sampling occasion are available for detection). Then one quantity you could look at is the length of intervals between captures. Temporary emigration tends to produce more long intervals, e.g. capture histories like 010001. You could look at the frequency distribution of such intervals and compare with the data for a posterior predictive check, or perhaps there are some really long intervals (e.g. >4 occasions) in the data that you’re suspicious about, and you could compare the number of such intervals between each posterior replicate dataset and the real dataset.
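As a sketch of the interval idea in Python: the code below assumes capture histories are strings of 0s and 1s, and it defines an interval as the number of missed occasions between two consecutive captures (so 010001 has one interval of length 3). That definition, the function names, and the toy data are assumptions for illustration.

```python
from collections import Counter


def capture_gaps(history):
    """Numbers of missed occasions between consecutive captures in one history."""
    caught = [t for t, y in enumerate(history) if y == "1"]
    return [b - a - 1 for a, b in zip(caught, caught[1:])]


def gap_distribution(histories):
    """Frequency distribution of gap lengths across all individuals."""
    return Counter(g for h in histories for g in capture_gaps(h))


def n_long_gaps(histories, threshold=4):
    """Number of gaps strictly longer than `threshold` occasions."""
    return sum(g > threshold for h in histories for g in capture_gaps(h))


def ppc_p_value(observed_stat, replicated_stats):
    """Share of replicated datasets whose statistic is at least the observed one."""
    return sum(r >= observed_stat for r in replicated_stats) / len(replicated_stats)


# Made-up observed histories; replicated datasets would come from the fitted model.
observed = ["010001", "110100", "100001", "011000"]
obs_gaps = gap_distribution(observed)
obs_long = n_long_gaps(observed, threshold=3)
```

If long gaps are much more common in the data than in the replicates (a `ppc_p_value` near 0), that would point towards something like temporary emigration that the model is not accounting for.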

There are some examples of posterior predictive checks for capture-recapture in this paper of ours. Section 3.1 of the SI has details and the code’s on github. (No prior predictive checks though, which is something we should have done in hindsight).

Hope this helps and offers some inspiration.


The above ideas are certainly a good starting point, but I’ll pitch in my 2 cents as well.

I typically do some prior predictive checks with the detection/encounter probability model, especially if you have many covariates and/or group-level (“random”) effects. For a binomial variable, I like to be sure that the priors are not putting a bunch of predictive weight near the boundaries at 0/1. That workflow borrows heavily from what I’d do for a vanilla binomial or Bernoulli regression.
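A quick simulation sketch of that boundary check in Python. It assumes a logit-linear detection model with an intercept and some standardized covariates, all with independent Normal(0, `prior_sd`) priors. The model, the names, and the 0.05 cutoff are assumptions for illustration.

```python
import math
import random


def inv_logit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def boundary_mass(prior_sd, n_covariates=0, n_draws=20000, eps=0.05, seed=1):
    """Share of prior predictive detection probabilities within eps of 0 or 1."""
    rng = random.Random(seed)
    near_boundary = 0
    for _ in range(n_draws):
        eta = rng.gauss(0.0, prior_sd)  # intercept on the logit scale
        for _ in range(n_covariates):
            # coefficient draw times a standardized covariate value
            eta += rng.gauss(0.0, prior_sd) * rng.gauss(0.0, 1.0)
        p = inv_logit(eta)
        if p < eps or p > 1.0 - eps:
            near_boundary += 1
    return near_boundary / n_draws


for sd, k in [(1.5, 0), (1.5, 5), (10.0, 0)]:
    print(f"prior_sd={sd}, covariates={k}: share near 0/1 = {boundary_mass(sd, k):.2f}")
```

The point of the comparison is that a prior that looks harmless for a single intercept can still pile mass at 0 and 1 once several covariate or group-level effects are added to the linear predictor.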

The “abundance” part of the model is a bit trickier. I often like to do some experiments with simulated data to make sure that the model “works”, but that always leaves open the possibility that the data we have are pathological in some way and our inference is poor.

I’d second the summary statistic approach to prior/posterior predictive checks. The exact statistics you’d want to calculate would likely depend upon the particular flavor of capture-recapture model you have and your study system. My thought being that the assumptions of the different models may be more (or less) prone to violation in different contexts and you’d want to probe the areas where you most suspected mismatches.

The temporary emigration check @mbc proposes is a good one. I could imagine the converse could work for temporary immigration, assuming that the organisms are long-lived relative to the sampling and capture probability is relatively high. A more general check might be to examine the frequency distribution of the number of captures for each individual – seems like @mbc already thought of that, although it’s a little different given the context of a multi-state model.

@Dalton might have some additional thoughts to offer?


Thanks for tagging me on this @jgoldberg! And I think that @mbc hits the nail on the head when they say that the choice of prior/posterior predictive check really depends on the features of the ecological system that you’re most interested in capturing.

I’ve got two examples from my own work. In this paper, we included the workflow for doing both prior and posterior predictive checks in the supplementary materials.

We used multiple quantities, such as the number never captured again after initial release, the number captured in each temporal stratum at each monitoring location, and the total number captured.

For prior predictive checks we were concerned with trying to make sure that the priors produced a broad but reasonable range of values. In particular, one thing we were concerned with was making sure that the simulated fish were somewhat evenly distributed across the temporal strata. The concern here was that because the monitoring period was limited, we had a situation somewhat akin to a logit or similar transform, where a too-diffuse prior would result in a strange U-shaped distribution with peaks in the tails. And because all of the prior pieces were moving together, we wanted to make sure that we didn’t accidentally produce a strong prior on the product of multiple parameters that went unexamined. (I think this is somewhat unavoidable though. Consider the distribution of the product of two Uniform(0,1) random variables. It’s not Uniform.)
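To spell out the Uniform example, here is the standard calculation, assuming the two variables X and Y are independent (the symbols X, Y, and Z are introduced here just for the derivation):

```latex
% Z = XY with X, Y independent Uniform(0,1); for 0 < z < 1:
\begin{align*}
P(Z \le z) &= \int_0^1 P\!\left(X \le \tfrac{z}{y}\right) dy
            = \int_0^z 1 \, dy + \int_z^1 \frac{z}{y} \, dy
            = z - z \ln z, \\
f_Z(z) &= \frac{d}{dz}\left(z - z \ln z\right) = -\ln z, \\
E[Z] &= E[X]\,E[Y] = \tfrac{1}{4}.
\end{align*}
```

So the implied prior on the product has a density that blows up near 0 and a mean of 1/4, even though each component prior is flat.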

We just had a paper come out last month (author’s link) where we used posterior predictive checks with a bit of prolepsis in mind. We suspected that there might be some concerns among some readers about whether it was appropriate to pool hatchery raised and wild fish and assume that they had similar capture probabilities. So to address that we decided to separate out the two groups of fish in the posterior predictive checks (even though they were modelled pooled in many respects). The reasoning here being that if one group of fish had posterior predictive intervals that systematically under- or over-predicted, then that might give us a reason to go back in and add a term to the model to account for the difference due to origin.

We also observed something cool (well, I think it’s cool) where it looked like a higher level summary statistic (the sum of detected PIT-tagged individuals in a release group over all temporal strata) was poorly replicated (in the sense that less than 90% of the posterior predictive intervals contained the observed value). On the other hand, PPIs for our more granular summary statistic had much better coverage. We think the explanation for this has to do with the precision of the estimates at the more granular level (which comes from choices in our parameterization). The consequence of high precision was that if the PPI for one stratum for one release group missed the observed value, then it was very likely that the PPI for the sum over all the strata for that release group would also miss. Once we realized that, we decided not to worry about that summary statistic performing less well, especially because the data for the PIT-tagged fish were in some ways only ancillary to our objective of estimating abundance.
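For anyone wanting to compute that kind of coverage, a small Python sketch follows. It assumes the replicated statistics are stored as one list per posterior draw, with one entry per stratum or release group. The names, the simple empirical-quantile rule, and the toy numbers are made up for illustration and are not taken from the paper.

```python
def quantile_interval(values, level=0.9):
    """Central interval from sorted draws using a simple empirical-quantile rule."""
    s = sorted(values)
    lo_idx = round((1.0 - level) / 2.0 * (len(s) - 1))
    hi_idx = round((1.0 + level) / 2.0 * (len(s) - 1))
    return s[lo_idx], s[hi_idx]


def ppi_coverage(observed, replicates, level=0.9):
    """Share of summary statistics whose predictive interval contains the observed value.

    observed:   list of K observed statistics
    replicates: list of draws, each a list of K replicated statistics
    """
    hits = 0
    for k, obs in enumerate(observed):
        lo, hi = quantile_interval([rep[k] for rep in replicates], level)
        hits += lo <= obs <= hi
    return hits / len(observed)


# Toy replicates: statistic 0 ranges over 0..100, statistic 1 over 10..110.
replicates = [[i, 10 + i] for i in range(101)]
coverage = ppi_coverage([50, 200], replicates)  # 0.5: the first is covered, the second is not
```

Coverage for an aggregated statistic can be checked the same way by passing the per-draw sums, for example `[[sum(rep)] for rep in replicates]` together with the observed sum.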

So those are some specific examples, but in general I think @jgoldberg and @mbc had it right: you should probably think about what summary statistic(s) is most revealing of whichever features you’re trying to best capture. It’s good to look at several, but they don’t all have to “hit”, especially if they’re less important features for your objective.

That said, for LOOIC, it really is worth thinking about the fact that the unit of replication (the “one” in leave-one-out) is an individual animal - which means that the log-likelihood of an individual’s capture history is an important quantity to consider.
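To make the per-individual log-likelihood concrete, here is a sketch in Python. It assumes a Cormack-Jolly-Seber likelihood conditional on first capture, with constant survival `phi` and detection `p`. That model is an assumption chosen for illustration, not necessarily the one used in any of the papers above; in Stan the equivalent calculation would go in generated quantities, once per posterior draw.

```python
import math


def cjs_log_lik(history, phi, p):
    """Log-likelihood of one 0/1 capture history, conditional on first capture.

    Assumes constant survival phi and detection p, both strictly between 0 and 1.
    """
    n_occ = len(history)
    # chi[t]: probability of never being detected after occasion t, given alive at t
    chi = [1.0] * n_occ
    for t in range(n_occ - 2, -1, -1):
        chi[t] = (1.0 - phi) + phi * (1.0 - p) * chi[t + 1]

    captures = [t for t, y in enumerate(history) if y == 1]
    if not captures:
        raise ValueError("history has no captures")
    first, last = captures[0], captures[-1]

    log_lik = 0.0
    for t in range(first + 1, last + 1):
        # known to be alive up to the last capture: survived, then detected or missed
        log_lik += math.log(phi) + math.log(p if history[t] == 1 else 1.0 - p)
    # after the last capture: never seen again
    return log_lik + math.log(chi[last])


example = cjs_log_lik([1, 0, 1, 0], phi=0.8, p=0.5)
```

Evaluating this for every individual at every posterior draw gives the draws-by-individuals matrix that leave-one-out methods work from, with one column per animal.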

I’ll also drop these slides for you. This is from a talk that was heavily inspired by (and liberally borrowed from) @betanalpha
An Introduction to the Bayesian Workflow for Mark-Recapture.pdf (2.1 MB)
It’s more than a few years old now, but might be of use to you.


Wow, all of these responses have been super helpful and informative! I really appreciate the ideas about particular statistics that address potential concerns about the models. This also serves as my regular and important reminder to check that priors on a bunch of covariate effects aren’t producing weird/biased joint-priors. Thank you all (@mhollanders , @mbc , @jgoldberg , and @Dalton ) for excellent ideas and links!