Capture-Recapture model with partial or complete pooling

Another great resource is @betanalpha's Patreon page: https://www.patreon.com/betanalpha/

2 Likes

So I’ve run the model a few times and looked at the diagnostics. Looks like everything checks out so far.

2 Likes

Here is the output I am getting on my end:

> precis(m, 3)
                        mean    sd   5.5%  94.5% n_eff Rhat4
omega[1,1]              0.71  0.06   0.62   0.82  1372  1.01
omega[1,2]              0.59  0.07   0.49   0.72   714  1.00
omega[2,1]              0.26  0.03   0.21   0.32  1117  1.00
omega[2,2]              0.26  0.03   0.21   0.31  1743  1.00
p[1,1,1]                0.18  0.02   0.15   0.21  1578  1.00
p[1,1,2]                0.35  0.03   0.30   0.41  1012  1.01
p[1,2,1]                0.18  0.02   0.14   0.21   999  1.00
p[1,2,2]                0.31  0.04   0.24   0.37   663  1.00
p[2,1,1]                0.20  0.03   0.16   0.26   821  1.00
p[2,1,2]                0.36  0.04   0.30   0.43   829  1.00
p[2,2,1]                0.18  0.02   0.14   0.22  1847  1.00
p[2,2,2]                0.34  0.04   0.28   0.40  1452  1.00
p_bar_population[1,1]   0.28  0.20   0.05   0.71  1171  1.00
p_bar_population[1,2]   0.39  0.19   0.12   0.77  2151  1.00
p_bar_population[2,1]   0.28  0.21   0.04   0.73   984  1.00
p_bar_population[2,2]   0.38  0.20   0.10   0.78  1712  1.00
p_bar_region[1,1]       0.28  0.20   0.05   0.69  1395  1.00
p_bar_region[1,2]       0.39  0.20   0.10   0.79  1391  1.01
p_bar_region[2,1]       0.28  0.20   0.04   0.71  1372  1.00
p_bar_region[2,2]       0.38  0.19   0.09   0.74  1931  1.00
p_bar_combined[1]       0.18  0.03   0.15   0.23  1533  1.00
p_bar_combined[2]       0.34  0.03   0.29   0.39  1313  1.00
sigma_population        0.27  0.29   0.03   0.78   781  1.00
sigma_region            0.27  0.32   0.03   0.74   902  1.00
sigma_combined          0.03  0.03   0.01   0.09   450  1.01
N_raw[1,1]              0.57  0.09   0.43   0.72  1325  1.01
N_raw[1,2]              0.45  0.10   0.33   0.62   700  1.00
N_raw[2,1]              0.16  0.03   0.10   0.21   994  1.00
N_raw[2,2]              0.16  0.03   0.11   0.22  1598  1.00
N[1,1]                711.00 61.00 619.00 814.00  1325  1.01
N[1,2]                586.67 71.55 496.00 718.00   700  1.00
N[2,1]                260.33 30.48 213.00 310.00   994  1.00
N[2,2]                257.06 30.51 213.00 307.00  1598  1.00
1 Like

Thank you so much for having a look @Ara_Winter ! I wonder, though. Isn’t something awry here? For every region/population, the true value falls outside the 89% credible interval:

True N    89% interval
600       619-814
800       496-718
350       213-310
450       213-307

Doesn’t that suggest that something is wrong either with the model or with the simulation?

Oh yeah, good catch @erwald! Do you have the 50% intervals? I usually look at those. I didn’t see anything weird in the diagnostics, but let me circle back around to that.

This may be the uniform priors causing problems. If I remember correctly we don’t want uniform priors in this kind of model. I think @mbjoseph mentioned something about that up thread.

Maybe this is helpful @Ara_Winter ?

N[1,1] (true value 600)
image

N[1,2] (true value 800)
image

N[2,1] (true value 350)
image

N[2,2] (true value 450)
image

1 Like

In Bayesian Population Analysis, Kéry & Schaub use uniform priors for p & omega. But of course my model is different from theirs. Maybe I should put priors on p_bar_X & p_bar_X_combined, for example?

Edit: maybe I’m transforming the data incorrectly before feeding it into Stan, because the model still produces bad estimates even after commenting out the partial pooling. It looks like it might be fitting on wrongly transformed arrays or something like that?

Edit II: I summed the input columns in generated quantities & all of them had the right values. So I guess something’s the matter with the model after all!

1 Like

For comparison, here are the same four posterior probability plots for each of aug1 ... aug4 run individually through the Mt model in the Stan repository:

image
image
image
image

I see that it’s not doing much better. I then tried changing my simulation to keep the same detection probability for all observations, and the results are much better:

image
image
image
image

Again, targets here were 600, 800, 350 & 450. This looks great!

That leads me to suspect that maybe there’s a bug in the implementation of Mt in the Stan repository, or – & maybe this is more likely – it’s just that much harder to infer population size from capture-recapture data with variable detection probability. What do you think?

So, if I remember correctly, variable detection probability can be an issue. It’s discussed here

and here

but that’s diving into an area I am not as familiar with.

2 Likes

Thanks @Ara_Winter, I’ll have a look at both of those!

After taking a closer look at the M_t Stan program from BPA (link to Mt.stan file), it looks like there’s an error in the likelihood.

Here’s the relevant Stan code:

data {
  int<lower=0> M;               // Size of augmented data set
  int<lower=0> T;               // Number of sampling occasions
  int<lower=0,upper=1> y[M, T]; // Capture-history matrix
}

transformed data {
  int<lower=0> s[M];            // Totals in each row

  for (i in 1:M) {
    s[i] = sum(y[i]);
  }
}

parameters {
  real<lower=0,upper=1> omega;  // Inclusion probability
  vector<lower=0,upper=1>[T] p; // Detection probability
}

model {
  // Likelihood
  for (i in 1:M)
    if (s[i] > 0)
      // z[i] == 1
      target += bernoulli_lpmf(1 | omega)
              + bernoulli_lpmf(y[i] | p);
    else // s[i] == 0
      target += log_sum_exp(bernoulli_lpmf(1 | omega)   // z[i] == 1
                            + bernoulli_lpmf(0 | p),
                            bernoulli_lpmf(0 | omega)); // z[i] == 0
}

Notice in the likelihood that when an individual is not detected (s[i] == 0), in the first term of the log_sum_exp there is one zero hardcoded into the Bernoulli observation model (bernoulli_lpmf(0 | p)). This should be a vector of zeros of length T, not just one zero, because if the individual is in the population (if z[i] == 1), then we failed to detect them on sampling occasion t=1, ..., T.
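As a quick numeric illustration of the corrected term (the detection probabilities 0.2 and 0.4 for T = 2 occasions are just made-up values), the probability of being present but never detected has to account for a non-detection on every occasion:

p <- c(0.2, 0.4)   # hypothetical detection probabilities for T = 2 occasions

# Present but never detected means missing the individual on *every* occasion:
prod(1 - p)        # 0.48
sum(log(1 - p))    # -0.734 on the log scale, which is what bernoulli_lpmf(y[i] | p) gives when y[i] is all zeros

# Accounting for only a single occasion would give log(1 - 0.2) = -0.223 instead.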

We can correct this by using y[i] (an integer array of 0 values) instead of 0 in the likelihood:

data {
  int<lower=0> M;               // Size of augmented data set
  int<lower=0> T;               // Number of sampling occasions
  int<lower=0,upper=1> y[M, T]; // Capture-history matrix
}

transformed data {
  int<lower=0> s[M];            // Totals in each row

  for (i in 1:M) {
    s[i] = sum(y[i]);
  }
}

parameters {
  real<lower=0,upper=1> omega;  // Inclusion probability
  vector<lower=0,upper=1>[T] p; // Detection probability
}

model {
  // Likelihood
  for (i in 1:M)
    if (s[i] > 0)
      // z[i] == 1
      target += bernoulli_lpmf(1 | omega)
              + bernoulli_lpmf(y[i] | p);
    else // s[i] == 0
      target += log_sum_exp(bernoulli_lpmf(1 | omega)   // z[i] == 1
                            + bernoulli_lpmf(y[i] | p),
                            bernoulli_lpmf(0 | omega)); // z[i] == 0
}

Or, you could use binomial_lpmf(0 | T, p) instead of bernoulli_lpmf(y[i] | p) if you prefer.

I’m not sure whether this will solve all of your problems, but it’s definitely worth fixing. I’ll submit a PR to the stan-dev/example-models repo to correct this…

Edit: opened an issue related to this, which affected a few other capture-recapture models as well: Subtle error in some capture-recapture likelihoods · Issue #204 · stan-dev/example-models · GitHub

4 Likes

That does seem to marginally improve the results. Here are the experiments I ran further up but now with the fixed version of Mt (true values 600, 800, 350 & 450):

            mean    sd   5.5%  94.5% n_eff Rhat4
N11        676.92 69.76 574.00 794.00  1041     1
N12        708.04 106.81 552.00 902.06   731     1
N21        215.81 29.76 176.00 268.00  1282     1
N22        273.11 56.91 198.00 374.00   676  1.01

or visually:

600
image

800
image

350
image

450
image

Comparing with the graphs I posted further up, it seems Mt is doing a little bit better now though it’s still pretty off, especially for the last two groups, which I simulated with lower overall detection probabilities & lower population sizes.

All of these can be compared to just doing the Lincoln-Petersen calculations manually – Mt gets very similar results to these:

> sum(aug1[,1]) * sum(aug1[,2]) / sum(apply(aug1, 1, min))
[1] 673.8261 # true 600
> sum(aug2[,1]) * sum(aug2[,2]) / sum(apply(aug2, 1, min))
[1] 704.52 # true 800
> sum(aug3[,1]) * sum(aug3[,2]) / sum(apply(aug3, 1, min))
[1] 211.5 # true 350
> sum(aug4[,1]) * sum(aug4[,2]) / sum(apply(aug4, 1, min))
[1] 264.1429 # true 450

In general, I’ve noticed that the models are pretty sensitive to the detection probability & true population size I use when simulating. When I use probabilities (0.2, 0.4), (0.12, 0.24), (0.15, 0.3), (0.09, 0.18) – those are the detection probabilities for (1st observation, 2nd observation) of each group – then Mt doesn’t do very well & I get the results you see above. But when simulating with detection probabilities (0.4, 0.8), (0.24, 0.48), (0.3, 0.6), (0.18, 0.36) it does better (true values still 600, 800, 350 & 450):

             mean    sd   5.5%  94.5% n_eff Rhat4
N11        576.62 13.28 557.00 599.00  2465     1
N12        849.90 52.33 768.00 940.06   980     1
N21        390.24 27.47 350.00 438.00  1573     1
N22        408.39 49.49 342.00 493.00   900     1

Here’s another experiment, running Mt on the original, lower detection probabilities – (0.2, 0.4), (0.12, 0.24), (0.15, 0.3), (0.09, 0.18) – but with true populations that are an order of magnitude higher (6000, 8000, 3500 & 4500):

              mean     sd    5.5%   94.5% n_eff Rhat4
N11        5860.57 180.24 5577.00 6158.00   881  1.01
N12        7841.95 430.97 7200.00 8551.00   639  1.01
N21        3564.71 226.09 3227.95 3948.00   749     1
N22        4754.26 508.04 4016.00 5633.06   723     1

That’s much better! I guess with a true N in the low hundreds the samples are just a bit too noisy for Mt (& the Lincoln-Petersen method generally) to make good inferences from?
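To get a feel for just how noisy, here’s a minimal sketch (assuming two occasions with independent Bernoulli detections at p = 0.15 & 0.3 and a true N of 350, roughly matching one of my groups; lp_estimate is just a throwaway helper) that repeats the Lincoln-Petersen calculation over many simulated data sets:

set.seed(1)

lp_estimate <- function(N, p) {
  # Two capture occasions with independent Bernoulli detections.
  y <- cbind(rbinom(N, 1, p[1]), rbinom(N, 1, p[2]))
  # Lincoln-Petersen: caught on 1st * caught on 2nd / caught on both.
  sum(y[, 1]) * sum(y[, 2]) / sum(y[, 1] & y[, 2])
}

estimates <- replicate(1000, lp_estimate(350, c(0.15, 0.3)))
quantile(estimates, c(0.055, 0.5, 0.945))  # the 89% spread is very wide around the true 350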

My own partially pooled model does marginally better than Mt, it seems. Here are the results using the simulated data with larger population sizes (6000, 8000, 3500 & 4500):

                        mean     sd    5.5%   94.5% n_eff Rhat4
N[1,1]               5933.74 189.73 5644.00 6248.00  2613     1
N[1,2]               7845.02 436.84 7181.94 8561.06  1925     1
N[2,1]               3592.52 225.07 3244.94 3972.00  1901     1
N[2,2]               4549.53 479.72 3848.00 5362.22  1780     1

N[2,2] sees a pretty big improvement there, which is what I’d expect from a partially pooled model, given that it has the smallest effective sample size (the fewest captures), meaning it benefits the most from information pooled from the other groups.

2 Likes

Here I calculate the LP estimates for a bunch of different population sizes & detection probabilities:

image

(The percentage error is just abs(1 - estimated_N / actual_N). I’m sure there’s a better way to calculate this, but I figured I couldn’t use the residuals as they’d be much higher for higher population sizes, right? So that’s why I opted for this measurement.)
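In code, that measurement would be something like this (pct_error is just a throwaway helper; the numbers are hypothetical):

pct_error <- function(estimated_N, actual_N) abs(1 - estimated_N / actual_N)

pct_error(674, 600)    # ~0.12
pct_error(6740, 6000)  # ~0.12 -- same relative error, even though the residual is 10x larger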

Here’s the same thing but with the detection probability halved in the first observation:

image

So it seems likely that I was getting unsatisfactory results because I was estimating population sizes under 1000, often with fairly low detection probabilities, & especially with detection probabilities that varied across different observations.

4 Likes

That is some nice sleuthing!

2 Likes

This also reminds me of some issues we had estimating species richness with a multi-site occupancy model.

@erwald there may be some potential to improve the way detection probabilities are handled in the model, and work towards a model that you know works as intended. I’ll outline some specific things below, and suggest a strategy for moving forward at the end.

Here are the parts of the Stan program that I’ll focus on:

parameters {
  // Inclusion probability - the probability that any potential member of N (that is, any member of
  // M) is in the true population N.
  real<lower=0, upper=1> omega[P, R];


  // Detection probability - the probability that any member of N is captured.
  vector<lower=0, upper=1>[T] p[P, R];
  vector<lower=0, upper=1>[T] p_bar_population[P];
  vector<lower=0, upper=1>[T] p_bar_region[R];
  vector<lower=0, upper=1>[T] p_bar_combined;
  real<lower=0> sigma_population;
  real<lower=0> sigma_region;
  real<lower=0> sigma_combined;
}


model {
  for (i in 1:P) {
    p_bar_combined ~ normal(p_bar_population[i], sigma_population);
  }
  for (i in 1:R) {
    p_bar_combined ~ normal(p_bar_region[i], sigma_region);
  }


  // Go through all potential members.
  for (i in 1:P) {
    for (j in 1:R) {
      p[i, j] ~ normal(p_bar_combined, sigma_combined);
...

Some specific things:

  • Improper priors on standard deviation parameters can be challenging. I’d recommend placing proper priors on these, rather than using Uniform(0, Inf) priors.

  • It’s unconventional to use normal priors for probability parameters that are bounded between 0 and 1. It would be more typical to use normal priors on a logit scale instead, if you want to model heterogeneity among populations and regions (see the sketch after this list).

  • The model block is incrementing the target density multiple times for p_bar_combined, in two different loops. This looks like it could be unintentional:

  for (i in 1:P) {
    p_bar_combined ~ normal(p_bar_population[i], sigma_population);
  }
  for (i in 1:R) {
    p_bar_combined ~ normal(p_bar_region[i], sigma_region);
  }
  • Overall this model is quite complex, modeling heterogeneity in detection among populations, among regions, and among population X region combinations (sigma_population, sigma_region, and sigma_combined respectively). That complexity can make it hard to diagnose what specific model components are leading to unexpected/undesirable behavior.
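To illustrate the logit-scale point from the second bullet above (a quick sketch; the Normal(0.3, 0.2) prior and the logit-scale standard deviation of 1 are arbitrary values chosen for illustration):

# A normal prior directly on the probability scale puts mass outside [0, 1],
# which the <lower=0, upper=1> constraint then has to truncate away:
draws_prob <- rnorm(10000, mean = 0.3, sd = 0.2)
mean(draws_prob < 0 | draws_prob > 1)   # a nontrivial fraction of prior mass lands outside [0, 1]

# A normal prior on the logit scale always maps back into (0, 1):
draws_logit <- plogis(rnorm(10000, mean = qlogis(0.3), sd = 1))
range(draws_logit)                      # strictly inside (0, 1)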

My recommendation would be to 1) simplify the model, 2) use prior predictive simulation to make sure the priors are reasonable, and 3) use simulation-based calibration (SBC) to make sure the implementation works. Then, incrementally add complexity (one thing at a time), and re-do your prior checks and SBC. Here’s a pull request to get you started with SBC: Use simulation-based calibration for a simple verison of the model by mbjoseph · Pull Request #1 · erwald/capture-recapture-simulations · GitHub

Finally, it’s worth really digging into Michael Betancourt’s work on principled Bayesian workflows to see how these steps fit into the bigger picture: Towards A Principled Bayesian Workflow

4 Likes

Thank you! I will read through all of these resources as well as those I’ve been linked above, & look at your PR, though it will take some time. Fortunately I’m on holidays right now!

I did simplify the model somewhat last week, though I didn’t push those changes to GitHub yet. But I ended up with something like this:

parameters {
  // Inclusion probability - the probability that any potential member of N (that is, any member of
  // M) is in the true population N.
  real<lower=0, upper=1> omega[P, R];

  // Detection probability - the probability that any member of N is captured.
  vector<lower=0, upper=1>[T] p[P, R];
  real<lower=0, upper=1> p_bar_population[P];
  real<lower=0, upper=1> p_bar_region[R];
  real<lower=0> sigma_population[P];
  real<lower=0> sigma_region[R];
}

model {
  for (i in 1:P) {
    for (j in 1:R) {
      p[i, j] ~ normal(p_bar_population[i], sigma_population[i]);
      p[i, j] ~ normal(p_bar_region[j], sigma_region[j]);
    // ...
    }
  }
}

But I will have to revisit that in the light of your resources & suggestions.

Do you have any resources on this? I don’t have Statistical Rethinking with me right now, but from what I’ve seen in it McElreath uses uniform priors for this, & the sentiment seems to be that this is generally recommended. Is that wrong, then? Maybe I just didn’t get far enough in the book.

I dug into this a bit more, by the way, & realised that bad LP estimates seem to happen when the k values (in n/N = k/K, i.e. the recaptured individuals that are also marked) are low, such that small variation there affects the estimates a lot. I plotted the squared log error for a bunch of different k for a few scenarios with different population sizes & detection probabilities, & it’s pretty clear which scenarios give unreliable estimates:

That should explain the bad estimations I got in some simulations before …
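To make that concrete, here’s a minimal sketch of how jumpy the estimate gets when k is small (the n and K values are arbitrary):

# Lincoln-Petersen with n marked on the first occasion, K caught on the second
# occasion, and k of those K already marked: N_hat = n * K / k.
n <- 90; K <- 120
data.frame(k = 3:8, N_hat = n * K / (3:8))
# Going from k = 5 to k = 6 alone moves N_hat from 2160 to 1800.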

1 Like

There are a few things to think about here:

  1. We don’t want to put the cart before the horse, and IMO you’ll be well-served in working through the following points with a simplified version of the model (e.g., start off with either heterogeneity in populations or regions, but not both).

  2. Heterogeneity in detection probabilities is usually modeled with normal priors on a logit scale, but your current model is using normal priors on a probability scale. A more typical model would look like:

p_{i, j} = \text{logit}^{-1}(...),

where \text{logit}^{-1} is the inv_logit function.

  3. The code chunk you pasted is assigning two different priors to detection probabilities p[i, j], which is unusual. I wonder whether you would rather do something like:

p_{i, j} = \text{logit}^{-1}(\mu + \epsilon_i + \eta_j),

where \mu is an intercept parameter, \epsilon_i is a population adjustment, and \eta_j a region adjustment.

  4. It rarely makes sense for standard deviation parameters that describe variation in detection probabilities on the logit scale to be large, and we want our priors to reflect this. We can use simulation to understand why - for example, if we simplify the model to include only one source of heterogeneity in detection probability, we might get something like this:

\text{logit}(p_i) \sim \text{Normal}(0, \sigma),

where p_i is the detection probability for individual i, and \sigma represents among-individual heterogeneity in detection.

Let’s look at the distribution of detection probabilities for different values of the standard deviation \sigma. Here we’ll draw 10,000 values for each of \sigma=1, \sigma=2, \sigma=10, and \sigma=100:

library(ggplot2)

prior_sims <- expand.grid(index = 1:10000, 
                          sd = c(1, 2, 10, 100))
prior_sims$p <- plogis(rnorm(nrow(prior_sims), mean = 0, sd = prior_sims$sd))

ggplot(prior_sims, aes(x = p)) + 
  geom_histogram() + 
  facet_wrap(~sd, nrow = 1) + 
  ggtitle("Detection probability priors by standard deviation (facets)") + 
  xlab("Detection probability (p)") + 
  ylab("Frequency")

Notice that for large values, the distribution of detection probabilities is bimodal with most values concentrating near 0 or 1. This would imply that individuals are either always or never captured, which in most cases would be undesirable. The analogous implications for heterogeneity among populations and regions are equally disturbing (e.g., individuals in a population are either always or never detected). Unless we really believe that such a situation is likely, we want our priors to reflect our belief that the standard deviation is probably small.

I would recommend priors for these standard deviation parameters that concentrate around smaller (desirable) values and place low probability density on larger (undesirable) values. As you incrementally add complexity to your simplified model, you can use the prior predictive distribution to decide on specific priors that you’re comfortable with. Last, simulating from the priors is impossible with an improper prior, which should give another important bit of motivation to use proper priors.
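For example, a minimal prior predictive sketch (assuming a half-Normal(0, 0.5) prior on \sigma, which is just one candidate, not a specific recommendation) pushes prior draws of \sigma through the same simulation as above to see the implied detection probabilities:

sigma_draws <- abs(rnorm(10000, mean = 0, sd = 0.5))  # sigma ~ half-Normal(0, 0.5)
p_draws <- plogis(rnorm(10000, mean = 0, sd = sigma_draws))
hist(p_draws, breaks = 50,
     main = "Implied detection probabilities, sigma ~ half-Normal(0, 0.5)",
     xlab = "Detection probability (p)")
# Most of the mass stays away from the extremes near 0 and 1, unlike the
# sigma = 10 and sigma = 100 panels above.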

Hope this helps!

5 Likes