Capture-Recapture model with partial or complete pooling

So what I want to analyze is the basic capture-recapture problem (as described e.g. here in the manual) but with several groups, with partial or complete pooling (I’m curious to learn how to achieve both). That is, I have an initial capture M, then a second capture C of which R were also part of the initial capture. Here is how I do it for just one group:

stancode <- "
data {
  int<lower=0> M;
  int<lower=0> C;
  int<lower=0> R;
}
parameters {
  real<lower=(M + C - R)> population;
}
model {
  R ~ binomial(C, M / population);
}
"
fit <- stan(model_code = stancode, data = list(
  N = 4,
  M = 10,
  C = 21,
  R = 7
))

That works just fine. It gives me:

> precis(fit)
            mean    sd  5.5% 94.5% n_eff Rhat4
population 41.47 17.66 25.53 71.35  1610     1

which is strangely a bit off what you’d get doing the calculation manually, which is 54, but close enough.

Okay, but say I have done this thing in several different areas. So I expect the results to be similar in the different areas but not exactly the same. I thought I could analyze this using a partially pooled model. My first question is, does that even make sense in theory? My second question is, how to achieve that in practice? I tried something like this:

stancode <- "
data {
  int<lower=0> N;
  int<lower=0> min_N;
  real<lower=0> M[N];
  int<lower=0> C[N];
  int<lower=0> R[N];
}
parameters {
  real<lower=min_N> population[N];
}
model {
  for (i in 1:N)
    R[i] ~ binomial(C[i], M[i] / population[i]);
}
"
fit <- stan(model_code = stancode, data = list(
  N = 4,
  M = c(10, 3, 39, 21),
  C = c(21, 5, 41, 19),
  R = c(7, 2, 13, 7),
  min_N = min(c(21, 5, 41, 19) + c(10, 3, 39, 21) - c(7, 2, 13, 7))
))

But that doesn’t work. I get errors like:

Chain 4: Rejecting initial value:
Chain 4:   Error evaluating the log probability at the initial value.
Chain 4: Exception: binomial_lpmf: Probability parameter is 1.36141, but must be in the interval [0, 1]  (in 'model140a97fce2c5e_d2d9125dd1b3093ae8989d5eb459892f' at line 14)

I think because I’m giving all the values of the array parameter population the same minimum constraint, whereas I’d actually like to give each of these population array fields an individual lower bound.

I’ve been looking around for a way to solve this but haven’t managed to find anything related to this precise problem. So that’s why I’m here!

NB. I am a relative newbie to Stan, Bayesian analysis & statistics generally so do keep that in mind. But that of course also means that I’ll be three times as grateful for any answers I get!

I think before moving on to the partial and full pooling I’d sort out what my priors are on the model first. Right now I think they are defaulting to uniform which will cause problems down the line.

Which type of capture-recapture model is this? Kinda looks similar to JS or CJS?

Oh, the priors I didn’t even think about yet. It could well be that I’m doing things in the wrong order.

As far as I know, this is a Lincoln-Petersen type capture-recapture model. It closely follows the example in the manual. I was under the impression that CJS &c. are more for dynamic models where the population changes between captures.

1 Like

So usually my method is to the get the simplest version of the model up and running with full priors, a simulated/fake data run to see if the parameters are recovered, and then a run with some real data.

For your stan model call I would also be explicit about as many parts as you can. Like
chains = 4, iter = 2000, cores = 6 you’ll need to change cores to whatever you have.

For the priors I usually drop in a comment about where they came from using the #

Hmm. If I draw up a (non-uniform) prior on population, then surely my one data point will do basically nothing? Maybe I’m not understanding you very well. But either way something is wrong with this model. First I tried with the prior:

set.seed(1000)
N <- 600
M <- rbinom(1, N, .1) # -> 59
C <- rbinom(1, N, .1) # -> 55
R <- rbinom(1, C, M/N) # -> 5
stancode <- "
data {
  int<lower=0> M;
  int<lower=0> C;
  int<lower=0> R;
}
parameters {
  real<lower=(M + C - R)> population;
}
model {
  R ~ binomial(C, M / population);
  
  // estimate 10 years ago was 1000
  population ~ normal(1000, 500);
}
"
fit <- stan(model_code = stancode,
            data = list(M = M, C = C, R = R),
            chains = 4,
            iter = 2000,
            cores = 4)

having looked at the prior (dens(rnorm(10000, 1000, 500)), but is this the right way of doing it?) & finding it kinda reasonable:

image

and got:

> precis(fit)
             mean     sd   5.5%   94.5% n_eff Rhat4
population 894.69 313.73 482.56 1467.39  1422     1

But without the prior I get a value closer to the prior, confusingly:

> precis(fit)
              mean     sd   5.5%   94.5% n_eff Rhat4
population 1092.94 785.57 435.87 2397.97   888     1

And then if I try setting N <- 6000 instead (an order of magnitude more) & resimulate everything, I get (without the prior):

> precis(fit)
              mean      sd   5.5%   94.5% n_eff Rhat4
population 3087.69 4962.25 754.31 7931.49   689     1

Whereas if I just do the calculation without Stan I get much more accurate estimations:

> M*C/R # for N = 600
[1] 649

> M*C/R # for N = 6000
[1] 7019.792

> M*C/R # for N = 60 (seed 1001)
[1] 60

So to be honest this model seems pretty useless, if I’m not mistaken? But it’s the exact model presented in the manual.

Now I went to the manual & noticed that there was an upper constraint on the R data variable there as well. Adding that one doesn’t help:

set.seed(1000)
N <- 600
M <- rbinom(1, N, .1)
C <- rbinom(1, N, .1)
R <- rbinom(1, C, M/N)
stancode <- "
data {
  int<lower=0> M;
  int<lower=0> C;
  int<lower=0, upper=min(M, C)> R;
}
parameters {
  real<lower=(M + C - R)> population;
}
model {
  R ~ binomial(C, M / population);
}
"
fit <- stan(model_code = stancode,
            data = list(M = M, C = C, R = R),
            chains = 4,
            iter = 2000,
            cores = 4)
> precis(fit)
             mean     sd   5.5%   94.5% n_eff Rhat4
population 1049.7 636.38 436.92 2173.95  1454     1
1 Like

Given the lower bound on N I’d start with something like
N ~ cauchy(0, 800); where N = population

But I think there are better priors under the binomial. I just can’t remember.

Just parachuting in here to suggest checking out the capture-recapture models for closed populations in the Bayesian Population Analysis Stan translations, which include a very simple model M_0: example-models/M0.stan at master · stan-dev/example-models · GitHub

Model M_0 is more intuitive to me, partly because I find it easier to think about a prior for population size. As a reminder model M_0 with data augmentation and a uniform prior on the inclusion probability \omega imposes a discrete uniform prior from C to M, following the notation in the link above.

The Lincoln-Petersen example in the manual is not wrong per se, but it is somewhat awkward in that it is a probabilistic treatment of a maximum likelihood estimate, and population size is treated as continuous. The improper prior for abundance is also pretty weird.

4 Likes

I was poking around on your site @mbjoseph I just couldn’t remember which model it was. Thanks!

2 Likes

I totally forgot to ask. What is your experience with Stan, modeling, and such?

Oh, it’s very modest. I am about halfway through McElreath’s Statistical Rethinking & have done a couple of small pet projects outside of that. I’m somewhat better versed in data science & machine learning as I am a programmer by trade. But I’m pretty new to statistics.

I really appreciate the help so far, by the way. I’ve got mbjoseph’s example model up & running & it produces some pretty good results, much better than the Lincoln-Petersen model. But I still need to go through it properly to really understand it. I may well return with more questions in the near future. :-)

1 Like

Feel free to tag me in down the road @erwald . I’ve been tasked with getting these kinds of models (along with some growth models) up and running for fish and a few other things.

@Ara_Winter ask & ye shall receive!

I have some questions about the model (comments are mine; please correct them if they are mistaken):

model {
  // Go through all potential members.
  for (i in 1:M) {
    if (s[i] > 0) {
      // This potential member was observed, which means it's a member of N.
      
      // Prob. of 1 given probability omega. That is, the prob. that this one
      // is a real member of N.
      target += bernoulli_lpmf(1 | omega) +
      // Prob. of s[i] successes in T trials given probability p. That is, the 
      // prob. that this one was sighted during the sampling occasions.
        binomial_lpmf(s[i] | T, p);
    } else { // s[i] == 0
      // This potential member was not observed, which means we don't know if
      // it's a member of N or not.
      
      // First, add probability that it it is a real member but wasn't sighted
      // in any of the sampling occasions.
      target += log_sum_exp( bernoulli_lpmf(1 | omega)
                             + binomial_lpmf(0 | T, p),
      // Second, add probability that it's not a real member.
                             bernoulli_lpmf(0 | omega));
    }
  }
}

Questions:

  1. This is a fundamental question about Stan, I guess. But what does target += ... do? I understand that it adds to the target log density. A density is just another word for a probability distribution, right? But what’s the target density? Here I read that it’s the density from which the sampler samples, but samples for what?
  2. Why log_sum_exp in the else branch of the conditional? I suppose because of … maths … but dunno why. (I get that the first term in the log_sum_exp is a sum bc they are both logarithms, so it’s like multiplying the probabilities normally.)
  3. @mbjoseph mentioned that this model was more intuitive bc it involves a prior for population size. But I don’t see any prior for population size. In fact, from what I can tell, all the priors are uniform. In which sense is there a prior for population size to think about?

Apart from those three confusions, I can happily report that I really understand how the model works. Now I will move on to the more advanced version where detection probability varies by sampling occasion, & then I can try to modify it to suit my purposes.

The target density is something proportional to the posterior probability of a given combination of model parameters. target += increments the logarithm of the target density.

The log_sum_exp is because the events are mutually exclusive. To see what this is doing, it’s easier to think about what happens on the probability scale (rather than the log probability scale). There are two mutually exclusive ways to observe a zero. Either the latent state is zero, or the latent state is 1 but the member is unobserved. As you say, the first term is a sum because independent probabilities multiply (so they add on the log scale). But mutually exclusive probabilities add. So if you have two log-probabilities, you exponentiate them to get probabilities, add those, and then take the logarithm to get a log-probability. Thus log_sum_exp.

The implicit prior for the population size is the number of observed members plus omega times the number of data-augmented never-observed members that you’ve included. In the words of @mbjoseph:

3 Likes

I hope you didn’t think you’d heard the last of me @Ara_Winter ! I’ve now pushed some code here but have some questions still that I hope you or somebody else can answer.

The problem I’m trying to solve is calculating death counts in a civil war. So in my simulated example I have two factions & two regions. I’m assuming (& simulating to produce the effect) that the probability of being recorded is correlated both in terms of side (what I, confusingly I now realise, call population in the code) & region – maybe one side was doing the recording so that results are biased towards them, & maybe the records were generally more sparse in one of the regions. The capture-recapture data could come from, say, first government records & then a survey (so they would have different detection probabilities). I am inferring the killings for each side in each region, with partial pooling in the detection probability, giving me four N estimations as output.

(This model is based on the Mt model here, itself based on one from the Bayesian Population Analysis book kindly pointed out to me by @mbjoseph above.)

When I run this generating the data like so (the values are true population size, probability of being captured for the population, probability of being captured for the region):

aug1 <- f(600, .8, .5) # pop 1, region 1
aug2 <- f(800, .8, .3) # pop 1, region 2
aug3 <- f(350, .6, .5) # pop 2, region 1
aug4 <- f(450, .6, .3) # pop 2, region 2

I get the results:

                        mean    sd   5.5%  94.5% n_eff Rhat4
N[1,1]                675.96 91.79 543.00 826.00   223  1.03
N[1,2]                809.58 96.27 655.00 963.00    96  1.05
N[2,1]                372.91 71.57 288.00 492.00   729  1.01
N[2,2]                356.31 88.00 262.00 520.00   813  1.02

which seems fairly solid, I guess? Now to my questions:

  1. The thing I’m least confident about is the partial pooling. Does this make sense even in theory? Now that I’m writing all this down I’m thinking that it may be more likely that omega (the inclusion probability) is correlated in population & region – e.g. bc one side or one region is much more populous. Maybe I should be using partial pooling on omega instead? Or both? Or neither?
  2. About the partial pooling practically, am I doing it right? I’m not at all confident about this!
  3. When calculating N finally, I ran into problems bc Stan didn’t allow me to set constraints for each element of a vector individually. So I scaled it to a (0, 1) range, & then scaled it up again, like this. But did I do it correctly?
  4. I was pretty unhappy with the kind of ridiculous stuff I had to do to get my R array into the right format for Stan. Bonus points to anyone who points out a better way for me!

I think that’s it for now. I’ll take this opportunity to thank everyone who’s helped me thus far! :-)

2 Likes

Oh nice. I like the use of the model for this kind of data. Let me take a look at the data and model. I always struggle with 4 and sometimes give up and just code it up in python.

1 Like

On 1. I usually go for partial pooling if there is a possibility of structure. Like if you best guess, prior knowledge indicates that geography or side is likely to matter. I would use partial pooling.

Sorry I am slow responding. Annual report is due up in a few weeks and I am coding up models like crazy for it.

For 2. I usually throw my model into brms (if possible) to double check if I have my partial pooling set right. Maybe @maxbiostat or @betanalpha has some better advice for that. I’ll have more time tomorrow to look into that.

1 Like

For definitions of a probability distribution and a probability density function see Probability Theory (For Scientists and Engineers).

Samples are generated from a probability distribution as a way to approximate expectation values. For more on expectation values see the above reference, and for more on sampling see for example Probabilistic Computation.

See for example An Introduction to Stan.

See for example Hierarchical Modeling.

3 Likes

Just got it up and running on my end.

1 Like

Really appreciate all the help I’ve been getting here! I’m now (slowly) reading @betanalpha’s sources & anticipating your comments, @Ara_Winter.

🙏

1 Like