So what I want to analyze is the basic capture-recapture problem (as described e.g. here in the manual) but with several groups, with partial or complete pooling (I’m curious to learn how to achieve both). That is, I have an initial capture M, then a second capture C of which R were also part of the initial capture. Here is how I do it for just one group:
stancode <- "
data {
int<lower=0> M;
int<lower=0> C;
int<lower=0> R;
}
parameters {
real<lower=(M + C - R)> population;
}
model {
R ~ binomial(C, M / population);
}
"
fit <- stan(model_code = stancode, data = list(
N = 4,
M = 10,
C = 21,
R = 7
))
That works just fine. It gives me:
> precis(fit)
mean sd 5.5% 94.5% n_eff Rhat4
population 41.47 17.66 25.53 71.35 1610 1
which is strangely a bit off what you’d get doing the calculation manually, which is 54, but close enough.
Okay, but say I have done this thing in several different areas. So I expect the results to be similar in the different areas but not exactly the same. I thought I could analyze this using a partially pooled model. My first question is, does that even make sense in theory? My second question is, how to achieve that in practice? I tried something like this:
stancode <- "
data {
int<lower=0> N;
int<lower=0> min_N;
real<lower=0> M[N];
int<lower=0> C[N];
int<lower=0> R[N];
}
parameters {
real<lower=min_N> population[N];
}
model {
for (i in 1:N)
R[i] ~ binomial(C[i], M[i] / population[i]);
}
"
fit <- stan(model_code = stancode, data = list(
N = 4,
M = c(10, 3, 39, 21),
C = c(21, 5, 41, 19),
R = c(7, 2, 13, 7),
min_N = min(c(21, 5, 41, 19) + c(10, 3, 39, 21) - c(7, 2, 13, 7))
))
But that doesn’t work. I get errors like:
Chain 4: Rejecting initial value:
Chain 4: Error evaluating the log probability at the initial value.
Chain 4: Exception: binomial_lpmf: Probability parameter is 1.36141, but must be in the interval [0, 1] (in 'model140a97fce2c5e_d2d9125dd1b3093ae8989d5eb459892f' at line 14)
I think because I’m giving all the values of the array parameter population
the same minimum constraint, whereas I’d actually like to give each of these population array fields an individual lower bound.
I’ve been looking around for a way to solve this but haven’t managed to find anything related to this precise problem. So that’s why I’m here!
NB. I am a relative newbie to Stan, Bayesian analysis & statistics generally so do keep that in mind. But that of course also means that I’ll be three times as grateful for any answers I get!