Hello all,
My colleagues and I are fitting a distance-sampling model to count data of 8 species of birds at >1,800 locations over 5 years using Stan. Counts were conducted once per year and observers recorded the distance to each observation. We’re binning detections into 5 distance bins and using a half-normal detection function to estimate bin-specific detection probability and overall detection probability at the sampling point.
We’ve written the likelihood of a counting y individuals as Poisson(\lambda*p), where \lambda is the expected abundance and p is overall detection probability at the point (the sum of binned probabilities). I’m a recent convert from JAGS and while I find Stan challenging, it’s also been incredibly rewarding! This model runs great even though it’s quite complex. I’ve provided a simplified version below.
My 2 questions are:
- Are we deriving realized abundance (N) correctly? (this is the main question)
- Are there any ways we can improve efficiency of the model (e.g., better indexing).
I know that Stan currently cannot handle latent discrete variables, so in order to get N we have to derive it in the generated quantities block. I’ve seen discussions on this forum on how to do that, mainly, considering a range of N from the maximum count y_{max} to some upper bound of the population size K:
but, I have also seen someone here say if you have some other data informing detection (like distance-to-observer), then you don’t need to do this?
Equation 5 from Royle et al. 2004 also seems agree with this, though I am by no means a statistician and may be misunderstanding.
https://esajournals.onlinelibrary.wiley.com/doi/epdf/10.1890/03-3127
As I said above, the model runs fine and we are getting realistic estimates from test runs, but I want to make sure I’m doing everything correctly.
Thanks for your time,
John
data {
// indexing vectors
// All integers with a lower bound of 0 <lower=0>
int<lower=0> nyears ; // Number of years (n = 5; range = 2013-2018)
int<lower=0> nspec ; // Number of species (n = 8)
int<lower=0> nsites ; // Number of sites sampled (n = 1859)
int<lower=0> ndbins; // Number of distance bins (n = 5)
// Counts
// Number of birds detected at each site, in each year, of each species
int<lower=0> y[nsites, nyears, nspec] ;
// Number of birds detected at each site, within each of 5 distance bins, in
// each year, of each species
int<lower=0> ydb[nsites, ndbins, nyears, nspec] ;
// detection inputs
real<lower=0> pix[nspec,ndbins]; // proportion of the sampling point
// encompassed by each distance bin for each species
real<lower=0> point_area[nspec]; // area of sampling point for each species
matrix<lower=0>[nspec,ndbins+1] db; // distance bins for each species
}
parameters {
// Community-level estimate of expected abundance at a given site
real mean_alpha0 ; // mean
real<lower=0> sd_alpha0 ; // standard deviation
matrix[nspec,nyears] z_alpha0 ;
// Scale parameter dictating the shape of the relationship between
// distance-to-observer and detection probability (varies annually)
real<lower=0> mean_beta0 ; // mean
real<lower=0> sd_beta0 ; // standard deviation
matrix<lower=0>[nspec,nyears] z_beta0;
}
transformed parameters{
matrix[nspec,nyears] alpha0 = mean_alpha0 + sd_alpha0*z_alpha0;
matrix<lower=0>[nspec,nyears] beta0 = mean_beta0 + sd_beta0*z_beta0;
// site x year x species log E(N)
real loglambda[nsites,nyears,nspec];
// bin-specific detection probabilities for each year x species (unnormalized)
real<lower=0> p[nyears,nspec,ndbins];
// normalized bin-specific detection probabilities for each year x species
real<lower=0> ppi[nyears,nspec,ndbins] ;
// normalized bin-specific detection probabilities for each year x species
// scaled by the overall species and year-specific detection prob. so they
// sum to 1 for use in the multinomial likelihood
real<lower=0,upper=1> ppi_normalized[nyears,nspec,ndbins] ;
// overall species and year-specific detection prob.
real<lower=0,upper=1> pPerc[nyears,nspec] ;
// log overall species and year-specific detection prob.
real logpPerc[nyears,nspec] ;
// real logitpPerc[nyears,nspec] ;
for(s in 1:nspec){ // loop over species
for(t in 1:nyears){ // loop over years
for(b in 1:ndbins){ // loop over distance bins
// Calculate bin-specific detection probabilities for each year x species
// We do so through analytical integration of the half normal detection
// function.
p[t,s,b] = (beta0[s,t]^2*(1-exp(-db[s,b+1]^2/(2*beta0[s,t]^2)))-beta0[s,t]^2*(1-exp(-db[s,b]^2/(2*beta0[s,t]^2))))*2*3.1416/(point_area[s]*pix[s,b]);
ppi[t,s,b] = p[t,s,b]*pix[s,b]; // normalized so that the sum of this
// vector equals the total probability of detecting a bird at the sampling
// point
} // distance bin loop
// sum the normalized bin-specific detection probabilities for each year x
// species to get the overall species and year-specific detection prob.
pPerc[t,s] = sum(ppi[t,s,1:ndbins]) ;
for(b in 1:ndbins){ // loop over distance bins
// ensure the normalized bin-specific detection probabilities for each
// year x species sum to 1 for use in the multinomial likelihood
ppi_normalized[t,s,b] = ppi[t,s,b] / pPerc[t,s] ;
} // distance bin loop
for(i in 1:nsites){ // loop over sites
loglambda[i,t,s] = alpha0[s,t] ;
} // site loop
} // year loop
}
logpPerc = log(pPerc); // log overall species x year-specific detection prob.
} // species loop
model {
// Prior distributions for community level abundance mean and variance pars.
mean_alpha0 ~ normal(0, 2) ; // Intercept (varies annually)
sd_alpha0 ~ normal(0,1); //
// scale parameter dictating the shape of the relationship between
// distance-to-observer and detection probability (varies annually)
mean_beta0 ~ normal(80,20) ;
sd_beta0 ~ normal(15,20);
// z's are equivalent to the a's in the un-centered parameterization
// alpha ~ l + ab
for(s in 1:nspec){ // species loop
for(t in 1:nyears){ // year loop
z_alpha0[s,t] ~ normal(0,1) ; // intercept on E(N)
z_beta0[s,t] ~ normal(0,1) ; // scale parameter on detection model
} // year loop
} // species loop
for(i in 1:nsites){ // site loop
for(t in 1:nyears){ // year loop
for(s in 1:nspec){ // species loop
target += poisson_log_lpmf(y[i,t,s] | loglambda[i,t,s] + logpPerc[t,s])
+ multinomial_lpmf(ydb[i,1:ndbins,t,s] | to_vector(ppi_normalized[t,s,1:ndbins]));
} // species loop
} // year loop
} // site loop
} // model loop
generated quantities {
int<lower=0> N[nsites,nyears,nspec]; // Realized N
matrix[nyears,nspec] Ntot ; // Total N across all sites each year x species
int counter[nsites,nyears,nspec];
N = rep_array(0, nsites, nyears, nspec); // Set up N array
for(i in 1:nsites){
for(t in 1:nyears){ // year loop
for(s in 1:nspec){ // species loop
// remake the log-likelihood b/c you can't transfer model objects to the derived parameters block
real ll = poisson_log_lpmf(y[i,t,s] | loglambda[i,t,s] + logpPerc[t,s])
+ multinomial_lpmf(ydb[i,1:ndbins,t,s] | to_vector(ppi_normalized[t,s,1:ndbins]));
// Calculate Abundance - Restrict N to be at least as big as the number of birds observed
N[i,t,s] = poisson_log_rng(loglambda[i,t,s]);
counter[i,t,s] = 0;
// rejection sampling to ensure N >= y
while (N[i,t,s] < y[i,t,s]) {
N[i,t,s] = poisson_log_rng(loglambda[i,t,s]);
counter[i,t,s] += 1;
if (counter[i,t,s] > 100) break;
}
} // species loop
} // year loop
}
for(t in 1:nyears){
for(s in 1:nspec){
Ntot[t,s] = sum(N[,t,s]);
}
}
}