Deriving abundance from a distance-sampling model

Hello all,

My colleagues and I are fitting a distance-sampling model to count data of 8 species of birds at >1,800 locations over 5 years using Stan. Counts were conducted once per year and observers recorded the distance to each observation. We’re binning detections into 5 distance bins and using a half-normal detection function to estimate bin-specific detection probability and overall detection probability at the sampling point.

We’ve written the likelihood of a counting y individuals as Poisson(\lambda*p), where \lambda is the expected abundance and p is overall detection probability at the point (the sum of binned probabilities). I’m a recent convert from JAGS and while I find Stan challenging, it’s also been incredibly rewarding! This model runs great even though it’s quite complex. I’ve provided a simplified version below.

My 2 questions are:

  1. Are we deriving realized abundance (N) correctly? (this is the main question)
  2. Are there any ways we can improve efficiency of the model (e.g., better indexing).

I know that Stan currently cannot handle latent discrete variables, so in order to get N we have to derive it in the generated quantities block. I’ve seen discussions on this forum on how to do that, mainly, considering a range of N from the maximum count y_{max} to some upper bound of the population size K:

but, I have also seen someone here say if you have some other data informing detection (like distance-to-observer), then you don’t need to do this?

Equation 5 from Royle et al. 2004 also seems agree with this, though I am by no means a statistician and may be misunderstanding.

As I said above, the model runs fine and we are getting realistic estimates from test runs, but I want to make sure I’m doing everything correctly.

Thanks for your time,
John


data {
  // indexing vectors
  // All integers with a lower bound of 0 <lower=0>
  
  int<lower=0> nyears ;  // Number of years (n = 5; range = 2013-2018)
  int<lower=0> nspec ;   // Number of species (n = 8)
  int<lower=0> nsites ;  // Number of sites sampled (n = 1859)
  int<lower=0> ndbins;   // Number of distance bins (n = 5)

  // Counts
  // Number of birds detected at each site, in each year, of each species
  int<lower=0> y[nsites, nyears, nspec] ; 
  
  // Number of birds detected at each site, within each of 5 distance bins, in 
  // each year, of each species
  int<lower=0> ydb[nsites, ndbins, nyears, nspec] ; 
  
  // detection inputs
  real<lower=0> pix[nspec,ndbins]; // proportion of the sampling point 
  // encompassed by each distance bin for each species
  real<lower=0> point_area[nspec]; // area of sampling point for each species
  matrix<lower=0>[nspec,ndbins+1] db; // distance bins for each species

}

parameters {

  // Community-level estimate of expected abundance at a given site
  real mean_alpha0 ;        // mean
  real<lower=0> sd_alpha0 ; // standard deviation
  matrix[nspec,nyears] z_alpha0 ; 
  
  // Scale parameter dictating the shape of the relationship between 
  // distance-to-observer and detection probability (varies annually)
  real<lower=0> mean_beta0 ; // mean
  real<lower=0> sd_beta0 ;   // standard deviation
  matrix<lower=0>[nspec,nyears] z_beta0; 

}

transformed parameters{

  matrix[nspec,nyears] alpha0 = mean_alpha0 + sd_alpha0*z_alpha0;
  matrix<lower=0>[nspec,nyears] beta0 = mean_beta0 + sd_beta0*z_beta0;
  
  // site x year x species log E(N) 
  real loglambda[nsites,nyears,nspec];
  
  // bin-specific detection probabilities for each year x species (unnormalized)
  real<lower=0> p[nyears,nspec,ndbins];
  
  // normalized bin-specific detection probabilities for each year x species
  real<lower=0> ppi[nyears,nspec,ndbins] ;
  
  // normalized bin-specific detection probabilities for each year x species
  // scaled by the overall species and year-specific detection prob. so they
  // sum to 1 for use in the multinomial likelihood
  real<lower=0,upper=1> ppi_normalized[nyears,nspec,ndbins] ;
  
  // overall species and year-specific detection prob.
  real<lower=0,upper=1> pPerc[nyears,nspec] ;
  
  // log overall species and year-specific detection prob.
  real logpPerc[nyears,nspec] ;
  // real logitpPerc[nyears,nspec] ; 
  
  for(s in 1:nspec){ // loop over species
   
    for(t in 1:nyears){ // loop over years
      for(b in 1:ndbins){ // loop over distance bins
      // Calculate bin-specific detection probabilities for each year x species
      // We do so through analytical integration of the half normal detection 
      // function.
      
      p[t,s,b] = (beta0[s,t]^2*(1-exp(-db[s,b+1]^2/(2*beta0[s,t]^2)))-beta0[s,t]^2*(1-exp(-db[s,b]^2/(2*beta0[s,t]^2))))*2*3.1416/(point_area[s]*pix[s,b]);
      ppi[t,s,b] = p[t,s,b]*pix[s,b]; // normalized so that the sum of this 
      // vector equals the total probability of detecting a bird at the sampling 
      // point
      } // distance bin loop
      
      // sum the normalized bin-specific detection probabilities for each year x
      // species to get the overall species and year-specific detection prob.
      pPerc[t,s] = sum(ppi[t,s,1:ndbins]) ;
      
      for(b in 1:ndbins){ // loop over distance bins
      // ensure the normalized bin-specific detection probabilities for each 
      // year x species sum to 1 for use in the multinomial likelihood
        ppi_normalized[t,s,b] = ppi[t,s,b] / pPerc[t,s] ;
      } // distance bin loop
      
      for(i in 1:nsites){ // loop over sites
              loglambda[i,t,s] = alpha0[s,t] ;
      } // site loop
    } // year loop
  }
  logpPerc = log(pPerc); // log overall species x year-specific detection prob.

} // species loop

model {
 
  // Prior distributions for community level abundance mean and variance pars.
  mean_alpha0 ~ normal(0, 2) ; // Intercept (varies annually)
  sd_alpha0 ~ normal(0,1); //
  
   // scale parameter dictating the shape of the relationship between 
  // distance-to-observer and detection probability (varies annually)
  mean_beta0 ~ normal(80,20) ;
  sd_beta0 ~ normal(15,20);
  
  // z's are equivalent to the a's in the un-centered parameterization 
  // alpha ~ l + ab
  for(s in 1:nspec){ // species loop
    for(t in 1:nyears){ // year loop
      z_alpha0[s,t] ~ normal(0,1) ;  // intercept on E(N) 
      z_beta0[s,t] ~ normal(0,1) ; // scale parameter on detection model
    } // year loop
  } // species loop 
   
  for(i in 1:nsites){ // site loop
    for(t in 1:nyears){ // year loop
        for(s in 1:nspec){ // species loop
        
          target += poisson_log_lpmf(y[i,t,s] | loglambda[i,t,s] + logpPerc[t,s])
          + multinomial_lpmf(ydb[i,1:ndbins,t,s] | to_vector(ppi_normalized[t,s,1:ndbins]));
          
        } // species loop
    } // year loop
  } // site loop

} // model loop

generated quantities {
  int<lower=0> N[nsites,nyears,nspec]; // Realized N
  matrix[nyears,nspec] Ntot ; // Total N across all sites each year x species
  int counter[nsites,nyears,nspec];

  N = rep_array(0, nsites, nyears, nspec); // Set up N array

  for(i in 1:nsites){
    for(t in 1:nyears){ // year loop
      for(s in 1:nspec){ // species loop

      // remake the log-likelihood b/c you can't transfer model objects to the derived parameters block
      real ll = poisson_log_lpmf(y[i,t,s] | loglambda[i,t,s] + logpPerc[t,s])
      + multinomial_lpmf(ydb[i,1:ndbins,t,s] | to_vector(ppi_normalized[t,s,1:ndbins]));

      // Calculate Abundance - Restrict N to be at least as big as the number of birds observed
      N[i,t,s] = poisson_log_rng(loglambda[i,t,s]);
      counter[i,t,s] = 0;
      // rejection sampling to ensure N >= y
      while (N[i,t,s] < y[i,t,s]) {
        N[i,t,s] = poisson_log_rng(loglambda[i,t,s]);
        counter[i,t,s] += 1;
        if (counter[i,t,s] > 100) break;
      }
      } // species loop
    } // year loop
  }

  for(t in 1:nyears){
    for(s in 1:nspec){
      Ntot[t,s] = sum(N[,t,s]);
    }
  }

}

I don’t know if you’ve seen my occupancy model case study, where I derive the occupancy and verify that the estimates are stable no matter what cap you use (number of unobserved items).

The code’s a little too dense for me to evaluate the abundance calculation, but I don’t see anything that looks like marginalizing out a discrete parameter. Here’s the relevant user’s guide chapter.

Usually if you marginalize out a parameter, you want to continue to compute for it in expectation. Here, you try to sample N in generated quantities. Rejection sampling is unlikely to be the right way to do this.

I would also recommend fewer comments. Especially the ones naming each loop in both the for and the close.

You can also add offset and multiplier specifications to variables like mean_beta0 which you give a normal(80, 20) distribution (declare as real<offset = 80, multiplier = 20> mean_beta0;. This should help speed up both warmup and maybe even sampling.

For efficiency, you probably want to use the y ~ foo(theta) forms or the target += foo_lupmf(y | theta) form to drop normalizing constants. Also, a whole lot of your code could be vectorized to improve speed, such as the z_alpha0 and z_beta0 sampling. as well as the species loop.

The variable counter in the generated quantities can be a local variable without indexes. Just declare real counter = 0 where you define counter[i, t, s] = 0.

Rather than a while loop and break, it’s more conventional to have while (cond1 && counter <= 100) { ... }.

3 Likes

Just a few quick comments. Eq. 5 in Royle’s paper describes a likelihood that analytically marginalizes over N (rather than the brute force integration over potential N up to K), which you use here at:

target += poisson_log_lpmf(y[i,t,s] | loglambda[i,t,s] + logpPerc[t,s])
          + multinomial_lpmf(ydb[i,1:ndbins,t,s] | to_vector(ppi_normalized[t,s,1:ndbins]));

I think the comments/papers in “help with reparameterization” also relate to this idea–that one can marginalize over N to estimate \lambda and \sigma without having to define some K. I’m not sure one can derive a conditional estimate of N_{i} roughly equivalent to sampling the latent parameter without defining some K for convenience.

Just as an example to get a sense of the difference in [edit] the derivation of realized abundance here vs. an empirical Bayes approach with a K, can take a look at the R script at the bottom (just constructs the “posterior” for a single site’s realized N assuming some parameters are estimated exactly). Not really a ton to take from the toy example other than the approaches estimate different things. The EB method is more precise, which might be a virtue for the application here. My intuition is that a point estimate from the truncated Poisson method being used will have some upward bias [edit–maybe most pronounced after summing over sites, years, etc.] because it’s the more diffuse marginal N without the small values, but tough to say what the extent of the issue would be.

Not much to add to Bob’s efficiency comments. Vectorization should speed up any K based operations in the generated quantities, too (or maybe some matrix operation for K and nsites). Maybe could also declare the length/size of K adaptively (locally in the generated quantities) on the basis of mean/max lambda along one or two margins (species and site, or year, or something?) to avoid calculations over possible values of N ranging from 0:1000 if the expected abundance at a site in a specific year is 2, or E(N) across sites in a year ranges from .2 - 3.27, or something.

##edit--this is largely drawn from Kery & Royle's book.
set.seed(125)
lambda=15
sigma=15
interval.width<-10
strip.width<-50
strip.length<-10
nbins<-strip.width%/%interval.width
a<-rep(strip.length*interval.width, nbins)
g<-function(x, sig) (exp(-x^2/(2*sigma^2)))
#N=rpois(1, lambda=lambda) ###here, 13
N=15 #could also try N = 25--gives different results
x<-runif(N, -strip.width, strip.width)
p<-g(x, sig=sigma)
y<-rbinom(N, 1, p) ### 7 seen, if I remember this correctly
x<-x[y==1]
x<-abs(x)
xbin<-x%/% interval.width+1
yt<-rep(0, nbins)
for (i in 1:5){
  yt[i]<-sum(xbin==i)
}

db=c(0, 10, 20, 30, 40, 50)
nbins=5
psi=interval.width/strip.width
dist.breaks<-seq(0, strip.width, by=interval.width)
p=rep(NA, length(dist.breaks)-1)
for (j in 1:length(p)){
  p[j]=integrate(g, dist.breaks[j], dist.breaks[j+1],
                 sig=sigma)$value/(dist.breaks[j+1]-dist.breaks[j])
}
pi<-p*psi
pi0<-1-sum(pi)
cp=c(p, pi0)

###first, the proposed poisson treatment of the problem
N_hat_realized<-rep(NA, 25000)
for (i in 1:25000){
  N_hat_realized[i]=rpois(1, lambda=lambda)
  while (N_hat_realized[i]<sum(yt)) N_hat_realized[i]=rpois(1, lambda=lambda)
}

hist(N_hat_realized)

##the approach used for empirical bayes estimation
K=100 ###upper limit for integration
NPoss<- 0:K
post <- rep(0, K+1)
lik1 <- dpois(NPoss, lambda = lambda)
lik2 <- rep(1, K+1)
  for(k in 1:(K+1)) {
    yi <- yt
    ydot <- NPoss[k] - sum(yi)
    if(ydot<0) {
      lik2[k] <- 0
      next
    }
    yi <- c(yi, ydot)
    lik2[k] <- lik2[k] * dmultinom(yi, size=NPoss[k], prob=cp)
  }
  comb <- lik1*lik2
  comb2 <- comb / sum(comb)

  N_hat_realized_EB<-rep(NA, 25000)
  for (i in 1:25000){
    N_hat_realized_EB[i]=which(rmultinom(1, 1, comb2)==1)-1
  }
  
  hist(N_hat_realized_EB)
2 Likes

Thanks, @jcwi. I should qualify all this by saying I’m by no means an expert on these occupancy models and it’s been a few years since I’ve thought about them.

More precise than what and what’s the notion of precision? Doesn’t empirical Bayes underestimate posterior uncertainty because it fixes hyperparameters at a point estimate?

2 Likes

Whoops, yeah, I should clarify–I guess I mean the marginal estimate of N_{i} or a truncation of this estimate is different from and less precise than an estimate of N_{i} that conditions on the the parameters and also the observations (which is how I think realized N is being thought of here). The script is really just to demonstrate that these quantities are not the same in the laziest way possible–certainly tend to advocate for deriving conditional estimates that account for the parameter uncertainty (i.e., code it up in the generated quantities).

2 Likes

I’m only semi-familiar with distance sampling models, but I’m pretty sure that simulating from the marginal distribution of abundance is not quite what you want.

If I recall the product loglambda[i,t,s] + logpPerc[t,s] that provides the (log) expected value of the Poisson distribution is derived from a Poisson-binomial hierarchy:

[n] = \text{Poisson}(\lambda)
[y | n] = \text{Binomial}(p, n)

where n is abundance, and y is the number of detected individuals, and p is the probability that an individual is detected (pPerc[t,s] in your model). This yields a marginal distribution of (the Poisson part of the likelihood in your model) that no longer depends on the discrete parameter n:

[y] = \text{Poisson}(\lambda p)

The posterior for n would then be:

[n | y] = \dfrac{[y | n] [n] }{\sum_{k=y}^\infty [y | n=k] [n=k]}

(note the summation in the denominator excludes abundance values less than the observed number of individuals).

To work with this in the generated quantities block you’d choose some maximum value of n (let’s call it K):

[n | y] = \dfrac{[y | n] [n] }{\sum_{k=y}^K [y | n=k] [n=k]}

In the past I’ve implemented this kind of thing by first populating a vector of length K containing log probabilities, then using the categorical_logit function to draw a value, e.g., in pseudocode this might look something like this:

generated quantities {
  array[nsites, nyears, nspec] int n; 

  {
    vector[K] logits;
    for (i in 1:nsites) {
      for (t in 1:nyears) {
        for (s in 1:nspec) {
          for (k in y[i, t, s]:K) {
            logits[k] = binomial_lpmf(y[i, t, s] | k, logpPerc[t, s])
              + poisson_lpmf(k | loglambda[i, t, s]);
          }
          n[i, t, s] = y[i, t, s]  + categorical_logit(logits[y[i, t, s]:K]) - 1;
        }
      }
    }
  }
}

That said, it might be easier to reason about this and help if we can 1) strip away some complexity from the model (e.g., single species, single timestep), 2) look at the symbolic representation of the model structure, and 3) have some simulated data to work with.

Edit: one final question - I didn’t follow what pix does - it looks like it’s used in p as a denominator, but then ppi multiplies p by pix, cancelling it out. Does this cancellation imply that pix is not necessary? Seems like you could get an equivalent result by not including pix. I’m out of my depth on this, but curious what the denominator point_area[s]*pix[s,b] represents. Maybe the area contained in distance bin b? Is there a paper that lays this expression out explicitly?

Finally, Ken Kellner has some distance sampling models implemented in Stan in his ubms package, which might be useful: ubms/functions_distsamp.stan at master · kenkellner/ubms · GitHub

2 Likes

Did some digging, and there’s a very elegant solution here. To recap, here’s a reduced model that still leads us to the correct answer (from Kery and Royle, 2015 section 8.5.3).

For sites i=1, ..., n_{\text{site}}

N_i \sim \text{Poisson}(\lambda_i)

y_{i, .} \sim \text{Binomial}(N_i, \sum_k p_{k})

y_{i,1:K} \sim \text{Multinomial}(p_{1:K})

where N_i is the true abundance, y_{i, .} the number of observed individuals, y_{i, k} the number of observed individuals in distance bin k (and y_{i, .} = \sum_k y_{i, k}), and p_k is the probability that an animal occurs and is detected in distance bin k.

As @jmyeiser did we can analytically marginalize N_i out of the model via the Poisson-binomial hierarchy:

y_{i, .} \sim \text{Poisson}(\lambda_i \sum_k p_{k})

y_{i,1:K} \sim \text{Multinomial}(p_{1:K})

Then, using Bayes’ rule, we construct [N_i \mid y_{i, .}, \lambda, p_{1:K}]:

[N_i \mid y_{i, .}, \lambda, p_{1:K}] = \dfrac{[y_{i, .} \mid N_{i}, p_{1:K}] [N_i \mid \lambda_i]}{\sum_{j=y_{i, .}}^\infty [y_{i, .} \mid N_{i}, p_{1:K}] [N_i \mid \lambda_i]}

=\dfrac{\text{Binomial}(y_{i, .} \mid N_i, \sum_{k}p_{k}) \times \text{Poisson}(N_i \mid \lambda_i)}{\text{Poisson}(y_{i, .} \mid \lambda_i \sum_k p_k)}

plug in the expressions for the binomial and Poisson pmfs, do some algebra to find (perhaps suprisingly) that this yields a distribution for the number of animals not detected m_i = N_i - y_{i, .} (Royle and Dorazio (2008), equation 8.3.7):

m_i \sim \text{Poisson}(\lambda_i (1 - \sum_{k} p_k))

You can use this fact to easily get posterior draws of abundance, conditional on the observed number of individuals:

  1. In the generated quantities block, draw the number of “missed” individuals m_i
  2. then compute abundance via N_i = y_{i, .} + m_i.

Here’s some R & Stan code to simulate data, fit a Stan model, and draw samples for abundance using this result: Poisson-binomial multinomial distance sampling model in Stan, with half-normal detection function · GitHub

Refs:

Royle, J. Andrew, and Robert M. Dorazio. Hierarchical modeling and inference in ecology: the analysis of data from populations, metapopulations and communities . Elsevier, 2008.

Kéry, Marc, and J. Andrew Royle. Applied Hierarchical Modeling in Ecology: Analysis of distribution, abundance and species richness in R and BUGS: Volume 1: Prelude and Static Models . Academic Press, 2015.

7 Likes

This an excellent answer.

I have a question about inference though, motivated by a twitter exchange with @betanalpha (https://twitter.com/betanalpha/status/1377792332955803651).

By drawing m_i as a Poisson variate in the generated quantities block are you adding more uncertainty to the estimate than necessary? Would it be better to estimate abundance as N_i = y_i + (\lambda_i(1 - \sum_k p_k)) ?

Thanks! There’s some good discussion of this in section 8.3.2.4 of Royle and Dorazio (2008). If I follow, the key distinction here is whether to report results in terms of expected abundance vs. realized abundance.

Adding the expected number of missed individuals to the number of observed individuals would provide a Bayesian estimate of expected abundance E(N_i \mid y_i), as shown in equation 8.3.8 of Royle and Dorazio:

E(N_i \mid y_i) = y_{i, .} + \frac{1}{R} \sum_{r=1}^R \lambda_i^{(r)} (1 - p_{i, .}^{(r)}),

where R is the number of draws.

If realized (integer) abundance rather than expected abundance is what folks are after, then drawing from the Poisson should do the trick, though as you point out this will include uncertainty/variance from the Poisson distriution.

3 Likes

I wrote about this briefly in the User’s Guide section on posterior predictive inference, but didn’t mention the case where you could improve the estimates by Rao-Blacwellizing. Suppose we want to sample from the posterior predictive distribution p(\tilde{y} \mid y), where our model is a very simple y \sim \textrm{Poisson}(\lambda). Suppose I take draws from the posterior

\lambda^{(1)},\ldots, \lambda^{(S)} \sim p(\lambda \mid y).

Because the expectation of a \textrm{Poisson}(\lambda) variate is \lambda, I can estimate the expectation for \tilde{y} as

\mathbb{E}[\tilde{y} \mid y] = \frac{1}{S} \sum_{s=1}^S \lambda^{(s)}.

The Rao-Blackwell theorem tells us this will be lower variance than taking draws

\tilde{y}^{(s)} \sim \textrm{Poisson}(\lambda^{(m)})

and using

\mathbb{E}[\tilde{y} \mid y] = \frac{1}{S} \sum_{s=1}^S \widetilde{y}^{(s)}.

But, the problem is that you can’t use the draws from p(\lambda \mid y) to estimate the posterior uncertainty in p(\widetilde{y} \mid y). For that, you need to compute posterior intervals over the draws \tilde{y}^{(s)}.

3 Likes