Calculating abundance from marginalized N-mixture model

I am running an N-mixture model (binomial mixture model) to calculate the abundance of animals from repeated counts at many sites. I have posted about this analysis before (https://discourse.mc-stan.org/t/calculating-the-likelihood-for-loo-in-a-binomial-mixture-model/4787/4) in relation to loo, but I have a new question about the best approach to calculating abundance (N) from the marginalized form (latent N is not sampled, as it is in the BUGS/JAGS/Nimble formulation).

The full code is below, but the relevant part is the likelihood:

  // Likelihood
  for (i in 1:R) {
    vector[K[i] - max_y[i] + 1] lp;

    for (j in 1:(K[i] - max_y[i] + 1))
      lp[j] = poisson_log_lpmf(max_y[i] + j - 1 | log_lambda[i])
              + binomial_logit_lpmf(y[i] | max_y[i] + j - 1, logit_p[i]); // vectorized over T
    target += log_sum_exp(lp);
  }
}

where R is the number of sites, each visited T times. K is the maximum abundance to iterate over when marginalizing N out of the likelihood, and max_y is the maximum number of individuals observed at a particular site across the T visits. max_y can also be considered the minimum population size: we know at least that many individuals are present, and likely many more that we missed because they are hard to detect.
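To make the marginalization concrete, here is a minimal R sketch of what that inner sum computes for a single site. All the numbers (counts, lambda, p, K) are made up for illustration; in the actual model lambda and p come from the linear predictors.

```r
# Minimal R sketch of the marginalized likelihood for a single site
# (illustrative values only; lambda and p come from the linear predictors in the model)
y      <- c(3, 5, 2)   # counts on T = 3 visits to the site
max_y  <- max(y)       # minimum possible abundance
K      <- 100          # upper bound used for the marginalization
lambda <- 8            # expected abundance at the site
p      <- 0.4          # per-visit detection probability

log_sum_exp <- function(x) max(x) + log(sum(exp(x - max(x))))

ks <- max_y:K          # every feasible abundance
# log Pr(N = k) + log Pr(y_1, ..., y_T | N = k, p) for each feasible k
lp <- sapply(ks, function(k) {
  dpois(k, lambda, log = TRUE) + sum(dbinom(y, size = k, prob = p, log = TRUE))
})
log_lik_site <- log_sum_exp(lp)   # plays the role of target += log_sum_exp(lp)
```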

This all works great. We are interested in the true abundance N (as well as in the coefficients on the variables affecting log_lambda and logit_p). To calculate N, I was using:

N[i] = poisson_log_rng(log_lambda[i]);

But because this is a random Poisson draw, the values can range well above and below lambda when lambda is large. In some cases that means N[i] is less than max_y[i], the number we observed on a single visit (there is no chance of double counting with our methods). This is fine for getting the expected abundance at a site, and most of the posterior predictive checks look good. There is a slight overestimate at low counts near zero and a slight underestimate at very high counts when predicting new observations, but generally things look good. However, we know that N at that plot should be at least the number observed, so I changed the code to:

  for (i in 1:R) {
    N_rng[i] = poisson_log_rng(log_lambda[i]);
    
    // Restrict N to be at least as big as the number of animals observed on a single night
    if(N_rng[i] < max_y[i]) 
      N[i] = max_y[i];
    else
      N[i] = N_rng[i];
      
    p[i, 1:T] = inv_logit(logit_p[i, 1:T]);
  }

At some sites this pushes all the lower estimates of N across iterations up to max_y but obviously doesn't change the rest of the distribution. Maybe this isn't an issue, or maybe it depends on the inference I want to make. My question is whether anyone thinks this is a bad idea or has a better approach to estimating N in this model.

Thanks for any thoughts or ideas! The data and everything else are at https://github.com/djhocking/GSMNP-Elevation; sorry, I don't have a simple reproducible example at the moment. I am planning to eventually turn this into a tutorial for ecologists with simulated data.

Full Stan model:

// Binomial mixture model with covariates
data {
  int<lower=0> R;       // Number of transects
  int<lower=0> T;       // Number of temporal replications
  int<lower=0> nsites;       // Number of sites
  int<lower=1> sites[R];       // vector of sites
  int<lower=0> y[R, T]; // Counts
  vector[R] elev;          // Covariate
  vector[R] elev2;          // Covariate
  vector[R] litter;          // Covariate
  vector[R] twi;          // Covariate
  vector[R] stream;          // Covariate
  matrix[R, T] RH;      // Covariate
  matrix[R, T] precip;      // Covariate
  matrix[R, T] temp;      // Covariate
  matrix[R, T] temp2;      // Covariate
  vector[R] gcover;      // Covariate
  vector[R] gcover2;      // Covariate
  int<lower=0> K[R];       // Upper bound of population size
}

transformed data {
  int<lower=0> max_y[R];
  int<lower=0> N_ll;
  int tmp[R];
  
  for (i in 1:R) {
    max_y[i] = max(y[i]);
    tmp[i] = K[i] - max_y[i] + 1;
  }
  N_ll = sum(tmp);
}

parameters {
  real alpha0;
  real alpha1;
  real alpha2;
  real alpha3;
  real alpha4;
  real alpha5;
  real alpha6;
  real beta0;
  real beta1;
  real beta2;
  real beta3;
  real beta4;
  real beta5;
  real beta6;
  
  vector[nsites] eps;            // Random site effects
  real<lower=0> sd_eps;
  matrix[R, T] delta;            // Random transect-visit effects
  real<lower=0> sd_p;
}

transformed parameters {
  vector[R] log_lambda; // Log population size
  matrix[R, T] logit_p; // Logit detection probability
  
  for (i in 1:R) {
    log_lambda[i] = alpha0 + alpha1 * elev[i] + alpha2 * elev2[i] + alpha3 * twi[i] + alpha4 * litter[i] + alpha5 * gcover[i] + alpha6 * stream[i] + eps[sites[i]] * sd_eps; // non-centered formulation of random effect (see Monnahan et al. 2017)
    for (t in 1:T) {
      logit_p[i,t] = beta0 + beta1 * temp[i,t] + beta2 * temp2[i,t] + beta3 * precip[i,t] + beta4 * gcover[i] + beta5 * gcover2[i] + beta6 * RH[i,t] + delta[i, t] * sd_p; // non-centered formulation
    }
  }
}

model {
  // Priors
  // Improper flat priors are implicitly used on alpha0, alpha1, beta0 etc. if not specified
  
  alpha0 ~ normal(0, 10);
  alpha1 ~ normal(0, 10);
  alpha2 ~ normal(0, 10);
  alpha3 ~ normal(0, 10);
  alpha4 ~ normal(0, 10);
  alpha5 ~ normal(0, 10);
  alpha6 ~ normal(0, 10);
  
  beta0 ~ normal(0, 2);
  beta1 ~ normal(0, 2);
  beta2 ~ normal(0, 2);
  beta3 ~ normal(0, 2);
  beta4 ~ normal(0, 2);
  beta5 ~ normal(0, 2);
  beta6 ~ normal(0, 2);
  
  eps ~ normal(0, 1);
  sd_eps ~ cauchy(0, 2.5);
  
  for (i in 1:R) {
    for (t in 1:T) {
      delta[i,t] ~ normal(0, 1);
    }
  }
  
  sd_p ~ cauchy(0, 2.5); // even this might be more heavily tailed than we want on p. maybe half normal with sd = 3-5?
  
  // Likelihood
  for (i in 1:R) {
    vector[K[i] - max_y[i] + 1] lp;
    
    for (j in 1:(K[i] - max_y[i] + 1))
      lp[j] = poisson_log_lpmf(max_y[i] + j - 1 | log_lambda[i])
              + binomial_logit_lpmf(y[i] | max_y[i] + j - 1, logit_p[i]); // vectorized over T
    target += log_sum_exp(lp);
  }
}

generated quantities {
  int<lower=0> N[R];                  // Abundance (must be at least max_y)
  int<lower=0> N_rng[R];              // Abundance proposed by random draw from lambda
  int totalN;
  vector[R] log_lik;
  real mean_abundance;
  real mean_detection;
  vector[R] mean_p;
  real fit = 0;
  real fit_new = 0;
  matrix[R, T] p; 
  
  matrix[R, T] eval;         // Expected values
  int y_new[R, T];
  int y_new_sum[R];
  matrix[R, T] E;
  matrix[R, T] E_new;
  
  for (i in 1:R) {
    vector[K[i] - max_y[i] + 1] ll;
    
    for (j in 1:(K[i] - max_y[i] + 1)) {
      ll[j] = poisson_log_lpmf(max_y[i] + j - 1 | log_lambda[i]) // remake lp because can't use from model statement in generated quantities
      + binomial_logit_lpmf(y[i] | max_y[i] + j - 1, logit_p[i]);
    }
    log_lik[i] = log_sum_exp(ll); // for use in loo and multimodel comparison
  }
  
  for (i in 1:R) {
    N_rng[i] = poisson_log_rng(log_lambda[i]);
    
    // Restrict N to be at least as big as the number of animals observed on a single night
    if(N_rng[i] < max_y[i]) 
      N[i] = max_y[i];
    else
      N[i] = N_rng[i];
      
    p[i, 1:T] = inv_logit(logit_p[i, 1:T]);
  }
  
  // Bayesian p-value fit - different from JAGS but all parameters are about the same. Not a great test anyway. The issue is using a Poisson RNG rather than sampling N directly; that makes a big difference for the chi-square and other Bayesian p-value metrics. But other posterior predictive checks look great.
  
  // Initialize E and E_new
  for (i in 1:R) {
    for (j in 1:T) {
      E[i, j] = 0;
      E_new[i, j] = 0;
    }
  }
  
  for (i in 1:R) {
    mean_p[i] = mean(p[i]);
    
    for (j in 1:T) {
      // Assess model fit using Chi-squared discrepancy
      // Compute fit statistic E for observed data
      eval[i, j] = p[i, j] * N[i];
      E[i, j] = square(y[i, j] - eval[i, j]) / (eval[i, j] + 0.5);
      // Generate replicate data and
      // Compute fit statistic E_new for replicate data
      y_new[i, j] = binomial_rng(N[i], p[i, j]);
      E_new[i, j] = square(y_new[i, j] - eval[i, j]) / (eval[i, j] + 0.5);
    }
    y_new_sum[i] = sum(y_new[i]);
  }
  
  totalN = sum(N);  // Total pop. size across all sites
  for (i in 1:R) {
    fit = fit + sum(E[i]);
    fit_new = fit_new + sum(E_new[i]);
  }
  mean_abundance = exp(alpha0);
  mean_detection = 1 / (1 + exp(-1 * beta0));
}


I am not a specialist. Take my points as some starting points for what you could do.

  • I have used such an approach before for an industry/applied audience and it made so much sense to them. Statisticians were less enthusiastic.
  • An alternative would be to use rejection sampling to generate N_rng, something like the code below. This is more akin to using a truncated distribution than the censoring in your approach, so it will affect the rest of the distribution (see the small R comparison after this list). I don't know whether that makes sense for your question, though. Side note: the Stan code can be improved to avoid getting into an infinite while loop.
for (i in 1:R) {
  N_rng[i] = poisson_log_rng(log_lambda[i]);
  while (N_rng[i] < max_y[i])
    N_rng[i] = poisson_log_rng(log_lambda[i]);
}
  • Another, more complicated option would be to use the full structure of the model. You would have to account for the mixture part of the likelihood. I have been staring at this for a while and can't exactly work it out. I think the rough version is that you generate p first, then account for the observed y and max_y to reweight the Poisson simulation. Maybe someone with a better understanding of these models can help you with this.
ll[j] = poisson_log_lpmf(max_y[i] + j - 1 | log_lambda[i]) // remake lp because can't use from model statement in generated quantities
      + binomial_logit_lpmf(y[i] | max_y[i] + j - 1, logit_p[i]);
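To illustrate the censoring-versus-truncation distinction from the second bullet, here is a small R comparison with made-up numbers: censoring piles all of the sub-max_y mass onto max_y itself, while truncation (rejection sampling) redistributes that mass across the values at or above max_y.

```r
# Censoring vs. truncation for a Poisson draw (made-up numbers)
set.seed(1)
lambda <- 6
max_y  <- 8
draws  <- rpois(1e5, lambda)

censored  <- pmax(draws, max_y)     # original approach: floor the draws at max_y
truncated <- draws[draws >= max_y]  # rejection sampling: keep only draws >= max_y

mean(censored == max_y)   # big spike exactly at max_y
mean(truncated == max_y)  # smaller; follows the renormalized Poisson
c(mean(censored), mean(truncated))  # the means differ as well
```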

Thanks so much! I always neglect the use of while control statements. I must have had a bad experience with infinite loops when I first started programming. This makes a lot of sense and makes for nice truncated distributions as you note. I’ll have to think about your third suggestion some more. It’ll be a good exercise when I have some time.

I’m going to leave this open a bit longer before accepting just to see if anyone else has any thoughts on it.

My new code is below. I only capped the counter at 100 because R is large and the model is already slow. It seems unlikely to require more than 100 draws from the Poisson to get a value larger than max_y, since in most cases max_y is much smaller than lambda.

for (i in 1:R) {
  N[i] = poisson_log_rng(log_lambda[i]);
  counter[i] = 0;
  while (N[i] < max_y[i]) {
    N[i] = poisson_log_rng(log_lambda[i]);
    counter[i] += 1;
    if (counter[i] > 100) break;
  }
}

Are you looking to recover some discrete parameters that you’ve previously marginalised out? If so, have you seen this article and does it help in your case?

“Recovering posterior mixture proportions”


See https://github.com/betanalpha/knitr_case_studies/blob/master/principled_bayesian_workflow/sample_joint_ensemble4.stan for an example of how to efficiently sample from a discrete truncated probability mass function without infinite loops.


Thanks, I'm trying to wrap my head around how to apply this to my problem. This is what I've got so far:

// int<lower=0> N[R];
int N[R] = rep_array(0, R); // It didn't like it if I didn't initialize it.

for (i in 1:R) {
  real sum_p = 0;
  real u = uniform_rng(0, 1);

  lambda[i] = exp(log_lambda[i]); // because there is no poisson_log_lcdf

  for (k in max_y[i]:K[i]) {
    sum_p = sum_p + exp(poisson_lpmf(k | lambda[i]) - poisson_lcdf(K[i] | lambda[i]));
    if (sum_p >= u) {
      N[i] = k;
      break;
    }
  }
}

It seems to push the weight higher but doesn’t keep it all above max_y. I think there’s still something I’m missing in your approach.

Warning: please double-check everything I say. On the one hand, I feel like I am on to something; on the other hand, I have no special skills in this type of modeling and I am pretty sure I messed up somewhere.

I am not really sure how this code can generate N < max_y if N = k and k is between max_y and K. Are you sure that you get N < max_y?

I think the code that you have accounts for the fact that the Poisson is truncated at K and you cannot have any probability above K (the - poisson_lcdf(K[i]) term). You have to make the same adjustment for the fact that the Poisson is also truncated below at max_y.

However - and this is the most speculative part - the log likelihood in the model is not only truncated; it's also perturbed by the binomial. To take that into account, I think this is how you would generate N consistent with your model. The code is in R because it's easier for me to prototype in R.

For each observation and MCMC draw, start by generating p according to the model. max_y, y, and K are data and lambda is an estimated parameter, so you have a number for each. The first likelihood is the binomial and the second is the Poisson from your model (I hope I got the relation between y and max_y right here). Multiplying the likelihoods (or summing the log likelihoods, as in the Stan model) gives you the perturbed Poisson. Then standardize the likelihood to get the probabilities for each k (this is the step I am most unsure about). Sample N from the probabilities over all ks. If you don't really need exact simulated N but are more interested in the distribution of N for each observation, I think you could also calculate the expected value of N (meanN). Remember that this is the expected value for each observation and each MCMC draw.

p <- .75          # detection probability (would come from logit_p in the model)
max_y <- 50       # largest count observed at the site (from a different visit than y)
lambda <- 40      # expected abundance, exp(log_lambda)
y <- 40           # count on the visit being evaluated
K <- 10           # how far above max_y to consider (upper bound = max_y + K)
ks <- max_y + (0:K)   # candidate abundances

lik_binom <- dbinom(x = y, size = ks, prob = p)   # binomial part of the likelihood
lik_poisson <- dpois(x = ks, lambda = lambda)     # Poisson part of the likelihood
lik <- lik_binom * lik_poisson                    # perturbed (reweighted) Poisson
probs <- lik / sum(lik)                           # standardize to probabilities
N <- sample(x = ks, size = 1, prob = probs)       # one simulated N
meanN <- sum(ks * probs)                          # expected value of N

@Daniel_Hocking does eq. 4 of Royle's original Biometrics paper on N-mixture models provide a solution? Looks like he outlines how to compute a posterior for abundance, $[N_i = k \mid y, \theta]$:

$$[N = k \mid n_1, \ldots, n_T, \theta, p] = \dfrac{\Pr(n_1, \ldots, n_T \mid N = k, p) \times \Pr(N = k \mid \theta)}{\sum_{k=0}^{\infty} \Pr(n_1, \ldots, n_T \mid N = k, p) \times \Pr(N = k \mid \theta)}.$$

You could implement this in the generated quantities block (generating an R by K matrix, where each entry is the probability that N_r = k), and sample from the posterior with an inverse probability integral transform. This will respect the constraint that N must be at least as large as the maximum count, because there is zero probability mass for values where the condition is not satisfied (i.e., in the paper's notation, $\Pr(n_1, \ldots, n_T \mid N = k, p) = 0$ if $\max(n_1, \ldots, n_T) > k$).
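Here is a minimal R sketch of that recipe for a single site (everything numeric is made up for illustration; in the Stan generated quantities block lambda and the per-visit p would come from log_lambda[i] and logit_p[i]): build the unnormalized posterior mass for each candidate k, normalize as in eq. 4, and invert the cumulative sum against a uniform draw.

```r
# R sketch of eq. 4 for one site (all values made up for illustration)
set.seed(2)
y      <- c(12, 9, 15)          # counts from the T visits to this site
p      <- c(0.45, 0.40, 0.55)   # per-visit detection probabilities (from logit_p)
lambda <- 30                    # expected abundance, exp(log_lambda)
K      <- 200                   # upper bound on abundance, as in the Stan model
ks     <- max(y):K              # candidate abundances; mass below max(y) is zero anyway

# log of Pr(n_1, ..., n_T | N = k, p) * Pr(N = k | lambda) for each candidate k
log_w <- sapply(ks, function(k) {
  sum(dbinom(y, size = k, prob = p, log = TRUE)) + dpois(k, lambda, log = TRUE)
})
probs <- exp(log_w - max(log_w))
probs <- probs / sum(probs)     # normalize, as in the denominator of eq. 4

# inverse probability integral transform: one posterior draw of N for this site
u      <- runif(1)
N_draw <- ks[which(cumsum(probs) >= u)[1]]
```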

Edit: This is essentially the same idea that @stijn just outlined above as well :)


Not going to lie, I am pretty happy that I was on the right track. Also hooray for the Stan Forums, we found the theory and some code!


Given the other responses it sounds like sampling from a truncated Poisson might be insufficient, but I’ll explain a bit more for the truncated Poisson case just in case.

Given truncation you have to correct for the missing probability outside of the truncation bounds. If you have both lower and upper truncations, then you can just compute the probability mass at each point within the bounds and sample from a categorical_rng. This also works if you have only an upper bound that's not too big, because the discrete states are non-negative, so 0 acts as an implicit lower bound.
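As a concrete sketch of that doubly truncated case (in R, with made-up bounds and rate), this amounts to renormalizing the Poisson mass over the allowed states and drawing one of them; sample() plays the role of Stan's categorical_rng here.

```r
# Sampling a Poisson truncated to [K_lower, K_upper] via the categorical approach
# (made-up bounds and rate)
set.seed(3)
lambda  <- 20
K_lower <- 25
K_upper <- 300

ks     <- K_lower:K_upper
probs  <- dpois(ks, lambda)
probs  <- probs / sum(probs)                  # renormalize over the allowed states
N_draw <- sample(ks, size = 1, prob = probs)  # one draw from the truncated pmf
```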

If you have only a lower bound, however, then this won’t work because you have an infinite number of states left. Even when you have a finite number of states you might have too many for the categorical approach to be practical.

The code I shared samples the states sequentially. You compute the probability of the first state (and, implicitly, the total probability of the remaining states) and draw a uniform variate. If the draw falls below the first state's probability you take that state and stop; otherwise you reject it and move on to the next state, checking whether the accumulated probabilities exceed the uniform draw. You repeat this process until the accumulated probability exceeds the draw.

For example, let's say that we have a lower bound K. The first viable state is k = K, with probability mass p1 = exp(poisson_lpmf(K | lambda)) / (1 - poisson_cdf(K - 1 | lambda)), where the denominator accounts for the truncated normalization. We sample u and check whether u <= p1; if it is, we stop and take k = K as our sampled state, and if it's not, we continue. If we continue, we compute p_sum = p1 + p2 and check u <= p_sum, and so on. At the final state p_sum equals one by construction, so the final state is taken if it is reached.

If you have both a lower bound and an upper bound then the truncated normalization has to account for both: p1 = exp(poisson_lpmf(K_lower | lambda)) / (poisson_cdf(K_upper | lambda) - poisson_cdf(K_lower - 1 | lambda)).
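Here is a small R sketch of the sequential scheme for the lower-bound-only case (made-up numbers; the 1 - ppois(K - 1, lambda) term is my rendering of the normalization over k >= K):

```r
# Sequential sampling of a Poisson truncated below at K (made-up numbers)
set.seed(4)
lambda <- 12
K      <- 15                        # lower bound, e.g. max_y
norm   <- 1 - ppois(K - 1, lambda)  # total mass on k >= K

u     <- runif(1)
k     <- K
p_sum <- dpois(k, lambda) / norm    # probability of the first viable state
while (p_sum < u) {                 # reject state k and move to the next one
  k     <- k + 1
  p_sum <- p_sum + dpois(k, lambda) / norm
}
N_draw <- k  # in practice also cap k to guard against floating-point underflow
```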

I’m guessing that you were missing the proper normalization so the sequential sampling was going past the highest value and then terminating the loop without changing the default value.
