Issues with hierarchical model for species detection

max · July 21, 2019, 1:15pm

I am fitting a model to binary species detection/non-detection data, obtained through replicated visits to different sites during multiple years. The probability of detecting the species (i.e. at least one individual) is assumed to vary with the population size in the corresponding site. As in Royle & Nichols 2003, this relationship is exploited to estimate the (latent) population size. However, as this would require marginalization over all possible population size values (with a potentially very high upper bound in my case), a gamma or lognormally distributed positive continuous population size indicator is used instead. This yields the following model, strongly stripped down from features irrelevant to this post (e.g. spatiotemporal structure and covariates):

N_{i,j} \sim \operatorname{lognormal}(\mu, \sigma)
y_v \sim \operatorname{Bernoulli}(1-(1-p_{v})^{N_{i,j}})
\operatorname{logit}(p_{v})=\beta_0+\beta_1*\operatorname{effort}_v

with N_{i,j} the population size indicator of site i in year j, y_v the binary outcome (detected or not) during visit v, p_{v} the probability of detecting a single individual during visit v and \operatorname{effort}_v the search effort during the visit v.

My current Stan model is the following, with a small subset of the data provided in this dump: Detectiondata.R (251.1 KB) .

data {
  int<lower=1> nyear;
  int<lower=1> nvisit;
  int<lower=1> nsite;
  int<lower=0,upper=1> y[nvisit];
  real<lower=0> effort[nvisit];
  int site[nvisit];
  int year[nvisit];
}
    
parameters {
  real<lower=0> N[nsite,nyear];
  real mu;
  real<lower=0> sigma;
  real intercept;
  real slope;
}
    
transformed parameters {
  real<lower=0,upper=1> p[nvisit];
  for (v in 1:nvisit) {
    p[v] = inv_logit(intercept + slope*effort[v]);
  }
}
    
model {
  mu ~ normal(0, 5);
  sigma ~ cauchy(0,5);
  intercept ~ normal(0, 5);
  slope ~ normal(0, 5);
  
  for (i in 1:nsite) {
    for (j in 1:nyear) {
      N[i,j] ~ lognormal(mu, sigma);
    }
  }
  for (v in 1:nvisit) {
    y[v] ~ bernoulli(1-pow(1-p[v], N[site[v],year[v]]));
  }
}

Running this model, I’m (exclusively) getting divergent transitions and absurd results (e.g. many population size indicators estimated to be > 1e20). The corresponding JAGS model seems to do just fine, though I am aware this does by no means imply that the JAGS results can be trusted. I have strong suspicions my current Stan implementation is inefficient or plainly wrong. Any advise or directions would be greatly appreciated!

maxbiostat · July 29, 2019, 12:28pm

I can’t test it now, but could you perhaps try coding your model as

log_p[v] = intercept + slope*effort[v];
y[v] ~ bernoulli_log(log1m_exp(N[site[v],year[v]]) * log1m_exp(log_p[v]) );

So as to increase numerical stability?

Also, on a somewhat unrelated note, do the authors of that paper (or you) have a good justification for using a log-normal for the underlying counts? I ask because if you replaced the log-normal with, say, a negative binomial, you could derive some results analytically.

Topic		Replies	Views
Royle and Nichols Modeling ecology	3	864	May 28, 2020
Convergence Troubles and Pathologies in an already Reparameterized Model Modeling fitting-issues , specification , performance , cmdstanr , hierarchical-model	5	553	July 27, 2023
Joint Species Distribution Model Performance Modeling ecology	3	1115	December 15, 2020
Multi-species occupancy - no detection history Modeling techniques , fitting-issues , performance	11	1144	May 5, 2022
Suggestions for improving model of bird monitoring data Modeling performance	11	952	November 17, 2020

Issues with hierarchical model for species detection

Related topics