# Help with specifying a hierarchical poisson - binomial - gaussian mixture

I’m struggling to specify my model. I’m new to hierarchical models, so may be missing some conceptual clarity too.

I’m modeling the output of an automated classifier for detecting a species in an audio clip as a 2-component gaussian mixture. A component with higher mean for the positives (clips that contain the species of interest), and a component with a lower mean for the negatives (clips that don’t contain the species of interest). I want to estimate the mean number of animals per site. I assume the number of animals ( N_i ) at a site is poisson distributed, each animal has a fixed probability (p) of vocalizing. So the probability of an animal being present in a clip is given by 1-(1-p)^{N_i} . It’s basically a Royle-Nichols model, but with observed data (y) that are ‘scores’ drawn from the gaussian mixture rather than true 1/0 detections.

The model looks something like:

N_i \sim Poisson(\lambda)

n\_positives_i \sim Binomial(n\_clips_i, (1-(1-p)^{N_i})

\theta_i = \frac{n\_positives_i}{n\_clips_i}

y_{ij} \sim \theta_i N(\mu^{1}_i, \sigma^{1}_i) + (1-\theta_i) N(\mu^{2}_i, \sigma^{2}_i)

Here’s my code so far, which is 1. Compiling (but wrong - looking for help in specifying the model) and 2. Very slow

data {
int<lower=1> M;          // number of sites
int<lower=1> J;          // number of scores
array[M,J] real y;       // response variable
int<lower=0> N_max;   // Max possible number of animals at a site to marginalize up to
}

parameters {
real mu1;               // gaussian params. Could use ordered[2] instead.
real mu2;
array[2] real<lower=0> sigma;
real<lower=0,upper=1> p;      // per-capita p of detection
real lambda;                  // average abundance
}

transformed parameters {
array[N_max] real<lower=0, upper=1> p_det;     // probability of detection given n animals
for (n in 1:N_max){
p_det[n] = 1 - (1 - p)^n;
}

model {
// priors
mu1 ~ normal(-1, 2);
mu2 ~ normal(5, 3);
sigma ~ normal(0, 3);
p ~ beta(1, 1);
lambda ~ uniform(0, 15);
for (site in 1:M){
for (j in 1:J){              //scores
for (i in 0:N_max){  // possible number of animals
if (i == 0){.            // All scores are  negatives
target += poisson_lpmf(i | lambda) + normal_lpdf(y[site, j] | mu1, sigma[1]);
} else {
target += poisson_lpmf(i | lambda);
target += log_mix(p_det[i],
normal_lpdf(y[site, j] | mu1, sigma[1]),
normal_lpdf(y[site, j] | mu2, sigma[2]));
}
}
}
}
}



Any help is much appreciated.

hi @louisfh and congrats on your first question here! welcome.

i have so many ideas here, but I’m stuck on the very first step: what is this bimodal gaussian response you are modelling? I mean what are the actual units of these measurements. do you have more than one measurement of this response per clip? how many? it sounds like the response you’re using is itself the output of a classifying algorithm. is that the case, or did I misunderstand?

1 Like

Thanks for the response. I think I’m going to try and break my questions into smaller chunks as this didn’t get much response!

The data y_{ij} are the outputs (on the logit/log-odds scale) of a Neural Network classifier. The classifier is trained to identify the song of one species. So each audio clip gets one score. What we see empirically is that the distribution of scores resembles a gaussian mixture with 2 components, a lower component (negatives - clips that don’t contain the species), and a higher component (positives - clips that do contain the species). I’d like to count the number of clips that contain our species (i.e. those drawn from the gaussian with higher mean), and then try and infer the number of birds at a site.

If you’re interested here’s a previous paper developing the model for fitting a gaussian mixture to classifier score outputs.
https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13905

It sounds to me that what you’re describing is a variant of the N-mixture model or near enough to it, but where the binomial success/failure observation is also latent because all you observe is the score of the audio clip. I have some general advice for you, but it’s difficult to say exactly where to go with this without a reproducible example.

There are a few threads on the forum addressing the N-mixture model and some experts that haunt these forums like @jsocolar that may be able to provide you with some additional insight. See:

Also check out the ubms package which has the Royle-Nichols (2003) model as the occuRN function: Bayesian Models for Data from Unmarked Animals using Stan • ubms

I’m seeing a few things in your code that seem to me to be likely issues for sampling. First is the uniform prior on lambda. Uniform priors can cause issues with sampling when the parameter is not declared bounded. There also doesn’t seem to be a theoretical reason for why lambda should only be as high as 15. Second, is that is looks to be me like the order of the elements in the normal mixture should be reversed.
This line:
if (i == 0){target += poisson_lpmf(i | lambda) + normal_lpdf(y[site, j] | mu1, sigma[1]);}
implies that p_det= 1 when there are zero animals, but shouldn’t p_det = 0? (1 - (1 - p)^0 = 0) I think the statement below that should read:

 target += poisson_lpmf(i | lambda) + log_mix(p_det[i], normal_lpdf(y[site, j] | mu2, sigma[2]), normal_lpdf(y[site, j] | mu1, sigma[1]))

(Aside: the if/else statement is unnecessary, rewrite the loop to go from 1:N_max and have a separate line for the i==0)

Other than that, it appears to me at first glance as though this should work, but the best way to proceed is to simulate the data generating process first and then fit the model to the simulated data to see if you can recover the parameters. Have you done that?

1 Like

Edit: brain-fart here, can ignore most of this message.

If I understand right, this is not an N-mixture model because even if we neglect the Gaussian mixture component and assume that each vocalization can be unambiguously classified, we don’t actually observe the number of individuals vocalizing during a clip.

The problem for this model is that the term 1 - (1 - p)^{N_i} contains a potential non-identifiability between p and N, and the statement of the modeling situation suggests that there is no information available to distinguish these quantities (i.e. this potential non-identifiability is an actual non-identifiability). In general, applying N-mixtures to acoustic monitoring data is challenging (i.e. impossible to my knowledge) when there is no good way to distinguish (and therefore count) the individuals vocalizing.

My advice would be to begin by simulating data assuming that the classifier is perfect, and see if you can get a working model in that setting. Then worry about the uncertainty in the classifier.

You’re correct that it’s not an N-mixture model, but it’s N-mixture adjacent. The underlying model (without the Guassian mixture component) is from Royle and Nichols 2003 paper.

That model is something like:

w_{i,t} \sim Bernoulli(p_i)
p_i = 1 - (1 - r)^{N_i}
where w_{i,t} is 1 if a sample of site i records a presence of the species during temporal replicate t

The model assumes that r, the per individual detection probability, is constant across spatial and temporal replicates. I’m not overly familiar with this method, but it appears to me that under these assumptions N_i should be somewhat identifiable given sufficient numbers of both temporal and spatial replicates. This seems to me to be N-mixture adjacent because ultimately you’ve got \sum_t{w_{i,t}} \sim Binomial(T, p_i) and are relying on the inherent constraints in the variance of the Binomial distribution to ascribe the additional variance to differences in abundance. Maybe I’m missing something though, so correct me if I’m wrong. This is definitely more in your (@jsocolar) wheelhouse then mine.

That said, I tend to agree with you that identifiability is tenuous in this situation. The additional complicating factor of the Gaussian mixture component only clouds the picture, perhaps irretrievably. And so this is a certainly good candidate for a simulation exercise to roughly mark out the terrain where this approach tenable, if it all.

1 Like

Yup, brain-fart on my part. Everyone can somewhat disregard my last message. Under strict assumptions, this is definitely identifiable. Among other assumptions, it only works if there is substantial heterogeneity in abundance across sites, so that (1-r)^N is not approximately linear in r.

2 Likes