Help with specifying a hierarchical poisson - binomial - gaussian mixture

louisfh · August 27, 2024, 9:02pm

I’m struggling to specify my model. I’m new to hierarchical models, so may be missing some conceptual clarity too.

I’m modeling the output of an automated classifier for detecting a species in an audio clip as a 2-component gaussian mixture. A component with higher mean for the positives (clips that contain the species of interest), and a component with a lower mean for the negatives (clips that don’t contain the species of interest). I want to estimate the mean number of animals per site. I assume the number of animals ( N_i ) at a site is poisson distributed, each animal has a fixed probability (p) of vocalizing. So the probability of an animal being present in a clip is given by 1-(1-p)^{N_i} . It’s basically a Royle-Nichols model, but with observed data (y) that are ‘scores’ drawn from the gaussian mixture rather than true 1/0 detections.

The model looks something like:

N_i \sim Poisson(\lambda)

n\_positives_i \sim Binomial(n\_clips_i, (1-(1-p)^{N_i})

\theta_i = \frac{n\_positives_i}{n\_clips_i}

y_{ij} \sim \theta_i N(\mu^{1}_i, \sigma^{1}_i) + (1-\theta_i) N(\mu^{2}_i, \sigma^{2}_i)

Here’s my code so far, which is 1. Compiling (but wrong - looking for help in specifying the model) and 2. Very slow

data {
 int<lower=1> M;          // number of sites
 int<lower=1> J;          // number of scores
 array[M,J] real y;       // response variable
 int<lower=0> N_max;   // Max possible number of animals at a site to marginalize up to
}

parameters {
 real mu1;               // gaussian params. Could use ordered[2] instead.
 real mu2;
 array[2] real<lower=0> sigma;       
 real<lower=0,upper=1> p;      // per-capita p of detection
 real lambda;                  // average abundance
}

transformed parameters {
 array[N_max] real<lower=0, upper=1> p_det;     // probability of detection given n animals
 for (n in 1:N_max){
   p_det[n] = 1 - (1 - p)^n;
 }

model {
 // priors
 mu1 ~ normal(-1, 2);       
 mu2 ~ normal(5, 3);      
 sigma ~ normal(0, 3);    
 p ~ beta(1, 1);
 lambda ~ uniform(0, 15);
  for (site in 1:M){
   for (j in 1:J){              //scores
     for (i in 0:N_max){  // possible number of animals
       if (i == 0){.            // All scores are  negatives
         target += poisson_lpmf(i | lambda) + normal_lpdf(y[site, j] | mu1, sigma[1]);
       } else {
         target += poisson_lpmf(i | lambda);
         target += log_mix(p_det[i],
                 normal_lpdf(y[site, j] | mu1, sigma[1]),
                 normal_lpdf(y[site, j] | mu2, sigma[2]));
               }
               }
   }
 }
}

Any help is much appreciated.

aammd · September 5, 2024, 10:42pm

hi @louisfh and congrats on your first question here! welcome.

i have so many ideas here, but I’m stuck on the very first step: what is this bimodal gaussian response you are modelling? I mean what are the actual units of these measurements. do you have more than one measurement of this response per clip? how many? it sounds like the response you’re using is itself the output of a classifying algorithm. is that the case, or did I misunderstand?

I’m very curious to learn more about this! hope you’re still pursuing an answer on this forum :)

louisfh · September 9, 2024, 8:24pm

Thanks for the response. I think I’m going to try and break my questions into smaller chunks as this didn’t get much response!

The data y_{ij} are the outputs (on the logit/log-odds scale) of a Neural Network classifier. The classifier is trained to identify the song of one species. So each audio clip gets one score. What we see empirically is that the distribution of scores resembles a gaussian mixture with 2 components, a lower component (negatives - clips that don’t contain the species), and a higher component (positives - clips that do contain the species). I’d like to count the number of clips that contain our species (i.e. those drawn from the gaussian with higher mean), and then try and infer the number of birds at a site.

If you’re interested here’s a previous paper developing the model for fitting a gaussian mixture to classifier score outputs.
https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13905

Dalton · September 9, 2024, 9:39pm

It sounds to me that what you’re describing is a variant of the N-mixture model or near enough to it, but where the binomial success/failure observation is also latent because all you observe is the score of the audio clip. I have some general advice for you, but it’s difficult to say exactly where to go with this without a reproducible example.

There are a few threads on the forum addressing the N-mixture model and some experts that haunt these forums like @jsocolar that may be able to provide you with some additional insight. See:

Also check out the ubms package which has the Royle-Nichols (2003) model as the occuRN function: Bayesian Models for Data from Unmarked Animals using Stan • ubms

I’m seeing a few things in your code that seem to me to be likely issues for sampling. First is the uniform prior on lambda. Uniform priors can cause issues with sampling when the parameter is not declared bounded. There also doesn’t seem to be a theoretical reason for why lambda should only be as high as 15. Second, is that is looks to be me like the order of the elements in the normal mixture should be reversed.
This line:
if (i == 0){target += poisson_lpmf(i | lambda) + normal_lpdf(y[site, j] | mu1, sigma[1]);}
implies that p_det= 1 when there are zero animals, but shouldn’t p_det = 0? (1 - (1 - p)^0 = 0) I think the statement below that should read:

target += poisson_lpmf(i | lambda) + log_mix(p_det[i], normal_lpdf(y[site, j] | mu2, sigma[2]), normal_lpdf(y[site, j] | mu1, sigma[1]))

(Aside: the if/else statement is unnecessary, rewrite the loop to go from 1:N_max and have a separate line for the i==0)

Other than that, it appears to me at first glance as though this should work, but the best way to proceed is to simulate the data generating process first and then fit the model to the simulated data to see if you can recover the parameters. Have you done that?

jsocolar · September 10, 2024, 2:08am

Edit: brain-fart here, can ignore most of this message.

If I understand right, this is not an N-mixture model because even if we neglect the Gaussian mixture component and assume that each vocalization can be unambiguously classified, we don’t actually observe the number of individuals vocalizing during a clip.

The problem for this model is that the term 1 - (1 - p)^{N_i} contains a potential non-identifiability between p and N, and the statement of the modeling situation suggests that there is no information available to distinguish these quantities (i.e. this potential non-identifiability is an actual non-identifiability). In general, applying N-mixtures to acoustic monitoring data is challenging (i.e. impossible to my knowledge) when there is no good way to distinguish (and therefore count) the individuals vocalizing.

My advice would be to begin by simulating data assuming that the classifier is perfect, and see if you can get a working model in that setting. Then worry about the uncertainty in the classifier.

Dalton · September 10, 2024, 5:08pm

You’re correct that it’s not an N-mixture model, but it’s N-mixture adjacent. The underlying model (without the Guassian mixture component) is from Royle and Nichols 2003 paper.

That model is something like:

w_{i,t} \sim Bernoulli(p_i)
p_i = 1 - (1 - r)^{N_i}
where w_{i,t} is 1 if a sample of site i records a presence of the species during temporal replicate t

The model assumes that r, the per individual detection probability, is constant across spatial and temporal replicates. I’m not overly familiar with this method, but it appears to me that under these assumptions N_i should be somewhat identifiable given sufficient numbers of both temporal and spatial replicates. This seems to me to be N-mixture adjacent because ultimately you’ve got \sum_t{w_{i,t}} \sim Binomial(T, p_i) and are relying on the inherent constraints in the variance of the Binomial distribution to ascribe the additional variance to differences in abundance. Maybe I’m missing something though, so correct me if I’m wrong. This is definitely more in your (@jsocolar) wheelhouse then mine.

That said, I tend to agree with you that identifiability is tenuous in this situation. The additional complicating factor of the Gaussian mixture component only clouds the picture, perhaps irretrievably. And so this is a certainly good candidate for a simulation exercise to roughly mark out the terrain where this approach tenable, if it all.

jsocolar · September 10, 2024, 6:58pm

Yup, brain-fart on my part. Everyone can somewhat disregard my last message. Under strict assumptions, this is definitely identifiable. Among other assumptions, it only works if there is substantial heterogeneity in abundance across sites, so that (1-r)^N is not approximately linear in r.

simonbrauer · September 12, 2024, 11:54am

Interesting model! If I understand the paper and your implementation of it, the model in the paper does not try to estimate \lambda. That’s your extension. Do I have that right? EDIT: I apologize, I was looking at the wrong article. I see now that Royles and Nichols include the N_i specification.

There are two potential issues as I see them

Identifiability issue
Repeated computations

As previously noted in this thread, I suspect that p and \lambda will be unidentifiable, particularly because you’re using flat priors on both. You might have better luck if you use stronger priors.

You may also be running into a sampling issue because you allow \lambda to be unconstrained but apply a uniform prior. You should instead constrain \lambda to be between 0 and 15 when declaring it.

As a side note (as someone who does not work in this field), it seems like N should influence the classification probabilities and thus \mu_2 and \sigma_2. I imagine if you have ten birds vocalizing in one recording, the probability will be higher than if you have one bird once. So if this is your extension, you may want to add this flexibility, which might help identify \lambda.

Per repeated calculations: given \lambda, \mu, and \sigma are all constant across sites, you can pre-calculate one value of \theta for the whole dataset and the likelihood for each of the data points under each component once. If you REALLY wanted to speed it up, you might allow N_max to adapt to the value of \lambda. For example, if \lambda =1, then you really don’t need to calculate it out to N_{max} = 30 animals.

louisfh:

for (site in 1:M){
   for (j in 1:J){              //scores
     for (i in 0:N_max){  // possible number of animals
       if (i == 0){.            // All scores are  negatives
         target += poisson_lpmf(i | lambda) + normal_lpdf(y[site, j] | mu1, sigma[1]);
       } else {
         target += poisson_lpmf(i | lambda);
         target += log_mix(p_det[i],
                 normal_lpdf(y[site, j] | mu1, sigma[1]),
                 normal_lpdf(y[site, j] | mu2, sigma[2]));
               }
               }
   }
 }

If I’m not mistaken, you should also be treating this is a two-level or cross-classified mixture.

for (site in 1:M){
    for (j in 1:J){              //scores
        vector[N_max + 1] log_lik_i;
        log_lik_i[1] = poisson_lpmf(i | lambda) + normal_lpdf(y[site, j] | mu1, sigma[1])
        for (i in 1:N_max){  // possible number of animals
            log_lik_i[i+1] = poisson_lpmf(i | lambda) +
                             log_mix(p_det[i],
                                     normal_lpdf(y[site, j] | mu1, sigma[1]),
                                     normal_lpdf(y[site, j] | mu2, sigma[2]));
        }

        target += log_sum_exp(log_lik_i);
    }
}

Topic		Replies	Views
Estimating parameters of a negative-binomial mixture model Modeling	35	2542	July 10, 2019
Mixture Bayesian Poisson Regression Model Modeling	45	3903	April 17, 2019
Hierarchical model with a poisson distributed parameter Modeling cmdstanr	12	1102	December 7, 2022
Calculating abundance from marginalized N-mixture model Modeling ecology	9	2320	January 24, 2020
Mixture model: Different perspective to the schools example Modeling	35	1273	April 2, 2020

Help with specifying a hierarchical poisson - binomial - gaussian mixture

Related topics