MLE lies outside posterior draws

I am trying to estimate an “overlap” probability from two distributions. Basically, I count the number of observations less than or equal to some cutoff. I treat the counts as binomial, but other discrete distributions are possible.

I’ve computed MLEs (Maximum Likelihood Estimates) and now want to see how Stan fares. However, the MLE is much much larger than any of the posterior draws. My problem appears quite similar to:

For instance, in my case (K = 1) the MLE for the count (see code below) is y_upr = 654, but the posterior only ranges from 0-13. Further (below), N is 25 and M is 269692. intra ranges from 0-0.01667 and inter ranges from 0-0.17667. A posterior predictive check check yields a mean of only 0.975 counts.

Here is a plot:

I can’t really inject prior information into the model, because I have none. I tried to model counts as both a Poisson and a negative binomial, since these have unbounded support. However, no luck.

Below is my Stan program:

data {
  int<lower = 1> K; // number of species in genus
  int<lower = 1> N[K]; // number of intraspecific (within-species) genetic distances for each species
  int<lower = 1> M; // number of interspecific (among-species) genetic distances for all species
  array[K] row_vector<lower = 0, upper = 1>[max(N)] intra; // intraspecific genetic distances for each species
  vector<lower = 0, upper = 1>[M] inter; // interspecific genetic distances for all species
}

transformed data {
  int<lower = 0, upper = M> y_upr[K]; // count of interspecific distances for all species less than or equal to max_intra for each species
  real<lower=0, upper=1> max_intra[K]; // maximum intraspecific distance for each species

  for (k in 1:K) {
    max_intra[k] = max(intra[k, 1:N[k]]);

    y_upr[k] = 0;

   y_upr[k] += (inter[k] <= max_intra[k]); // count interspecific distances for all species less than or equal to max_intra for each species
  }
}

parameters {
  real<lower = 0, upper = 1> p_upr[K]; // parameter representing the proportional overlap/separation between interspecific distances for all species and intraspecific distances for each species
}

model {
  for (k in 1:K) {
    // prior
    p_upr[k] ~ uniform(0, 1);

    // likelihood
    y_upr[k] ~ binomial(M, p_upr[k]); // likelihood for interspecific distances equalling or falling below max_intra
  }
}

generated quantities {
  // Posterior Predictive Checks
  int post_y_upr[K]; 

  for (k in 1:K) {
    post_y_upr[k] = binomial_rng(M, p_upr[k]);
  }
}

Any other ideas here? Happy to clarify if needed.

For k==1 (the only value of k)
y_upr will be either zero or one. Zero if inter[1] > max_intra[1] and one otherwise. In your text, you indicate that y_upr is 654.

As an aside, I’m confused by how you can have a non-empty vector of interspecific genetic distances when K is one, but this could well be my confusion and not a problem with the model as written.

@jsocolar D’oh!

I initially wrote

    for (m in 1:M) {
      y_upr[k] += (inter[k] <= max_intra[k]); // count interspecific distances for all species less than or equal to max_intra for each species
}

but then got rid of the for loop in an edit of my post since there was no dependence on M.

The corrected version is:

for (m in 1:M) {
  y_upr[k] += (inter[m] <= max_intra[k]); // count interspecific distances for all species less than or equal to max_intra for each species
}

Now all is fine - the model is correctly specified and also addresses your confusion.

Is your original question also solved, or is there still an issue?

@jsocolar Accepted your answer. Thanks.