Trouble formulating an equivalence hypothesis test in Stan

I am having an issue trying to formulate a hypothesis test. The question feels simple: I am trying to find the probability that two populations have the same parameters (\boldsymbol{\theta}_A = \boldsymbol{\theta}_B). For context, I have one observation from each of these populations, \text{obs}_A and \text{obs}_B, sampled from a negative binomial distribution parameterized by:

\text{obs}_X \sim \text{NegBinom}(\mu_X, 10)

Where the subscript X distinguishes observations, \mu_X is the mean of the distribution that \text{obs}_X was sampled from, and 10 is the inverse overdispersion factor for the negative binomial.

Both observations have a sample size of 1 (meaning a single count), and so the hypothesis I’m trying to answer is whether \mu_A = \mu_B (the null) or \mu_A \neq \mu_B (the alternative).

I saw that Stan can be used to calculate the Bayes factor (Bayesian One-Sample T-Test (Stan)). If I’m understanding correctly, it uses lp__ to approximate the marginal (log) likelihood, which is then compared between your two models of interest to find the BF.
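As I understand it, the quantity approximated for each model is its marginal likelihood, and the BF is their ratio (standard definitions, written out for clarity):

p(y \mid H_i) = \int p(y \mid \boldsymbol{\theta}, H_i)\, p(\boldsymbol{\theta} \mid H_i)\, d\boldsymbol{\theta}, \qquad \text{BF}_{10} = \frac{p(y \mid H_1)}{p(y \mid H_0)}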

Writing the null and alternative models in Stan I get:

// H0: Assuming the two distributions have the same parameters
data {
  array[2] int y; // observations
}
parameters {
  real<lower=0, upper=1e6> mu; // mean of negative binomial
}
model {
  target += 2 * uniform_lpdf(mu | 0, 1e6); // 2 * because the data was generated from TWO DISTRIBUTIONS, not one
  target += neg_binomial_2_lpmf(y | mu, 10);  // likelihood
}

// H1: Assuming the two distributions have variable parameters
data {
  array[2] int y; // observations
}
parameters {
  vector<lower=0, upper=1e6>[2] mu; // means of negative binomial
}
model {
  for (i in 1:2) {
    target += uniform_lpdf(mu[i] | 0, 1e6); // prior
    target += neg_binomial_2_lpmf(y[i] | mu[i], 10);  // likelihood 
  }
}

However, they’re not working exactly as I would like. Intuitively, given the counts A = 0 and B = 0, there shouldn’t be much evidence for the alternative, and this is where the model works okay: the BF is ~0.4 when given [0, 0]. Where the model starts deviating from intuition is when the counts are larger. If you observe A = 20 and B = 20, the evidence for the null should decrease, because populations with large means inherently have more variance, so two large counts being the same does not provide the same amount of evidence for the null as two small counts being the same. However, the evidence for the alternative shouldn’t increase, and this is where I’m running into an issue.
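To put a number on “more variance”: in this parameterization the variance of the negative binomial is

\operatorname{Var}(\text{obs}) = \mu + \frac{\mu^2}{10},

so at \mu = 100 a count has a standard deviation of about 33, while at \mu = 1 it is about 1.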

When you give the same count for both A and B, the BF in favour of the alternative increases with the value of the count. If you observe [100, 100], the BF is much larger than 1 (it’s ~65), despite looking at it and going, “but you have no evidence for the alternative, they’re the same.”

What I would like to happen is that if the counts are really similar, the BF should be below 1. But if the similar counts are large, for example [100, 100], the BF should approach 1 without really passing it, because there should be no reason to favour the alternative when the counts are extremely similar, and there also shouldn’t be any reason to favour the null when the counts are extremely high.

I am not sure how to rewrite the models; I am stuck on how to solve this problem.

Also, if anyone is confused as to why I’m trying to do anything with a sample size of 1 for each population, the reason is that my data looks like this:

\text{obs}_X = \begin{bmatrix} m_{X[1]} \\ m_{X[2]} \\ \vdots \\ m_{X[n]} \end{bmatrix} \sim \begin{bmatrix} \text{NegBinom}(\mu_{X[1]}, 10) \\ \text{NegBinom}(\mu_{X[2]}, 10) \\ \vdots \\ \text{NegBinom}(\mu_{X[n]}, 10) \end{bmatrix} = \text{ind}_X

Where each observation has measurements for n features, and the measurement error is given by the negative binomial on the right.

The end goal is to find the probability that two observations are sampled from equivalent individuals. So combining information across all features should (hopefully) give enough power to find this. I began with the simple case of one feature because originally I had no idea where to start.

If anyone has any advice, I would be extremely thankful.


I have never worked with Bayes factors, so apologies if I am way off here; I may not understand what you mean by “evidence”. But I don’t quite follow your logic.

So if I am following, there are only two states of the world here: either \mu_A = \mu_B or \mu_A \neq \mu_B. No other options.

If the evidence for the null decreases, then shouldn’t this imply an increase in evidence for the alternative, if the only two possible states of the world are null or alternative? What would it mean if the evidence for the null decreased but the evidence for the alternative didn’t change?

It seems like the fact that the variance is larger, as you stated, should imply increasing evidence for the alternative when comparing single observations. A larger variance implies a wider probability distribution over the means that could generate any particular value like 100, right? This would seem to imply more probability that the means that generated the two values of 100 are different; it would be quite a coincidence for them to be the same.

If the evidence for the null decreases, then shouldn’t this imply an increase in evidence for the alternative, if the only two possible states of the world are null or alternative? What would it mean if the evidence for the null decreased but the evidence for the alternative didn’t change?

The issue is that the relative evidence (compared to the null) for the alternative shouldn’t change, at least as I understand it. The larger the mean of the negative binomial, the flatter the pmf for the counts. Intuitively, if the two observed counts are extremely similar or exactly the same, regardless of how large the counts are, the null should be favoured. Only when the counts start deviating from each other should the alternative be favoured. If both counts are large with a certain difference (e.g., 95 and 105, a difference of 10), then the support for the alternative is smaller than if both counts are small with the same difference (e.g., 1 and 11, a difference of 10).

It seems like the fact that the variance is larger, as you stated, should imply increasing evidence for the alternative when comparing single observations. A larger variance implies a wider probability distribution over the means that could generate any particular value like 100, right? This would seem to imply more probability that the means that generated the two values of 100 are different; it would be quite a coincidence for them to be the same.

The issue, I think, is that the flexibility of allowing different means should be penalized in such a way that two different means are unlikely compared to the means being the same.


Yes. The prior can make a big difference for the Bayes factor, more so than for the posterior mean/median. You chose a flat prior on [0, 1e6] for the mean parameters, mu1 and mu2. Effectively, this puts an extremely diffuse prior, spanning [-1e6, 1e6], on the difference mu1 - mu2.

It may help to reparametrize in terms of a mean mu = mu1 (for the first observation) and a difference delta = mu2 - mu1. I’ll also use neg_binomial_2_log_lpmf so there is no need to put a positivity constraint on mu or mu + delta.
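To spell out what the log parameterization does (this is just Stan’s definition of neg_binomial_2_log): the sampling statement becomes

y \sim \text{NegBinom}(e^{\eta}, 10),

so mu and delta now live on the log scale, and delta is a log fold change in the mean rather than a difference in raw means.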

Then I put a normal(0, 0.1) prior on the difference delta and follow the Bayes factor tutorial to get:

data           Bayes factor
y1 = y2 = 0    BF = 0.98343
y1 = y2 = 20   BF = 0.98521
y1 = y2 = 100  BF = 0.97817

And what about when the prior for delta is uniform on [-100, 100]?

data           Bayes factor
y1 = y2 = 0    BF = 0.75959
y1 = y2 = 20   BF = 0.00696
y1 = y2 = 100  BF = 0.00598

Here is the Stan code for the alternative hypothesis that I used:

data {
  int y1;
  int y2;
}
parameters {
  real<lower=-100, upper=100> mu; // log-scale mean for the first observation;
                                  // bounds match the uniform prior below
  real delta; // difference in log means; give it matching bounds
              // if the uniform prior below is used
}
model {
  target += uniform_lpdf(mu | -100, 100);
  // target += normal_lpdf(delta | 0, 0.1);
  // or
  target += uniform_lpdf(delta | -100, 100);

  target += neg_binomial_2_log_lpmf(y1 | mu, 10);
  target += neg_binomial_2_log_lpmf(y2 | mu + delta, 10);
}
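
The null model is the same with delta removed, so both observations share one log-scale mean. A minimal sketch (assuming the same prior and likelihood as above):

// H0: both observations share the same log-scale mean
data {
  int y1;
  int y2;
}
parameters {
  real<lower=-100, upper=100> mu; // shared log mean
}
model {
  target += uniform_lpdf(mu | -100, 100); // same prior as in H1
  target += neg_binomial_2_log_lpmf(y1 | mu, 10);
  target += neg_binomial_2_log_lpmf(y2 | mu, 10);
}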

In the end I realized that framing my question as a hypothesis test wasn’t giving me the functionality I needed. Instead, I estimated each observation’s mu parameter and used the contrast between mu[1] and mu[2] to determine whether two observations are not equivalent.

Because the sample size is 1, you can immediately guess the posteriors are always very flat, but in scenarios where the counts are very different (e.g., 1 and 100), the estimates for mu[1] and mu[2] separate cleanly and the posterior for their difference is confidently non-zero, which is in line with my intuition and gives me the functionality I wanted.
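For anyone who lands here later, a minimal sketch of what this contrast approach can look like (a reconstruction, not the exact code I ran; the normal(0, 5) prior on the log means is an arbitrary weakly informative choice):

// Sketch of the contrast approach: estimate one mean per observation
// and inspect the posterior of their difference.
data {
  array[2] int<lower=0> y; // one count per observation
}
parameters {
  vector[2] log_mu; // log-scale means, one per observation
}
model {
  log_mu ~ normal(0, 5); // weakly informative prior (arbitrary choice)
  y ~ neg_binomial_2_log(log_mu, 10);
}
generated quantities {
  // a contrast posterior that is confidently non-zero
  // indicates the two observations are not equivalent
  real contrast = log_mu[1] - log_mu[2];
}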

Thank you to all who helped!
