I am having trouble formulating a hypothesis test. The question feels simple: I am trying to find the probability that two populations have the same parameters (\boldsymbol{\theta}_A = \boldsymbol{\theta}_B). For context, I have one observation from each population, \text{obs}_A and \text{obs}_B, sampled from a negative binomial distribution parameterized by:

\text{obs}_X \sim \text{NegBinomial2}(\mu_X,\ 10)

Where the subscript X distinguishes observations, \mu_X is the mean of the distribution that \text{obs}_X was sampled from, and 10 is the inverse overdispersion factor for the negative binomial.

Both observations have a sample size of 1 (meaning a single count), and so the hypothesis I'm trying to test is whether \mu_A = \mu_B (the null) or \mu_A \neq \mu_B (the alternative).

I saw that Stan can be used to calculate the Bayes factor (Bayesian One-Sample T-Test (Stan)). If I'm understanding correctly, it uses `lp__` to approximate the marginal (log) likelihood, which is then compared between your two models of interest to find the BF.
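
For concreteness, the comparison itself is just a difference of log marginal likelihoods, exponentiated. A minimal sketch (the log-marginal values below are placeholders, not from a real run):

```python
import math

# Hypothetical log marginal likelihoods for the two models
# (in practice these would come from the Stan-based estimates).
log_ml_h0 = -12.3  # placeholder value
log_ml_h1 = -13.1  # placeholder value

# Bayes factor in favour of H1; working in log space avoids underflow
# when the marginal likelihoods themselves are tiny.
log_bf10 = log_ml_h1 - log_ml_h0
bf10 = math.exp(log_bf10)
print(bf10)  # < 1 here, i.e. these placeholder values favour H0
```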

Writing the null and alternative models in Stan I get:

```
// H0: Assuming the two distributions have the same parameters
data {
  array[2] int y; // observations
}
parameters {
  real<lower=0, upper=1e6> mu; // mean of negative binomial
}
model {
  target += 2 * uniform_lpdf(mu | 0, 1e6); // 2 * because the data was generated from TWO DISTRIBUTIONS, not one
  target += neg_binomial_2_lpmf(y | mu, 10); // likelihood
}
```

```
// H1: Assuming the two distributions have variable parameters
data {
  array[2] int y; // observations
}
parameters {
  vector<lower=0, upper=1e6>[2] mu; // means of negative binomial
}
model {
  for (i in 1:2) {
    target += uniform_lpdf(mu[i] | 0, 1e6); // prior
    target += neg_binomial_2_lpmf(y[i] | mu[i], 10); // likelihood
  }
}
```
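
Since both models are low-dimensional, the marginal likelihoods can also be computed directly by quadrature, as a cross-check on any `lp__`-based estimate. This is a sketch under two assumptions: it uses Stan's `neg_binomial_2` parameterization with \phi = 10, and for H0 it uses a *single* uniform prior on the shared `mu` (i.e. without the `2 *` factor), so it may not reproduce the numbers from the models above exactly:

```python
import math

PHI = 10.0       # inverse overdispersion, fixed as in the models above
PRIOR_HI = 1e6   # upper bound of the Uniform(0, 1e6) prior on mu

def nb2_lpmf(y, mu, phi=PHI):
    """Log pmf of Stan's neg_binomial_2(y | mu, phi) parameterization."""
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (phi + mu))
            + y * math.log(mu / (mu + phi)))

def integrate(f, lo=1e-6, hi=2000.0, n=50_000):
    """Trapezoid rule; the likelihood is negligible beyond `hi` for counts this small."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

def marginal_h0(y):
    """p(y | H0): one shared mu with a single Uniform(0, PRIOR_HI) prior."""
    like = lambda mu: math.exp(nb2_lpmf(y[0], mu) + nb2_lpmf(y[1], mu))
    return integrate(like) / PRIOR_HI

def marginal_h1(y):
    """p(y | H1): independent mus, so the 2-D integral factorizes into 1-D pieces."""
    parts = [integrate(lambda mu: math.exp(nb2_lpmf(yi, mu))) / PRIOR_HI for yi in y]
    return parts[0] * parts[1]

def bf10(y):
    """Bayes factor in favour of the alternative."""
    return marginal_h1(y) / marginal_h0(y)
```

One thing this makes easy to see: adding a second `uniform_lpdf` term under H0, as in the model above, only adds a constant \log(1/10^6) to the target, which rescales p(y \mid H0) by 1/10^6 and therefore shifts *every* Bayes factor in favour of the alternative by the same large factor. That is worth ruling out before interpreting the BF values.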

However, they're not working *exactly* as I would like. Intuitively, if given the counts A = 0 and B = 0, there shouldn't be much evidence for the alternative, and this is where the model works okay: the BF is ~0.4 when given `[0, 0]`.

Where the model starts deviating from intuition is when the counts are larger. If you observe A = 20 and B = 20, the evidence for the null should decrease, because populations with large means inherently have more variance, so two large counts being the same does not provide the same amount of evidence for the null as two small counts being the same. However, the evidence for the alternative shouldn't increase, and this is where I'm running into an issue.
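
To put a number on that intuition: under Stan's `neg_binomial_2` parameterization the variance is \mu + \mu^2/\phi, so with \phi = 10 the spread grows much faster than the mean. A quick check:

```python
def nb2_variance(mu, phi=10.0):
    """Variance of neg_binomial_2(mu, phi): mu + mu^2 / phi."""
    return mu + mu * mu / phi

# e.g. mu = 20 gives variance 60 (sd ~ 7.7),
# while mu = 100 gives variance 1100 (sd ~ 33.2)
for mu in (0.5, 20, 100):
    print(mu, nb2_variance(mu), nb2_variance(mu) ** 0.5)
```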

When you give the same count for both A and B, the BF in favour of the alternative increases with the value of the count. Meaning if you observe `[100, 100]`, the BF is much larger than 1 (it's ~65), despite looking at it and going, "but you have no evidence for the alternative, they're the same."

What I would like to happen is that if the counts are really similar, the BF should be *below* 1. But if the similar counts are large, for example `[100, 100]`, the BF should *approach* 1 but not really pass it: there should be no reason to favour the alternative when the counts are extremely similar, and there also shouldn't be any reason to favour the null when the counts are extremely high.

I am not sure how to rewrite the models; I am stuck on how to solve this problem.

Also, if anyone is confused as to why I'm trying to do anything with a sample size of 1 for each population, the reason is that my data looks like this:

Where each observation has measurements for n features, and the error in this measurement is given by the negative binomial on the right.

The end goal is to find the probability that two observations are sampled from equivalent individuals. So combining information across all features should (hopefully) give enough power to find this. I began with the simple case of one feature because originally I had no idea where to start.

If anyone has any advice, I would be extremely thankful!