I’m trying to model a classification process. We observe that P of the N examples are classified as positive by some decision maker. We then observe that TP of those P positives turn out to be true positives. We model this as two binomial draws:

```
P ~ binomial(N, a);
TP ~ binomial(P, b);
```
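
(For concreteness, a minimal sketch of simulating this two-stage process with `a` and `b` fixed as data, e.g. run with `algorithm=fixed_param`; this is just an illustration, not part of either model below:)

```
data {
  int<lower=0> N;
  real<lower=0, upper=1> a;   // positive rate
  real<lower=0, upper=1> b;   // precision among predicted positives
}
generated quantities {
  int P = binomial_rng(N, a);    // examples classified as positive
  int TP = binomial_rng(P, b);   // true positives among them
}
```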

We model a and b in two ways. In the first, `a = P(X > t)` and `b = E[X | X > t]`, where `X ~ beta(alpha, beta)`: the positive rate (a) is the probability that a beta-distributed score X exceeds some threshold t, and the precision (b) is the mean of the scores above that threshold:

```
a = 1 - beta_cdf(t, alpha, beta)
b = alpha * (1 - beta_cdf(t, alpha + 1, beta)) / ((alpha + beta) * (1 - beta_cdf(t, alpha, beta)))
```
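
Concretely, the first model looks roughly like the sketch below. The data declarations, bounds, and priors are placeholders standing in for the "reasonable priors" mentioned at the end, and the CDFs are written with the newer `|` argument syntax:

```
data {
  int<lower=0> N;               // total examples
  int<lower=0, upper=N> P;      // predicted positives
  int<lower=0, upper=P> TP;     // true positives among the predicted positives
}
parameters {
  real<lower=0, upper=1> t;     // threshold on the beta-distributed score
  real<lower=0> alpha;
  real<lower=0> beta;
}
transformed parameters {
  // a = P(X > t), b = E[X | X > t] for X ~ beta(alpha, beta)
  real<lower=0, upper=1> a = 1 - beta_cdf(t | alpha, beta);
  real<lower=0, upper=1> b = alpha * (1 - beta_cdf(t | alpha + 1, beta))
                             / ((alpha + beta) * a);
}
model {
  // placeholder priors; t gets an implicit uniform(0, 1) prior from its bounds
  alpha ~ gamma(2, 0.1);
  beta ~ gamma(2, 0.1);
  P ~ binomial(N, a);
  TP ~ binomial(P, b);
}
```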

In the second, the classification process is modeled with LDA, where the two classes have equal-variance normal score distributions whose means are separated by some `delta`:

```
tp = phi * (1 - normal_cdf(t, delta, 1));    // P(positive class and score above t)
fp = (1 - phi) * (1 - normal_cdf(t, 0, 1));  // P(negative class and score above t)
a = tp + fp;                                 // positive rate
b = tp / (tp + fp);                          // precision
```

We place reasonable priors on (t, alpha, beta) in the first case and (t, phi, delta) in the second case.
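
For reference, the second model in the same sketch form, again with placeholder priors and data declarations of my own:

```
data {
  int<lower=0> N;
  int<lower=0, upper=N> P;
  int<lower=0, upper=P> TP;
}
parameters {
  real t;                        // threshold on the latent score
  real<lower=0, upper=1> phi;    // prevalence of the positive class
  real delta;                    // separation between the class means
}
transformed parameters {
  real tp = phi * (1 - normal_cdf(t | delta, 1));
  real fp = (1 - phi) * (1 - normal_cdf(t | 0, 1));
  real<lower=0, upper=1> a = tp + fp;
  real<lower=0, upper=1> b = tp / (tp + fp);
}
model {
  // placeholder priors standing in for the "reasonable priors" above
  t ~ normal(0, 2);
  phi ~ beta(2, 2);
  delta ~ normal(0, 2);
  P ~ binomial(N, a);
  TP ~ binomial(P, b);
}
```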

My question is: **why is inference more efficient in the second model (fewer leapfrog steps per iteration and a higher effective sample size per iteration)?**