I’m trying to model a classification process. We observe that P of the N examples are classified as positive by some decision maker. We then observe that TP of those P positives turn out to be true positives. We model this as two binomial draws:
P ~ binomial(N, a);
TP ~ binomial(P, b);
We model a and b in two ways. In the first, a = Pr(X > t) and b = E[X | X > t], where X ~ beta(alpha, beta). The positive rate (a) is the fraction of beta samples X above some threshold t, and the precision (b) is the mean of those samples that exceed the threshold:
a = 1 - beta_cdf(t, alpha, beta);
b = (alpha * (1 - beta_cdf(t, alpha + 1, beta))) / ((1 - beta_cdf(t, alpha, beta)) * (alpha + beta));
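For reference, the expression for b follows from a standard identity for the beta density. This is only a sketch of the derivation; f (the beta density), B (the beta function), and I_t (the regularized incomplete beta function, i.e. what beta_cdf evaluates at t) are notation introduced here for illustration.

$$
\begin{aligned}
\mathbb{E}[X \mid X > t]
  &= \frac{\int_t^1 x \, f(x; \alpha, \beta)\, dx}{\Pr(X > t)}, \\
x \, f(x; \alpha, \beta)
  &= \frac{B(\alpha + 1, \beta)}{B(\alpha, \beta)} \, f(x; \alpha + 1, \beta)
   = \frac{\alpha}{\alpha + \beta} \, f(x; \alpha + 1, \beta), \\
\mathbb{E}[X \mid X > t]
  &= \frac{\alpha}{\alpha + \beta} \cdot \frac{1 - I_t(\alpha + 1, \beta)}{1 - I_t(\alpha, \beta)}.
\end{aligned}
$$

As t approaches 0 this reduces to the unconditional mean alpha / (alpha + beta), which is a quick sanity check on the formula.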
In the second, the classification process is modeled with LDA, where the classes have equal-variance normal distributions separated by some delta:
tp = phi * (1 - normal_cdf(t, delta, 1));
fp = (1 - phi) * (1 - normal_cdf(t, 0, 1));
a = (tp + fp);
b = tp / (tp + fp);
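Written out, the second model's rates are as follows. This reads phi as the fraction of examples that are truly positive, an interpretation the code implies but that is an assumption here. The symbol \Phi denotes the standard normal CDF and should not be confused with the parameter phi (\phi).

$$
\begin{aligned}
a &= \phi \,\bigl[1 - \Phi(t - \delta)\bigr] + (1 - \phi)\,\bigl[1 - \Phi(t)\bigr], \\
b &= \frac{\phi \,\bigl[1 - \Phi(t - \delta)\bigr]}{a}.
\end{aligned}
$$

With delta = 0 the two classes coincide and b reduces to phi, so under that reading the precision equals the base rate.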
We place reasonable priors on (t, alpha, beta) in the first case and (t, phi, delta) in the second case.
My question is: why is inference more efficient (fewer leapfrog steps per draw and a higher effective sample size per draw) in the second model?