Explanation for more efficient sampling

scorbett · November 27, 2017, 2:58am

I’m trying to model a classification process. We observe that P of the N examples are classified as positive by some decision maker. We then observe that TP of those P positives turn out to be true positives. We model this as two binomial draws:

P ~ binomial(N, a);
TP ~ binomial(P, b);

We model a and b in two ways. In the first, a = P(X > t) and b = E[X | X > t], where X ~ beta(alpha, beta). The positive rate (a) is the fraction of beta samples X that fall above some threshold t, and the precision (b) is the mean of those samples:

a = 1 - beta_cdf(t, alpha, beta);
b = (alpha * (1 - beta_cdf(t, alpha + 1, beta))) / ((1 - beta_cdf(t, alpha, beta)) * (alpha + beta));
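For anyone who wants to run the first model, here is a minimal sketch of it as a complete Stan program. Several things in it are my assumptions rather than part of the post. The priors are placeholders, since the actual ones are not given. The shape parameters are renamed `shape_a` and `shape_b` (hypothetical names) to keep them visually distinct from the `beta` distribution. The tail probabilities are written as `exp(beta_lccdf(...))` instead of `1 - beta_cdf(...)`, which is the same quantity but avoids the version-dependent argument syntax of `beta_cdf`. The threshold `t` is used in both tail probabilities.

```stan
data {
  int<lower=0> N;              // examples seen by the decision maker
  int<lower=0, upper=N> P;     // examples classified as positive
  int<lower=0, upper=P> TP;    // positives that turned out to be true
}
parameters {
  real<lower=0, upper=1> t;    // threshold on X
  real<lower=0> shape_a;       // "alpha" in the post
  real<lower=0> shape_b;       // "beta" in the post
}
transformed parameters {
  // a = P(X > t), the upper tail of beta(shape_a, shape_b)
  real a = exp(beta_lccdf(t | shape_a, shape_b));
  // b = E[X | X > t]
  //   = shape_a / (shape_a + shape_b)
  //     * P(Y > t) / P(X > t),  with Y ~ beta(shape_a + 1, shape_b)
  real b = shape_a / (shape_a + shape_b)
           * exp(beta_lccdf(t | shape_a + 1, shape_b)
                 - beta_lccdf(t | shape_a, shape_b));
}
model {
  // Placeholder priors (assumed; the post only says "reasonable priors")
  t ~ beta(2, 2);
  shape_a ~ gamma(2, 0.5);
  shape_b ~ gamma(2, 0.5);

  P ~ binomial(N, a);
  TP ~ binomial(P, b);
}
```

The ratio for b is computed as a difference of log tail probabilities, so a very small P(X > t) does not turn into a division by a number near zero.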

In the second, the classification process is modeled with linear discriminant analysis (LDA), where the two classes have equal-variance normal distributions whose means are separated by some delta:

tp = phi * (1 - normal_cdf(t, delta, 1));
fp = (1 - phi) * (1 - normal_cdf(t, 0, 1)); 
a = (tp + fp);
b = tp / (tp + fp);
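The second model can be sketched as a complete Stan program in the same way. Again the priors are placeholders, not the ones actually used. I am also assuming that `phi` is the prevalence of the positive class, that the negative class is centered at 0 and the positive class at `delta`, and that `delta` is constrained to be positive. As in the sketch above, `exp(normal_lccdf(...))` stands in for `1 - normal_cdf(...)`.

```stan
data {
  int<lower=0> N;              // examples seen by the decision maker
  int<lower=0, upper=N> P;     // examples classified as positive
  int<lower=0, upper=P> TP;    // positives that turned out to be true
}
parameters {
  real t;                      // decision threshold on the latent score
  real<lower=0, upper=1> phi;  // assumed: prevalence of the positive class
  real<lower=0> delta;         // assumed positive: gap between class means
}
transformed parameters {
  // joint probability of (true positive class, score above t)
  real tp = phi * exp(normal_lccdf(t | delta, 1));
  // joint probability of (true negative class, score above t)
  real fp = (1 - phi) * exp(normal_lccdf(t | 0, 1));
  real a = tp + fp;            // positive rate
  real b = tp / (tp + fp);     // precision
}
model {
  // Placeholder priors (assumed; the post only says "reasonable priors")
  t ~ normal(0, 2);
  phi ~ beta(2, 2);
  delta ~ normal(0, 2);        // half-normal because of the lower bound

  P ~ binomial(N, a);
  TP ~ binomial(P, b);
}
```

Both sketches take the same three integers as data, so the two parameterizations can be fit to identical inputs and their sampler diagnostics compared directly.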

We place reasonable priors on (t, alpha, beta) in the first case and (t, phi, delta) in the second case.

My question is: why is inference more efficient (fewer leapfrog steps per sample and more effective samples per sample) in the second model?

Bob_Carpenter · December 2, 2017, 12:54am

Those are wildly different formulas. I can’t say I really understand either model.

Efficiency comes down to both the time to calculate the log density (how well the log density is coded as a program) and the number of times it needs to be evaluated (statistical efficiency). For the latter, you can look at tree depth or the number of leapfrog steps to see how many times it gets evaluated per iteration.

The normal_cdf is probably more efficient than the beta_cdf; if I recall, the beta_cdf relies on some nasty internal functions for derivatives that are still being refined.

Statistical efficiency is trickier. You want to take a look at the posterior pairs plots to see if you’re getting problematic posteriors like banana or funnel shapes.
