Hi all,
I am modeling binary classifications produced by a machine-learning classifier as a function of a few discrete covariates (for context, see below). I also have data from a validation of the classifier in the form of a confusion table.
The basic idea came from posts by @mitzimorris and @Bob_Carpenter in a thread about an unrelated topic. As in their example, I want to implement the model using `brms` for reasons of convenience, but the question is not about `brms`. I am open to implementing the model directly in Stan if this makes it easier.
However, instead of providing sensitivity and specificity as fixed data values, I want to provide beta priors with counts from the confusion table as shape parameters. Changing @mitzimorris’ code for this purpose is straightforward. The important part is:

```r
# define a *vectorized* custom family (no loop over observations)
binomial_sens_spec_vec <- custom_family(
  "binomial_sens_spec", dpars = c("mu", "sens", "spec"),
  links = c("logit", "identity", "identity"),
  lb = c(NA, 0, 0), ub = c(NA, 1, 1),
  type = "int", vars = c("trials"), loop = FALSE
)
```
```r
# define the corresponding Stan density function
stan_density_binomial_sens_spec_vec <- "
  real binomial_sens_spec_lpmf(array[] int y, vector mu, real sens, real spec, array[] int N) {
    return binomial_lpmf(y | N, mu * sens + (1 - mu) * (1 - spec));
  }
"
```
```r
prior_sens = paste0("beta(", cm[2, 2], ",", cm[1, 2], ")")
prior_spec = paste0("beta(", cm[1, 1], ",", cm[2, 1], ")")
priors = c(
  ...
  set_prior(prior_sens, class = "sens"),
  set_prior(prior_spec, class = "spec")
)
```
```r
m = brm(
  value1 | trials(n) ~ Org + ...,
  data = dd,
  family = binomial_sens_spec_vec,
  prior = priors,
  stanvars = stanvars_binomial_sens_spec_vec,
  ...
)
```
where `cm` is the confusion matrix.
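For concreteness, with the validation counts shown at the bottom of this post, `cm` could be built like this (a minimal sketch; rows are predictions, columns are truth):

```r
# confusion matrix from the validation study
cm <- matrix(
  c(454, 109,   # Truth: No tennis (correct rejections, false alarms)
    67, 307),   # Truth: Tennis (misses, hits)
  nrow = 2,
  dimnames = list(
    c("Prediction: No tennis", "Prediction: Tennis"),
    c("Truth: No tennis", "Truth: Tennis")
  )
)
paste0("beta(", cm[2, 2], ",", cm[1, 2], ")")  # sensitivity prior: "beta(307,67)"
paste0("beta(", cm[1, 1], ",", cm[2, 1], ")")  # specificity prior: "beta(454,109)"
```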
My reasoning behind the idea of using priors instead of data for sensitivity and specificity is that I want to propagate the uncertainty from the validation study of the classifier to the analysis of the main results. Validation studies in this area are usually not super big (maybe 1,000 test cases), so there is often plenty of uncertainty in the estimates of classifier performance.
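Even with counts of this size, the priors still leave noticeable uncertainty; for example:

```r
# 95% prior intervals implied by the confusion-table counts
qbeta(c(0.025, 0.975), 307, 67)   # sensitivity: roughly 0.78 to 0.86
qbeta(c(0.025, 0.975), 454, 109)  # specificity: roughly 0.77 to 0.84
```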
The approach seems to work well as long as the analysis data does not become too big. The model is able to recover known parameters in simulations. However, in my actual application, in which I analyze about 400,000 cases, the information in the data completely overwhelms the information in the prior, leading to estimates for sensitivity and specificity that are far away from the observed values in the validation study. I understand why that happens (these values for sensitivity and specificity fit the analysis data better), but it does not make sense substantively, because there is no new information about the quality of the classifications in the analysis data.
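For reference, a sketch of the kind of data-generating process I mean for the simulations (the specific numbers and the two-group structure are made up for illustration; column names match the `brm()` call above):

```r
# simulate classifier output with known sensitivity/specificity
set.seed(1)
n_group <- 200000          # cases per group, mimicking the large analysis data
p_true  <- c(0.70, 0.80)   # true proportion of "tennis" questions per group
sens    <- 0.82            # true sensitivity of the classifier
spec    <- 0.81            # true specificity of the classifier
# probability that the classifier *predicts* "tennis" in each group
p_obs   <- p_true * sens + (1 - p_true) * (1 - spec)
dd <- data.frame(
  Org    = factor(c("A", "B")),
  n      = n_group,
  value1 = rbinom(2, size = n_group, prob = p_obs)
)
```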
Am I missing something obvious here? Is it a misconception that one can bring information about sensitivity and specificity into a binomial model in this way? Is there another way to set up the model so that the prior informs sensitivity and specificity, which are used in estimating the model, but model estimation does not update sensitivity and specificity? Any recommendations are highly appreciated.
For context:
We are analyzing gender bias in questions asked in post-match press conferences in professional tennis. One of the outcomes is, for example, whether the question is actually about tennis, with the assumption being that female players are asked more about other topics and, consequently, less about tennis. The analysis data comprises about 400,000 questions. The predictor of interest is player gender. In addition, we consider tournament, press conference, player, and year in a hierarchical model.
We use a BERT-NLI zero-shot model for classification. We evaluated the quality of the classifications in a separate validation study with about 1,000 questions. For reference, the confusion matrix for the “tennis” category looked like this:
| | Truth: No tennis | Truth: Tennis |
| --- | --- | --- |
| Prediction: No tennis | 454 | 67 |
| Prediction: Tennis | 109 | 307 |
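These counts imply the following point estimates:

```r
307 / (307 + 67)    # sensitivity: ~0.82
454 / (454 + 109)   # specificity: ~0.81
```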