I am trying to find the sensitivity and specificity of a diagnostic test compared to a gold standard, using logistic regression modeling as described here The logistic modeling of sensitivity, specificity, and predictive value of a diagnostic test - ScienceDirect
I wanted a Bayesian modeling solution because it will allow me to obtain uncertainty intervals of the estimate, set priors, and potentially include other variables in the model. (The article is old, so if there are some flaws in this method, or a better way, please let me know!)
In the code below, I set up small example and test to make sure I recover the sensitivity and specificity in the model. For uncertainty interval, I use the shortest posterior density interval using code taken directly from @andrewgelman and @Bob_Carpenter paper diagnostic-testing/rstan-santa-clara.R at master · bob-carpenter/diagnostic-testing · GitHub
#This code calculates the sensitivity and specificity comparing a test to a gold standard using logistic regression.
#The formulas were taken from https://www.sciencedirect.com/science/article/pii/089543569290180U?via%3Dihub
library(brms)
#set up data and get sensitivity and specificity.
# 0 = negative test and 1 = positive test. test = diagnostic test. gs = gold standard
test <- c(0,0,0,1,0,0,1,0,1,1,0,0,0,1,0,0,0,0,0,0)
gs <- c(1,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,1)
sens <- 4/(4+2)
spec <- 13/(13+1)
data <- cbind(test,gs)
data <- data.frame(data)
#logistic model. The test is the outcome variable and the gold standard is the predictor
m1 <- brm(test ~ gs, data=data, family=bernoulli)
#extract posterior samples
samp1 <- posterior_samples(m1)
#define function for shortest posterior interval
spin <- function(x, lower=NULL, upper=NULL, conf=0.95){
x <- sort(as.vector(x))
if (!is.null(lower)) {
if (lower > min(x)) stop("lower bound is not lower than all the data")
else x <- c(lower, x)
}
if (!is.null(upper)) {
if (upper < max(x)) stop("upper bound is not higher than all the data")
else x <- c(x, upper)
}
n <- length(x)
gap <- round(conf*n)
width <- x[(gap+1):n] - x[1:(n-gap)]
index <- min(which(width==min(width)))
x[c(index, index + gap)]
}
#calculate sensitivity from the model and compare to the raw calculation of sensitivity
#equation 2 from the linked paper
m1_sens <- 1/(1 + exp(-(samp1$b_Intercept + 1*samp1$b_gs)))
m1_sens_mean <- mean(m1_sens)
m1_sens_mean
print(spin(m1_sens, lower=0, upper=1, conf=0.95))
hist(m1_sens,breaks=100)
sens
#calculate specificity from the model and compare to the raw calculation of specificity
#equation 3 from the linked paper
m1_spec <- 1 - (1/(1 + exp(-(samp1$b_Intercept + 0*samp1$b_gs))))
m1_spec_mean <- mean(m1_spec)
m1_spec_mean
print(spin(m1_spec, lower=0, upper=1, conf=0.95))
hist(m1_spec,breaks=100)
spec
The model recovers values for sensitivity and specificity that are close to the raw calculation.
I would like to set priors for sensitivity and specificity in this model. If (!) I did the
algebra correctly, then for this simple model where the outcome variable is binary diagnostic test and the predictor is binary (0/1) gold standard, then based on the paper I linked first (equations 2 and 3), the intercept in the model should be related to specificity in this way:
alpha = -log( (1/(1-specificity)) - 1 )
and coefficient for the predictor to specificity and sensitivity in this way:
beta = log( (1/(1-specificity)) - 1) - log( (1/sensitivity) - 1 )
It seems like I should be able to set priors for sensitivity and specificity by setting priors for alpha
and beta
- I just plug in values of sensitivity and specificity into the above formulas and come up with some plausible values for alpha
and beta
based on plausible values for sensitivity and specificity.
However, when I plug my model calculated (or the raw) values for sensitivity and specificity into those formulas, I don’t get my alpha
and beta
back…
-log( (1/(1-m1_spec_mean)) - 1 )
log( (1/(1-m1_spec_mean)) - 1) - log( (1/m1_sens_mean) - 1 )
-log( (1/(1-spec)) - 1 )
log( (1/(1-spec)) - 1) - log( (1/sens) - 1 )
I feel like I am missing something obvious…maybe having to do with the log transformations in the algebra? If this isn’t the right way, then how can I go about setting priors for sensitivity and specificity in this particular simple logistic regression model?
Thanks!