To assess the performance of a biological test, I have a sample for which I have both the test results and the results of a gold standard.
Sensitivity (Se) and specificity (Sp) are modelled quite simply using a logistic model:
Se = P(Test+|Gold+)
Sp = P(Test-|Gold-)
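library(brms)

# gold must be a factor with levels 0/1 so that the coefficients
# are named gold0 and gold1
brm(
  bf(
    test | trials(n) ~ a,   # test = number of positive tests out of n
    a ~ 0 + gold,           # one logit-scale coefficient per gold-standard status
    nl = TRUE
  ),
  data = data, family = binomial(),
  prior = c(
    # logit P(Test+ | Gold-): expected to be low
    set_prior("normal(-3, 1.5)", class = "b", coef = "gold0", nlpar = "a"),
    # logit P(Test+ | Gold+): expected to be high
    set_prior("normal(3, 1.5)", class = "b", coef = "gold1", nlpar = "a")
  )
)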
Se = \frac{1}{1+e^{-gold1}}
Sp = 1-\frac{1}{1+e^{-gold0}}
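For concreteness, here is a minimal sketch of this back-transformation applied to posterior draws. It is self-contained, so the draws are simulated stand-ins. With a fitted model (called `fit` here, a hypothetical name) they would come from `as_draws_df(fit)`, and I am assuming the coefficient columns are named `b_a_gold0` and `b_a_gold1`. The helper `post_summary` is also just an illustrative name.

```r
# Stand-in for posterior draws on the logit scale (simulated, NOT real results).
# With a fitted brms model one would instead use something like
#   draws <- as_draws_df(fit)
# assuming the columns are named b_a_gold0 and b_a_gold1.
set.seed(1)
draws <- data.frame(
  b_a_gold0 = rnorm(4000, mean = -3, sd = 0.3),
  b_a_gold1 = rnorm(4000, mean = 3, sd = 0.3)
)

# plogis(x) is the inverse logit, 1 / (1 + exp(-x))
se <- plogis(draws$b_a_gold1)      # Se = P(Test+ | Gold+)
sp <- 1 - plogis(draws$b_a_gold0)  # Sp = 1 - P(Test+ | Gold-)

# Posterior mean and 95% interval for one vector of draws
post_summary <- function(x) {
  c(mean = mean(x), quantile(x, probs = c(0.025, 0.975)))
}

rbind(Se = post_summary(se), Sp = post_summary(sp))
```

Transforming draw by draw, rather than transforming a posterior mean, keeps the full posterior uncertainty of Se and Sp.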
I also want to estimate the predictive values of the test from the estimated sensitivity and specificity and the prevalence of the disease (p):
PPV = \frac{Se \cdot p}{Se \cdot p + (1-Sp) \cdot (1-p)}
NPV = \frac{Sp \cdot (1-p)}{Sp \cdot (1-p) + (1-Se) \cdot p}
At this stage, I estimate the predictive values of the test by applying these formulae to a table containing posterior draws of sensitivity and specificity together with draws from the prevalence distribution, which I know a priori.
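A minimal sketch of this post-processing step, to make the current approach explicit. Everything here is illustrative: the Se and Sp draws are simulated stand-ins for the posterior draws, the Beta(2, 18) prevalence distribution is a hypothetical placeholder for the one known a priori, and `predictive_values` is a helper name made up for this example.

```r
# PPV and NPV from Se, Sp and prevalence p (vectorised over draws)
predictive_values <- function(se, sp, p) {
  ppv <- se * p / (se * p + (1 - sp) * (1 - p))
  npv <- sp * (1 - p) / (sp * (1 - p) + (1 - se) * p)
  data.frame(ppv = ppv, npv = npv)
}

set.seed(2)
n_draws <- 4000

# Simulated stand-ins for posterior draws of Se and Sp (NOT real results)
se <- plogis(rnorm(n_draws, mean = 3, sd = 0.3))
sp <- 1 - plogis(rnorm(n_draws, mean = -3, sd = 0.3))

# Hypothetical prevalence distribution; replace with the one known a priori
p <- rbeta(n_draws, shape1 = 2, shape2 = 18)

# One PPV/NPV value per draw, then median and 95% interval
pv <- predictive_values(se, sp, p)
t(sapply(pv, quantile, probs = c(0.025, 0.5, 0.975)))
```

Pairing each Se/Sp draw with an independent prevalence draw relies on the data in the model carrying no information about p, so that p stays independent of Se and Sp.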
However, I don’t find this method very elegant and would prefer to estimate all the parameters directly during modelling.
My problem is therefore how to incorporate the prevalence into the model. Since it’s not a point value, I can’t integrate it directly into the non-linear formula syntax. Should I simply define it as a prior? But as I’m not trying to estimate this parameter, I doubt that’s the right solution.
Thanks in advance for your insights.