Prevalence estimation with an imperfect test

Hi,

I have a binary test outcome in a large sample. I know the test is not perfect and wish to estimate the true prevalence. I use a logistic regression approach, where I marginalize over the true disease status.
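
In other words, the probability of each observed test result, after marginalizing over the unknown true status, is

P(test positive) = pie*sensitivity + (1-pie)*(1-specificity)
P(test negative) = pie*(1-sensitivity) + (1-pie)*specificity

where pie is the true prevalence (this is exactly what the model below computes).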

I have pretty good knowledge about the sensitivity/specificity of the test from earlier research and specify this as a beta prior. I am, however, puzzled by the fact that my sensitivity/specificity posterior is different from my prior. As far as I can see, the data hold no information about the sensitivity/specificity, and still both get updated; in particular, specificity moves all the way up to the boundary of the prior. So I feel my resulting prevalence is based on an overly optimistic specificity.

Can someone explain to me what’s happening here?

data{
  int status[2];
  int Freq[2];
}
     
parameters {
  vector<lower=.5, upper=1>[2] sensspec;
  real<lower=0, upper=1> pie;
}
   
transformed parameters {
  real loglik[2];
    
  for (i in 1:2){
    if (status[i] == 2){          
      loglik[i] = Freq[i]*log(pie*sensspec[1] + (1-pie)*(1-sensspec[2]));
    } else if (status[i] == 1){
      loglik[i] = Freq[i]*log(pie*(1-sensspec[1]) + (1-pie)*sensspec[2]);
    }
  }
}


model {
  pie ~ beta(2,8);
  for (i in 1:2){
     sensspec[i] ~ beta(100, 10);
   }
  
  target += sum(loglik);
}


Hi,
I’ve just run the model and it didn’t appear to behave in an unreasonable way; in particular, the marginal posterior of sensspec seemed to be almost identical to the prior distribution. Could you share the exact input data you use and the output you are getting?

Thanks for looking into that, Martin! It seems to affect specificity more than sensitivity, probably because of the low prevalence and the larger number of non-cases. See the comparison of prior and posterior:

library(rstan)
library(shinystan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

data = list(status = 1:2, Freq = c(20000, 2000))

model_code = '
data{
  int status[2];
  int Freq[2];
}

parameters {
  vector<lower=.5, upper=1>[2] sensspec;
  real<lower=0, upper=1> pie;
}

transformed parameters {
  real loglik[2];

  for (i in 1:2){
    if (status[i] == 2){
      loglik[i] = Freq[i]*log(pie*sensspec[1] + (1-pie)*(1-sensspec[2]));
    } else if (status[i] == 1){
      loglik[i] = Freq[i]*log(pie*(1-sensspec[1]) + (1-pie)*sensspec[2]);
    }
  }
}

model {
  pie ~ beta(2,8);
  for (i in 1:2){
    sensspec[i] ~ beta(100, 10);
  }

  target += sum(loglik);
}
'
model = stan_model(model_code = model_code)
samples = sampling(model, data = data)
prior = data.frame(parameter = c("sensspec[1]", "sensspec[2]"), value = rbeta(1000, 100, 10))

stan_hist(samples, "sensspec") + geom_histogram(data = prior, aes(x = value), alpha = .5)

I see - there are two things:

  1. You’ve put a lower bound of 0.5 on your values, so the histogram above is not drawn from the prior you actually use in the model. Using rejection sampling to sample from a beta with a lower bound:
x <- rbeta(1e5, 100, 10); x <- x[x > 0.5]; quantile(x, probs = c(0.025, 0.25, 0.50, 0.75, 0.975))
#      2.5%       25%       50%       75%     97.5% 
# 0.8492021 0.8920933 0.9113891 0.9284150 0.9549340 

This is very close to what I get when running the model with your data:

                mean se_mean    sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
sensspec[1]     0.91    0.00  0.03     0.84     0.89     0.91     0.93     0.96  1705 1.00
sensspec[2]     0.94    0.00  0.01     0.91     0.93     0.94     0.95     0.96   924 1.01
pie             0.03    0.00  0.02     0.01     0.02     0.03     0.04     0.06   912 1.01
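
For reference, a summary like this can be printed with rstan’s print method; the exact call below is my guess, and other rows such as lp__ are omitted:

print(samples, pars = c("sensspec", "pie"))
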
  2. You have a data-prior conflict. Let’s simulate the proportion of positives you would expect given your priors:
N <- 1e5
sensspec <- matrix(rbeta(2 * N, 100, 10), nrow = N, ncol = 2)
pie <- rbeta(N, 2 , 8)
theta <- pie*sensspec[,1] + (1-pie)*(1-sensspec[,2])

quantile(theta, probs = c(0.025, 0.25, 0.50, 0.75, 0.975))
#     2.5%       25%       50%       75%     97.5% 
# 0.1070964 0.1823829 0.2434531 0.3207248 0.4977559 

So the observed positive frequency of roughly 0.09 is quite unlikely a priori. Something has to give, so the model does the “rational” thing: it updates its beliefs about all the parameters. That is, if you believe a disease is reasonably common (around 1/5 of the population), you test with sensitivity and specificity around 0.9 (which should give you a ton of false positives), and you get unexpectedly few positives, you need to start doubting not only that the disease is common but also that your test is working as expected (or, more generally, that your model is correct).
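
To put a rough number on that conflict, continuing from the simulation above:

# prior predictive probability of a positive rate as low as the observed ~0.091
mean(theta < 2000 / (20000 + 2000))

Since the 2.5% quantile of theta above is about 0.107, this comes out well below 2.5%.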

In that sense, the model behaves as expected. The correct reaction is not necessarily to update your beliefs about sensitivity and specificity, but to question whether those beliefs, your prior on prevalence, or other aspects of the data collection are correctly expressed in the model.

Does that make sense?

Finally, the model as you wrote it is not very idiomatic Stan and it took me a while to understand. For inspiration, here’s how I would write the same model, taking advantage of Stan’s built-in functions and a slightly simpler form of input data (there are many ways to write this; I don’t claim this is the best or definitive version):

data{
  int N_total;
  int N_positive;
}

parameters {
  vector<lower=.5, upper=1>[2] sensspec;
  real<lower=0, upper=1> pie;
}

transformed parameters {
  real theta = pie*sensspec[1] + (1-pie)*(1-sensspec[2]);
}

model {
  pie ~ beta(2,8);
  sensspec ~ beta(100, 10); //This gives all elements of sensspec the same prior, no need for a loop
  
  N_positive ~ binomial(N_total, theta);
}
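
With your data, the input for this version would be something like the following (assuming I read your coding correctly, i.e. status == 2 with Freq == 2000 are the positives, and that model_code now holds this program):

data = list(N_total = 20000 + 2000, N_positive = 2000)
samples = sampling(stan_model(model_code = model_code), data = data)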

That’s very instructive, thanks for the clarification!

Glad to have helped.

I just realized there is an even simpler way of putting it: if your test has a specificity of roughly 0.9, you would expect around 2000 positives and 20000 negatives even if the true prevalence were exactly 0. So given your data, the specificity basically has to be higher than about 0.9.
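
As a quick back-of-the-envelope check with your counts:

# observed positive rate
2000 / (20000 + 2000)   # about 0.091
# if the true prevalence were exactly 0, all of these would be false positives,
# implying a specificity of about 1 - 0.091 = 0.909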


That’s a great explanation for the observed behaviour - there is actually some information (a lower bound) on the specificity of the test in the data. Thanks for pointing this out!