Help with COVID-19 prevalence study - Retesting the positives

Hi,

I am part of an NGO that is running COVID-19 seroprevalence studies in low- and middle-income countries. We are planning to run ~1000 randomized antibody tests every other week in each country to track prevalence over time and to give policymakers insights on how to respond to the health crisis.

We followed Gelman and Carpenter (2020) to account for uncertainty in specificity and sensitivity and to include post-stratification. Given that these countries have low prevalence and that antibody testing has a higher false-positive rate than PCR testing, we were wondering whether we could narrow our credible intervals by retesting (only) the people whose first antibody test is positive. The goal would be to reduce the uncertainty around false positives without having to retest everybody.

We would appreciate any suggestions on how to include the results of the second tests in our MRP model (which looks very similar to the Appendix A.3 model from the paper referenced above), or even ideas on whether such a procedure would actually be useful for decreasing uncertainty.

Our code for one of the countries:

data {
  int<lower = 0> N;  // number of tests in the sample (~1000)
  int<lower = 0, upper = 1> y[N];  // 1 if positive antibody test, 0 if negative

  int<lower = 0> y_spec;  // number of correct (negative) results among known-negative calibration samples
  int<lower = 0> n_spec;  // number of known-negative calibration samples
  int<lower = 0> y_sens;  // number of correct (positive) results among known-positive calibration samples
  int<lower = 0> n_sens;  // number of known-positive calibration samples

  vector<lower = 0, upper = 1>[N] male;  // 0 if female, 1 if male
  int<lower = 0> N_region;  // number of region codes (13 at the region level)
  int<lower = 1, upper = N_region> region[N];  // region codes 1 through 13
  vector[N_region] x_region;  // predictors at the region code level
  int<lower = 0> J;  // number of population cells, J = 2*13
  vector<lower = 0>[J] N_pop;  // population sizes for poststratification
  real intercept_prior_mean;
  real<lower = 0> intercept_prior_scale;
  real<lower = 0> coef_prior_scale;
}

parameters {
  real<lower = 0, upper = 1> spec;
  real<lower = 0, upper = 1> sens;
  vector[3] b;  // intercept, coef for male, and coef for x_region
  real<lower = 0> sigma_region;
  vector<multiplier = sigma_region>[N_region] a_region;  // varying intercepts for region code
}

model {
  vector[N] p = inv_logit(b[1] + b[2] * male + b[3] * x_region[region] + a_region[region]);
  vector[N] p_sample = p * sens + (1 - p) * (1 - spec);  // prob. of a positive test result
  y ~ bernoulli(p_sample);
  y_spec ~ binomial(n_spec, spec);
  y_sens ~ binomial(n_sens, sens);
  a_region ~ normal(0, sigma_region);
  // prior on centered intercept
  b[1] + b[2] * mean(male) + b[3] * mean(x_region[region]) ~ normal(intercept_prior_mean, intercept_prior_scale);
  b[2] ~ normal(0, coef_prior_scale);
  sigma_region ~ normal(0, coef_prior_scale);
  b[3] ~ normal(0, coef_prior_scale / sd(x_region[region]));  // prior on scaled coefficient
}

generated quantities {
  real p_avg;
  vector[J] p_pop;  // population prevalence in the J poststratification cells
  int count;
  count = 1;
  for (i_region in 1:N_region) {
    for (i_male in 0:1) {
      p_pop[count] = inv_logit(b[1] + b[2] * i_male + b[3] * x_region[i_region] + a_region[i_region]);
      count += 1;
    }
  }
  p_avg = dot_product(N_pop, p_pop) / sum(N_pop);
}

Hi, coordinator of the Brazilian arm of the study here.
This is an interesting problem. Let Y_1, Y_2 \in \{ 0, 1\} be the results of the two tests. In a way it is similar to the problem described in Section 2.1 here. A major difference is that the entries c_i and d_i in Table 1 therein are missing, since you only observe Y_2 \mid Y_1 = 1.

I’ll think about the necessary algebra in the coming days, but one should be able to write \theta := \operatorname{Pr}(Y_2 = 1 \mid Y_1 = 1) explicitly and plug that into a different binomial likelihood. MRP should proceed in the same fashion, but using estimates of \pi that take into account the distribution of \theta.

EDIT: Using the notation from Gelman & Carpenter, let p = (1-\gamma)(1-\pi) + \delta\pi. Recall that \operatorname{Pr}(Y_1 = 1) = p.
Then

\begin{align}
\operatorname{Pr}(Y_2 = 1) &= \operatorname{Pr}(Y_2 = 1 \mid Y_1 = 1)\operatorname{Pr}(Y_1 = 1) + \operatorname{Pr}(Y_2 = 1 \mid Y_1 = 0)\operatorname{Pr}(Y_1 = 0),\\
\eta &:= \theta p
\end{align}

which means we have one more parameter to estimate, \theta. Let me know if that does the trick.

Further edit: this is wrong. But if we can write the conditionals in terms of \gamma_2 and \delta_2, i.e., the specificity and sensitivity of the second test, we can make progress. Since \operatorname{Pr}(Y_2 = 1 \mid Y_1 = 0) = 0, \eta = \theta p.
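For what it's worth, here is a sketch of how the conditional could be written in those terms, assuming the two tests are conditionally independent given the true status R (writing \gamma_1 = \gamma and \delta_1 = \delta for the first test's specificity and sensitivity):

\begin{align*}
\theta = \operatorname{Pr}(Y_2 = 1 \mid Y_1 = 1) &= \delta_2 \operatorname{Pr}(R = 1 \mid Y_1 = 1) + (1 - \gamma_2)\operatorname{Pr}(R = 0 \mid Y_1 = 1)\\
&= \frac{\delta_2 \delta_1 \pi + (1 - \gamma_2)(1 - \gamma_1)(1 - \pi)}{p},
\end{align*}

so that \eta = \theta p = \delta_2 \delta_1 \pi + (1 - \gamma_2)(1 - \gamma_1)(1 - \pi).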


Just a small note: I would say the goal is not to retest everyone, but to get credible estimates.

What does the equation \text{Pr}(Y_2=1|Y_1=0)=0 mean intuitively?

That you can’t get a positive result on the second test if the first one was negative — by design, only first-test positives are retested.

From it, I infer \theta = \text{Pr}(Y_2=1)=\eta \theta, so I am confused…

Yup, the maths is not nailed down yet. Feel free to post the correct derivation here if you figure it out.

In principle, we can derive a formula for the probability that the repeated test gives a true positive or a true negative, but I suspect it is not much improved by repeating the test. At a conference in Japan some time ago, someone calculated such a formula for the diagnostic ability of a repeated test; it depends on the case, but my impression is that repeating the test does not drastically improve the probability that the result is true.
So even if the credible interval shrinks as data from repeated tests are collected, the underlying diagnostic accuracy itself will not improve drastically.

This is just my rough guess, but it is an interesting problem.

Hi @cucho,

Here’s a fresh derivation, due to my friend Leo Bastos. Please note that the probability he derives and the one I did above are not for the same event. Nevertheless, I believe his derivation is clearer and more useful, and hence my post (and edits) above should essentially be disregarded.

Let R \in \{0, 1\} be the true disease status and let \gamma_i and \delta_i be the specificity and sensitivity of test i, respectively.
Let Y be the outcome of a retest-the-positives strategy, i.e. Y = 1 if tests 1 and 2 are both positive, and Y = 0 otherwise. So

\begin{align*}
p_2 := \operatorname{Pr}(Y = 1) &= \operatorname{Pr}(Y = 1 , R = 1) + \operatorname{Pr}(Y = 1 , R = 0)\\
&= \operatorname{Pr}(Y = 1 \mid R = 1) \operatorname{Pr}(R = 1) + \operatorname{Pr}(Y = 1 \mid R = 0) \operatorname{Pr}(R = 0)\\
&= \operatorname{Pr}(T_2 = 1 , T_1 = 1 \mid R = 1) \operatorname{Pr}(R = 1) + \operatorname{Pr}(T_2 = 1 , T_1 = 1 \mid R = 0) \operatorname{Pr}(R = 0)\\
&= \operatorname{Pr}(T_2 = 1 \mid T_1 = 1, R = 1) \operatorname{Pr}(T_1 = 1 \mid R = 1) \operatorname{Pr}(R = 1)\\
&\quad + \operatorname{Pr}(T_2 = 1 \mid T_1 = 1, R = 0) \operatorname{Pr}(T_1 = 1 \mid R = 0) \operatorname{Pr}(R = 0)\\
&= \delta_2 \delta_1 \pi + (1 - \gamma_2)(1 - \gamma_1)(1 - \pi)
\end{align*}

where T_i denotes the result of test i, and the last line follows if we assume that the two tests are conditionally independent given R.
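If that holds, here is a minimal sketch of how the second-stage data could slot into the Stan model posted above. The data name y2, the second-test parameters spec2 and sens2, and their calibration data are all hypothetical, and this assumes every first-test positive was retested (so y2[i] = 1 only if both tests were positive):

```stan
data {
  // ... everything from the original data block, plus:
  int<lower = 0, upper = 1> y2[N];  // 1 if both tests positive, 0 otherwise (hypothetical)
  int<lower = 0> y_spec2;  // calibration data for the second test (hypothetical)
  int<lower = 0> n_spec2;
  int<lower = 0> y_sens2;
  int<lower = 0> n_sens2;
}
parameters {
  // ... as before, plus:
  real<lower = 0, upper = 1> spec2;
  real<lower = 0, upper = 1> sens2;
}
model {
  vector[N] p = inv_logit(b[1] + b[2] * male + b[3] * x_region[region]
                          + a_region[region]);
  // Pr(both tests positive) = delta2 * delta1 * pi + (1 - gamma2) * (1 - gamma1) * (1 - pi)
  vector[N] p2 = p * (sens * sens2) + (1 - p) * ((1 - spec) * (1 - spec2));
  y2 ~ bernoulli(p2);
  y_spec2 ~ binomial(n_spec2, spec2);
  y_sens2 ~ binomial(n_sens2, sens2);
  // ... remaining priors as in the original model
}
```

Note that this replaces the original bernoulli likelihood on y with one on the combined outcome; whether to keep both likelihoods (they are not independent) would need more thought.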
