I am part of an NGO that is running COVID-19 seroprevalence studies in low- and middle-income countries. We are planning to run ~1000 randomized antibody tests every other week in each country to track prevalence over time and to give policymakers insight into how to respond to the public health crisis.
We followed Gelman and Carpenter (2020) to account for uncertainty in specificity and sensitivity and to include poststratification. Given that these countries have low prevalence and that antibody testing has a higher false-positive rate than PCR testing, we were wondering whether we could narrow our credible intervals by retesting (only) the people whose first antibody test is positive. The goal would be to reduce the uncertainty around false positives without having to retest everybody.
We would appreciate any suggestions on how to include the results of the second tests in our MRP model (which looks very similar to the Appendix A.3 model from the paper referenced above), or even ideas about whether such a procedure is useful for decreasing uncertainty at all.
Our code for one of the countries:
data {
  int<lower = 0> N;                            // number of tests in the sample (~1000)
  int<lower = 0, upper = 1> y[N];              // 1 if positive antibody test, 0 if negative
  int y_spec;                                  // correct (negative) results in the specificity calibration trials
  int n_spec;                                  // number of specificity calibration trials (known negatives)
  int y_sens;                                  // correct (positive) results in the sensitivity calibration trials
  int n_sens;                                  // number of sensitivity calibration trials (known positives)
  vector<lower = 0, upper = 1>[N] male;        // 0 if female, 1 if male
  int<lower = 0> N_region;                     // number of region codes (13 at the region level)
  int<lower = 1, upper = N_region> region[N];  // region codes 1 through 13
  vector[N_region] x_region;                   // predictors at the region code level
  int<lower = 0> J;                            // number of poststratification cells, J = 2*13
  vector<lower = 0>[J] N_pop;                  // population sizes for poststratification
  real intercept_prior_mean;
  real<lower = 0> intercept_prior_scale;
  real<lower = 0> coef_prior_scale;
}
parameters {
  real<lower = 0, upper = 1> spec;             // specificity of the antibody test
  real<lower = 0, upper = 1> sens;             // sensitivity of the antibody test
  vector[3] b;                                 // intercept, coef for male, and coef for x_region
  real<lower = 0> sigma_region;
  vector<multiplier = sigma_region>[N_region] a_region;  // varying intercepts for region code
}
model {
  vector[N] p = inv_logit(b[1] + b[2] * male + b[3] * x_region[region] + a_region[region]);
  // probability of observing a positive test, adjusting for sensitivity and specificity
  vector[N] p_sample = p * sens + (1 - p) * (1 - spec);
  y ~ bernoulli(p_sample);
  y_spec ~ binomial(n_spec, spec);
  y_sens ~ binomial(n_sens, sens);
  a_region ~ normal(0, sigma_region);
  // prior on centered intercept
  b[1] + b[2] * mean(male) + b[3] * mean(x_region[region]) ~ normal(intercept_prior_mean, intercept_prior_scale);
  b[2] ~ normal(0, coef_prior_scale);
  sigma_region ~ normal(0, coef_prior_scale);
  b[3] ~ normal(0, coef_prior_scale / sd(x_region[region]));  // prior on scaled coefficient
}
generated quantities {
  real p_avg;
  vector[J] p_pop;  // population prevalence in the J poststratification cells
  int count = 1;
  for (i_region in 1:N_region) {
    for (i_male in 0:1) {
      p_pop[count] = inv_logit(b[1] + b[2] * i_male + b[3] * x_region[i_region] + a_region[i_region]);
      count += 1;
    }
  }
  p_avg = dot_product(N_pop, p_pop) / sum(N_pop);
}
Hi, coordinator of the Brazilian arm of the study here.
This is an interesting problem. Let Y_1, Y_2 \in \{ 0, 1\} be the results of both tests. In a way it is similar to the problem described in Section 2.1 here. A major difference is that the entries c_i and d_i in Table 1 therein are missing, since you only know Y_2 \mid Y_1 = 1.
I’ll think about the necessary algebra in the coming days, but one should be able to write \theta := \operatorname{Pr}(Y_2 = 1 \mid Y_1 = 1) explicitly and plug that into a different binomial likelihood. MRP should proceed in the same fashion, but using estimates of \pi that take into account the distribution of \theta.
EDIT: Using the notation from Gelman & Carpenter, let p = (1-\gamma)(1-\pi) + \delta\pi. Recall that \operatorname{Pr}(Y_1 = 1) = p.
Then
\eta := \operatorname{Pr}(Y_2 = 1) = \theta p + \operatorname{Pr}(Y_2 = 1 \mid Y_1 = 0)\,(1 - p),
which means we have one more parameter to estimate, \theta. Let me know if that does the trick.
Further edit: this is wrong. But if we can write the conditionals in terms of \gamma_2 and \delta_2, i.e., the specificity and sensitivity of the second test, we can make progress. Since \operatorname{Pr}(Y_2 = 1 \mid Y_1 = 0) = 0 under the retest-positives-only design, \eta = \theta p.
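Concretely, if we additionally assume the two tests are conditionally independent given the true disease status R (an extra assumption, not something established above), then \theta can be written explicitly:

\theta = \operatorname{Pr}(Y_2 = 1 \mid Y_1 = 1) = \delta_2 \operatorname{Pr}(R = 1 \mid Y_1 = 1) + (1 - \gamma_2) \operatorname{Pr}(R = 0 \mid Y_1 = 1),

where, by Bayes' theorem, \operatorname{Pr}(R = 1 \mid Y_1 = 1) = \delta \pi / p and \operatorname{Pr}(R = 0 \mid Y_1 = 1) = (1 - \gamma)(1 - \pi) / p. Under that assumption \theta is a deterministic function of (\pi, \gamma, \delta, \gamma_2, \delta_2) rather than a genuinely free parameter.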
In principle, one can derive a formula for the probability that a repeated test yields a true positive or a true negative, and my guess is that repeating the test does not improve this much. At a conference in Japan some time ago, someone calculated such a formula for the diagnostic ability of a repeated test; it depends on the case, but overall my impression was that repetition does not drastically improve the probability that the test result is correct.
So even if the credible interval shrinks as you collect data from repeated tests, the accuracy of the underlying classification will not improve drastically.
This is only my rough guess, though; it is an interesting problem.
Here’s a fresh derivation, due to my friend Leo Bastos. Please note that the probability he derives and the one I derived above are not for the same event. Nevertheless, I believe his derivation is clearer and more useful, so my post (and edits) above can essentially be disregarded.
Let R \in \{0, 1\} be the true disease status and let \gamma_i and \delta_i be the specificity and sensitivity of test i, respectively.
Let Y be the outcome of a retest-positives-only strategy, i.e., Y = 1 if tests 1 and 2 are both positive, and Y = 0 otherwise. So, assuming the two tests are conditionally independent given R,

\operatorname{Pr}(Y = 1) = \operatorname{Pr}(Y_1 = 1, Y_2 = 1) = \delta_1 \delta_2 \pi + (1 - \gamma_1)(1 - \gamma_2)(1 - \pi).
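To connect this back to the original Stan program, here is a minimal sketch of how the model block could change. This assumes conditional independence of the two tests given R, and it assumes hypothetical calibration data and parameters for the second test (y_spec2, n_spec2, y_sens2, n_sens2 in the data block, and spec2, sens2 in the parameters block, all declared analogously to the existing ones):

model {
  vector[N] p = inv_logit(b[1] + b[2] * male + b[3] * x_region[region] + a_region[region]);
  // Pr(Y = 1): both tests positive, marginalizing over true status R
  vector[N] p_sample = p * sens * sens2 + (1 - p) * (1 - spec) * (1 - spec2);
  y ~ bernoulli(p_sample);             // y is now the combined outcome Y, not the first test alone
  y_spec ~ binomial(n_spec, spec);
  y_sens ~ binomial(n_sens, sens);
  y_spec2 ~ binomial(n_spec2, spec2);  // calibration data for the second test
  y_sens2 ~ binomial(n_sens2, sens2);
  // ... remaining priors unchanged from the original model
}

One caveat: coding Y this way lumps (Y_1 = 1, Y_2 = 0) together with (Y_1 = 0). A fuller treatment would give the three observable outcomes (negative; positive then negative; positive twice) a categorical likelihood, which should extract a bit more information from the retests.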