Bivariate model for a meta analysis of diagnostic test accuracy

Hi,

I would like to fit a bivariate model for meta analysis of diagnostic test accuracy (sensitivity and specificity). I have approx 50 studies to be included with four cell counts for each study (namely, true positive, false positive, true negative, false negative).

In my codes (attached down below), I transformed the count data to logit of true positive rate and false positive rate and calculated their standard errors. To fit a bivariate normal models for sensitivity and specificity, I wanted to use the standard errors in my covariance matrix for within study variability, but the correlation is unknown. So I wanted to estimate the correlation for within study variability using the known variances while estimating a whole covariance matrix for between study variability.

As I got errors saying my covariance matrix is not symmetric, I tried using the matrix decomposition, but I now got the following error. I was thinking it occurred because I was trying to plug in the estimated variances into the covariance matrix for within variability even though I declared it in the transformed parameters block. Yet, I am not sure how to proceed.

Variable "cov_matrix" does not exist.
 error in 'model5e203298a5ca_sen_spe_uni' at line 63, column 14
  -------------------------------------------------
    61: 
    62: 
    63: 	cov_matrix[J] sigma_w_mat[K]; // vcov matrix for within study variability
                     ^
    64: 
  -------------------------------------------------

The data I am using for the above code is uploaded here:

My code is as follows.

Any help to fix the error, as well as any other comments on the modelling as a whole, would be greatly appreciated.

data {

	int<lower=1> K; /* Number of studies */
	int<lower=1> J; /* sensitivity and specificity = 2*/
	real<lower=0> TP[K];
	real<lower=0> FP[K];
	real<lower=0> FN[K];
	real<lower=0> TN[K];

}

transformed data{
	real<lower=0> sigma_tpr[K];
	real<lower=0> sigma_fpr[K];
	real TPR[K];
	real FPR[K];
	real LOGIT_tpr[K];
	real LOGIT_fpr[K];
	vector[J] LOGIT_pair[K];
	vector<lower=0>[J] sigma_w[K]; // within-study sd
	

	for (i in 1:K){
		TPR[i] = TP[i]/(TP[i] + FN[i]);
		FPR[i] = FP[i]/(TN[i] + FP[i]);
		
		LOGIT_tpr[i] = log(TPR[i]/(1-TPR[i]));
		sigma_tpr[i] = sqrt(1/((TP[i] + FN[i])*TPR[i]*(1-TPR[i])));
		
		LOGIT_fpr[i] = log(FPR[i]/(1-FPR[i]));
		sigma_fpr[i] = sqrt(1/((TN[i] + FP[i])*FPR[i]*(1-FPR[i])));

		LOGIT_pair[i,1] = LOGIT_tpr[i];
		LOGIT_pair[i,2] = LOGIT_fpr[i];

		sigma_w[i,1] = sigma_tpr[i]*sigma_tpr[i];
		sigma_w[i,2] = sigma_fpr[i]*sigma_fpr[i];
		
		
		}

}

parameters{
	vector[J] mu;
	vector[J] theta;
	vector<lower=0>[J] tau; // between-study sd

	corr_matrix[J] Omega_b; // correlation matrix for between study variability
	corr_matrix[J] Omega_w[K]; // correlation matrix for within study variability


}

transformed parameters {
	cov_matrix[J] sigma_b_mat; // vcov matrix for between study variability
	sigma_b_mat = quad_form_diag(Omega_b, tau);


	cov_matrix[J] sigma_w_mat[K]; // vcov matrix for within study variability

	for (i in 1:K){
		sigma_w_mat[i] = quad_form_diag(Omega_w[i], sigma_w[i]);
	}
	
}



model {
	
	theta ~ multi_normal(mu, sigma_b_mat);

	for (i in 1:K){
		LOGIT_pair[i] ~ multi_normal(theta, sigma_w_mat[[group[i]]);

	}
	

	// priors
	mu ~ normal(0,1);
	tau ~ cauchy(0, 0.5);
	Omega_w ~ lkj_corr(1); //LKJ prior on the correlation matrix
	Omega_b ~ lkj_corr(1); //LKJ prior on the correlation matrix
	

}

This might be an easier model to think about if you use a different representation of the data. Specifically, for a given study you can re-code the 2x2 table of counts into two vectors, both containing 0/1, where one reflects the outcome of the diagnostic test (D) and the other reflects the truth/gold-standard (T) So, if you had a 2x2 of:

TP: 1
FP: 2
TN: 3
FN: 4

Then you’d have vectors:

D T
1 1
1 0
1 0
0 0
0 0
0 0
0 1
0 1
0 1
0 1
0 1

Then, for a given study, and using R’s formula syntax, you have a generalized linear model:

study_fit = glm(
    data = study_data
    , formula = D ~ 1+T
    , family = binomial
)

Where, if you make T a factor and use sum contrasts, the intercept parameter will reflect bias of the Diagnostic test while the effect of T will reflect the sensitivity (in the signal detection theory sense; I hate how medical stats adopted the same term for a different quantity in the same realm) of the Diagnostic test.

From there, the formulation of a meta-analysis can be achieved by treating the different studies as “random effects”, so if you had the above data for all studies combined together (again, making T a factor with sum contrasts) with a third vector identifying the study, then using lme4/brms formula notation you’d do:

meta_fit ~ glmer( 
    data = study_data
    , formula = D ~ 1+T + ( 1+T | Study)
    , family = binomial
)

(Using glmer there only bc I’m not confident in my brms)

In which case you’ve implemented a model where there’s an across-study mean bias, an across-study mean sensitivity, then each study gets its own regularized bias and sensitivity as correlated variates.

Finally, if you want to characterize the posterior’s implications for the TP/FP/TN/FN cell probabilities, these are deterministic transforms of the bias & sensitivity.

Note that if you have lots of data and encounter slow sampling, there are tricks to speed up binomial outcomes by having the likelihood evaluated as counts, but I didn’t want to confuse things by starting with that.

Note that the above signal detection theoretic model implies a shared variance magnitude between the latent distributions; if that’s not appropriate then check out this blog series on more nuanced approaches.

Hi,

Thank you for your quick and detailed advise. I had no idea how to fit the model using generalised linear mixed modelling, so your advice is really helpful. One quick additional question is what the “latent distributions” you meant here because I do not know the signal detection theoretic model.

In the meantime, my initial intent is to try a bayesian approach to do the bivariate model, so I would still like to get any feedback on my code and/or formulation.

To be clear, I would like to try the models framed in this paper , and its SAS code is here.

It seems they did not estimate the within-study correlation, and I am not sure if it is sensible to estimate it, which is what I am trying to do.

Thank you!