Meta-analysis of diagnostic test accuracy

Hi,
I am doing a meta-analysis to estimate the sensitivity and specificity of a new test for detection of a disease. Here is an example dataset, which illustrates the structure of the data.

library(tidyverse)
library(brms)
library(tidybayes)

#generate some data for sensitivity and specificity (id is study)
d <- tribble(
  ~id, ~sens, ~sens_lower, ~sens_upper, ~spec, ~spec_lower, ~spec_upper,
  "A", 0.70, 0.47, 0.87, 0.93, 0.87, 0.97,
  "B", 0.80, 0.28, 0.99, 0.50, 0.12, 0.88,
  "C", 0.76, 0.67, 0.84, 0.99, 0.94, 0.99,
  "D", 0.91, 0.85, 0.95, 0.97, 0.85, 0.98,
  "E", 0.84, 0.69, 0.93, 0.92, 0.74, 0.99
)

#plot to confirm okay
#(the pivot warning below is expected: the point-estimate columns sens and spec
# have no "_lower"/"_upper" suffix, so their "type" piece is filled with NA and
# the resulting value_NA column is renamed to value)
d %>%
  pivot_longer(
    cols = -id,
    names_to = c("measure", "type"),
    names_sep = "_"
  ) %>%
  pivot_wider(
    names_from = type,
    values_from = value,
    names_prefix = "value_"
  ) %>%
  rename(value = value_NA) %>%
  ggplot() +
  geom_pointrange(aes(y=id, x=value, xmin=value_lower, xmax=value_upper, colour=measure)) +
  facet_grid(measure~.) +
  scale_y_discrete(limits=rev)
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 2 rows [1, 4].

Created on 2024-12-10 with reprex v2.1.1

Using the approach described here, we can convert the 95% confidence intervals for sensitivity and specificity in each study into a dispersion (precision) parameter phi for a beta distribution.

mdata <- d %>%
  mutate(se_sens = (sens_upper-sens_lower)/3.92) %>%
  mutate(se_spec = (spec_upper-spec_lower)/3.92) %>%
  mutate(derived.phi_sens = (sens/se_sens^2) - (sens/se_sens)^2 - 1) %>%
  mutate(derived.phi_spec = (spec/se_spec^2) - (spec/se_spec)^2 - 1) %>%
  select(id, sens, derived.phi_sens, spec, derived.phi_spec)
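
For reference, the expression above is algebraically phi = mu(1-mu)/se^2 - 1, i.e. the precision of a beta distribution with mean mu whose variance mu(1-mu)/(1+phi) matches the squared standard error. A quick check of that identity using the data above (a sketch, not part of the original pipeline):

#check: sens/se^2 - (sens/se)^2 - 1  ==  sens*(1-sens)/se^2 - 1
chk <- d %>%
  mutate(
    se_sens = (sens_upper - sens_lower) / 3.92,
    phi_a   = (sens / se_sens^2) - (sens / se_sens)^2 - 1,
    phi_b   = sens * (1 - sens) / se_sens^2 - 1
  )
all.equal(chk$phi_a, chk$phi_b)  #TRUE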

We can then construct a multilevel meta-analysis model using beta distributions, with sensitivity and specificity correlated by study:

#Construct meta-analysis model
m1 <- brm(
  bf(sens ~ 1 + (1|p|id), phi ~ 0 + offset(derived.phi_sens)) +
  bf(spec ~ 1 + (1|p|id), phi ~ 0 + offset(derived.phi_spec)),
  data = mdata,
  family = Beta(link_phi = "identity"),
  control = list(adapt_delta = 0.99),
  cores = 4,
  chains = 4,
  iter = 4000
)

When we summarise the posteriors, these look “sensible” to me:

#Pooled estimates
add_epred_draws(object = m1,
                newdata = mdata, 
                re_formula = NA) %>%
  group_by(.category) %>%
  median_qi(.epred)
#> # A tibble: 2 Ă— 7
#>   .category .epred .lower .upper .width .point .interval
#>   <fct>      <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
#> 1 sens       0.815  0.582  0.923   0.95 median qi       
#> 2 spec       0.925  0.610  0.984   0.95 median qi       

#plot pooled estimates
add_epred_draws(object = m1,
                newdata = mdata, 
                re_formula = NA) %>%
  ggplot() +
  stat_halfeye(aes(x=.epred, fill=.category), alpha=0.8) +
  facet_grid(.category~.)


#how well are we recovering study-level estimates
add_epred_draws(object = m1,
                newdata = mdata) %>%
  group_by(id, .category) %>%
  median_qi(.epred) %>%
  full_join(mdata %>% select(id, sens, spec) %>% pivot_longer(-id) %>% rename(.category=name)) %>%
  ggplot() +
  geom_pointrange(aes(y=id, x=.epred, xmin=.lower, xmax=.upper, colour=.category)) +
  geom_point(aes(y=id, x=value, group=.category), colour="black", shape = 4) +
  facet_grid(.category~.) +
  scale_y_discrete(limits=rev)
#> Joining with `by = join_by(id, .category)`

Created on 2024-12-10 with reprex v2.1.1

So, this all seems to work, but I had a couple of questions:

  1. Does this seem like a reasonable approach, given the data available? Any recommended alternatives?

  2. I note that when compiling the model, I get the warning "Setting 'rescor' to FALSE by default for this model." AFAIK, it is only possible to use set_rescor(TRUE) for gaussian models (I couldn't get the dev branch discussed by @andrjohns to install). What are the consequences of not addressing this?

Thanks!

Hi Peter,

Interesting questions and approach. I'm not sure how the offset with the estimated dispersion works; do you have a source or a somewhat longer description of that?

Here are my five cents on a more general aspect of diagnostic performance studies, or rather, a concern. Diagnostic performance studies are often flawed, sometimes in multiple ways:

  • Use of a "gold standard" as the reference, when no such standard truly exists, which introduces bias in the performance estimates (for instance, see the explanation here).
  • Inconsistent case definitions, arising from comparing a selection of diagnostics and sample types that measure different aspects of a disease or process. This can happen even when a study correctly analyses the data with latent class analysis (see the same link above).

Therefore, I’d be very careful what studies you include in your meta-analysis.

EDIT: I see what you're doing with the offset for phi now. Clever. I think this approach does mean that you are assuming that the level of dispersion is fixed and therefore decoupled from the unobserved but estimated "true" sensitivity or specificity of the test, right? If that is not reasonable, maybe it's possible to let phi be a (non-linear) function of the estimated sensitivity and the se that you calculated from the data, using brms functionality for non-linear models?


I’m afraid I don’t know brms. What is the phi doing here? What is the p? Is derived.phi_spec data?

I’m curious why you didn’t just logit-transform the data and then work with a standard meta-analysis model like eight schools. It just seems a lot easier.

Also, your data estimates of sensitivity aren't centered in their intervals (e.g., (.47, .70, .87)), so they don't look normal. This isn't surprising given that we're constrained to (0, 1) and the mean is 0.7; beta distributions are always skewed unless their parameters are identical and the mean is 0.5. Therefore this looks wrong, as it assumes there's a normal distribution with 95% interval endpoints given at the high and low points:

se_sens = (sens_upper - sens_lower) / 3.92

This actually throws away the mean estimate when estimating the standard errors for sensitivity. Also, unconstrained standard errors don’t really make sense for constrained parameters, though they can be OK if the tails are still within the constraints. This all just seems like it’d be easier on the unconstrained log odds scale.


Thanks so much Bob. Very helpful comments indeed. I will try to expand and add some clarity.

In the dataset, we only have point estimates, and 95% confidence intervals for sensitivity and specificity for each study, as in d above. We don’t have access to the numbers of participants who were true positives, true negatives, false positives, and false negatives, nor overall denominators.

So derived.phi_sens and derived.phi_spec allow us to weight study-level estimates by their precision. The p in the brm formula allows us to model the correlation between sensitivity and specificity within studies (e.g. see here).
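
As a quick illustration (a sketch, assuming the m1 fit above): the |p| term lets the study-level intercepts for sens and spec share a correlation, which you can inspect directly:

#the correlation between the study-level sens and spec intercepts is part of
#the group-level output
VarCorr(m1)  #or see "Group-Level Effects" in summary(m1)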

I followed your advice and also modelled these data on the unconstrained log odds scale following this approach:

mdata2 <- d %>%
  mutate(
    logodds_sens = log(sens/(1-sens)),
    logodds_spec = log(spec/(1-spec))
  ) %>%
  mutate(
    # SE on probability scale
    se_sens_prob = (sens_upper - sens_lower) / (2 * 1.96),
    se_spec_prob = (spec_upper - spec_lower) / (2 * 1.96),
    
    # SE on log-odds scale
    se_sens_logodds = se_sens_prob / (sens * (1 - sens)),
    se_spec_logodds = se_spec_prob / (spec * (1 - spec))
  )
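
An aside on that conversion (a sanity check, not part of the original pipeline): since d/dp logit(p) = 1/(p(1-p)), dividing the probability-scale SE by sens*(1-sens) is the delta-method approximation to the log-odds-scale SE. For study A's sensitivity the two routes give broadly similar values:

#delta-method check for study A (point estimate 0.70, CI 0.47-0.87)
p  <- 0.70
se <- (0.87 - 0.47) / (2 * 1.96)              #probability-scale SE
se / (p * (1 - p))                            #delta-method SE on log-odds scale (~0.49)
(qlogis(0.87) - qlogis(0.47)) / (2 * 1.96)    #SE from logit-transformed bounds (~0.52)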

m2 <- brm(
  bf(logodds_sens | se(se_sens_logodds) ~ 1 + (1|p|id)) +
  bf(logodds_spec | se(se_spec_logodds) ~ 1 + (1|p|id)) +
    set_rescor(FALSE),
  data = mdata2,
  family = "gaussian",
  control = list(adapt_delta = 0.99),
  cores = 4,
  chains = 4,
  iter = 4000
)

Here is a summary of the posteriors for sens and spec on the probability scale:

add_epred_draws(object = m2,
                newdata = mdata2, 
                re_formula = NA) %>%
  mutate(epred_prop = inv_logit_scaled(.epred),
         .category = case_when(.category=="logoddssens" ~"Sens",
                               .category=="logoddsspec" ~ "Spec")) %>%
  ggplot() +
  stat_halfeye(aes(x=epred_prop, fill=.category), alpha=0.8, ) +
  facet_grid(.category~.)

I then fit an additional model m3 with the set_rescor(TRUE) option, and compare all three in a table here:

(Note that the point estimates are very similar between the beta model and the log-odds models, but there is slightly greater uncertainty with the beta model.)

Would be very interested in your thoughts about:

  1. Why might this be, which model would be "preferred", and why? Or is there some other alternative approach?
  2. Any insights into modelling the residual correlation? I am not 100% clear about what this is doing and whether it is necessary here.

Thanks!

Thanks so much indeed for the response!

Very much agree with your points about the "gold standard" and case definitions between studies. Although these concerns weren't really the main issue in my question, I am interested in how you might extend such a model in brms to model a latent class of "true disease status" with the data that we have available.

Good point about “fixing” phi, and I think it should be possible to model phi like this:

brm(
  bf(sens ~ 1 + (1|p|id), phi ~ 0 + sens + offset(derived.phi_sens)) +
  bf(spec ~ 1 + (1|p|id), phi ~ 0 + spec + offset(derived.phi_spec)),
  ...

And that seems to go some way (but not all of the way) towards addressing the greater uncertainty relative to the log odds models described in my response to @Bob_Carpenter below.

Very grateful for any further thoughts on this!

Hi Peter, what you've specified now is a linear model for phi on the natural scale, with the observed sensitivity and specificity as predictors. That's not what I meant; sorry if that wasn't clear. My proposal was to take the part where you manually calculate the standard error (which you currently do in R, and which is a non-linear function) and move that inside the brms model as a function of the estimated (rather than observed) diagnostic performance. But I realise now that you don't even use the observed diagnostic performance to calculate the se.

As for your log-odds approach: I don't follow the way you calculate the standard errors here. The SE should represent the standard error on the log odds scale. Analogous to your previous calculation, the simplest way of calculating this would be to take the confidence (or credible) bounds, logit transform them, and then calculate the SE based on the difference divided by 2*1.96. That will work as long as none of the bounds hit 1.00.

As for your question about estimating the true prevalence as part of this meta-analysis: that wasn't my intended suggestion. I merely meant to say that the studies you include in your meta-analysis might include fancy latent class analyses that seem very robust but are still fraught with issues related to poor (and often implicit) case definitions.


Right, that I understood. It's just the way you are doing this with a binomial that I had an issue with, as you're using standard deviations for skewed, constrained distributions based on interval endpoints. So I just think you can probably do better here and use all three pieces of information, not just the interval endpoints.

I’m guessing that’s because the model fits your assumptions slightly better, which is what I would have expected and why I suggested it.

I’m not sure what you mean by modeling the residual correlation. Residual after what?

You can get standard errors on derived quantities in the same way as for any other posterior variable—divide the standard deviation by the square root of the effective sample size. This is what Stan does when it reports SE for transformed quantities.

You don’t need to do approximate things like this when you can just calculate SE for derived quantities like you do for other quantities. That way, the approximations don’t compound.
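
A minimal sketch of that calculation (assuming the m1 fit from above and the posterior package; b_sens_Intercept is brms's default parameter name for the sens intercept, so treat it as an assumption):

#Monte Carlo standard error of a derived quantity computed from posterior draws
library(posterior)
pooled_sens <- plogis(as_draws_df(m1)$b_sens_Intercept)  #pooled sensitivity on the probability scale
sd(pooled_sens) / sqrt(ess_basic(pooled_sens))           #SE = posterior SD / sqrt(ESS)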


Thanks so much @LucC and @Bob_Carpenter. Definitely helpful, and am learning lots!

@LucC okay, that is clear, and apologies for this. Calculating the standard error on the log odds scale and then modelling like this gives very similar results to before.

mdata2 <- d %>%
  mutate(
    logodds_sens = log(sens/(1-sens)),
    logodds_spec = log(spec/(1-spec))
  ) %>%
  mutate(
    log_sens_lower = log(sens_lower),
    log_sens_upper = log(sens_upper),
    log_spec_lower = log(spec_lower),
    log_spec_upper = log(spec_upper)
  ) %>%
  mutate(log_se_sens = (log_sens_upper-log_sens_lower)/3.92,
         log_se_spec = (log_spec_upper-log_spec_lower)/3.92
  )

m2 <- brm(
  bf(logodds_sens | se(log_se_sens) ~ 1 + (1|p|id)) +
  bf(logodds_spec | se(log_se_spec) ~ 1 + (1|p|id)) +
    set_rescor(FALSE),
  data = mdata2,
  family = "gaussian",
  control = list(adapt_delta = 0.99),
  cores = 4,
  chains = 4,
  iter = 4000
)

@Bob_Carpenter you say:

…divide the standard deviation by the square root of the effective sample size…

Just putting our mdata2 dataset here again (we don’t have any more information for each of the studies),

 dput(mdata2)
structure(list(id = c("A", "B", "C", "D", "E"), sens = c(0.7, 
0.8, 0.76, 0.91, 0.84), sens_lower = c(0.47, 0.28, 0.67, 0.85, 
0.69), sens_upper = c(0.87, 0.99, 0.84, 0.95, 0.93), spec = c(0.93, 
0.5, 0.99, 0.97, 0.92), spec_lower = c(0.87, 0.12, 0.94, 0.85, 
0.74), spec_upper = c(0.97, 0.88, 0.99, 0.98, 0.99), logodds_sens = c(0.847297860387203, 
1.38629436111989, 1.15267950993839, 2.31363492918063, 1.65822807660353
), logodds_spec = c(2.58668934409794, 0, 4.59511985013459, 3.47609868983527, 
2.44234703536921), log_sens_lower = c(-0.755022584278033, -1.27296567581289, 
-0.400477566597125, -0.162518929497775, -0.371063681390832), 
    log_sens_upper = c(-0.139262067333508, -0.0100503358535015, 
    -0.174353387144778, -0.0512932943875506, -0.0725706928348354
    ), log_spec_lower = c(-0.139262067333508, -2.12026353620009, 
    -0.0618754037180875, -0.162518929497775, -0.301105092783922
    ), log_spec_upper = c(-0.0304592074847086, -0.127833371509885, 
    -0.0100503358535015, -0.0202027073175195, -0.0100503358535015
    ), log_se_sens = c(0.157081764526665, 0.322172280601884, 
    0.0576847396562111, 0.0283738865077103, 0.0761461705499992
    ), log_se_spec = c(0.0277558315940814, 0.508273001196481, 
    0.0132206805777005, 0.0363051587194529, 0.07424866248225)), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

Following this blog from @Solomon, I think the updated m3 model below does what you recommend (but might be completely wrong! Getting fairly out of my depth now!).

m3 <- brm(
  bf(logodds_sens | se(log_se_sens, sigma=TRUE) ~ 1 + (1|p|id),
     sigma ~ 1 + (1 |p| id)) +
  bf(logodds_spec | se(log_se_spec, sigma=TRUE) ~ 1 + (1|p|id),
     sigma ~ 1 + (1 |p| id)) +
    set_rescor(FALSE),
  data = mdata2,
  family = "gaussian",
  control = list(adapt_delta = 0.99),
  cores = 4,
  chains = 4,
  iter = 4000
)

summary(m3)

m3_sum <- add_epred_draws(object = m3,
                newdata = mdata2, 
                re_formula = NA) %>%
  mutate(epred_prop = 1 / (1 + exp(-.epred))) %>%
  group_by(.category) %>%
  median_qi(epred_prop) %>%
  rename(estimate = epred_prop) %>%
  mutate(model = "Log odds (set_rescor(FALSE))") %>%
  mutate(.category = case_when(.category=="logoddssens" ~"sens",
                               .category=="logoddsspec" ~ "spec"))

m3_sum

add_epred_draws(object = m3,
                newdata = mdata2, 
                re_formula = NA) %>%
  mutate(epred_prop = inv_logit_scaled(.epred),
         .category = case_when(.category=="logoddssens" ~"Sens",
                               .category=="logoddsspec" ~ "Spec")) %>%
  ggplot() +
  stat_halfeye(aes(x=epred_prop, fill=.category), alpha=0.8, ) +
  facet_grid(.category~.)

Was this what you were meaning?

Again, extremely grateful for the input and advice here.

Peter

Thanks for this point Bob. I see how this could apply in general. I'm not sure though whether the required information is available. As I understand it, the OP only has point estimates and 95% confidence/credible intervals available from the studies that he is including in his meta-analysis.

Peter, is there a particular reason why you are calculating the standard errors based on the log of the lower and upper bound? I’m not sure this is valid. I would use the logit transformation, as you do for the point estimates (as suggested by Bob and myself in earlier posts). Then at least you are treating everything on the same (log-odds) scale.

@LucC thanks so much, and apologies: that was my error, and I hadn't fully understood what you were recommending. Updated now.

To summarise where I am: the models and output are below, and the questions that I still have are:

  1. From what I understand of the conversation above, the model on the log-odds scale is preferred to the one using the beta distribution?
  2. I am still not 100% sure about the set_rescor() option in brms: in the context of the log-odds meta-analysis models, what does setting it to TRUE do, and what are the consequences of not doing so?
  3. Similarly, in the context of the log-odds meta-analysis model, does estimating sigma make sense or not?

Thanks again!
Peter
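
For reference, the log-odds models below use columns named logodds_se_sens and logodds_se_spec. Here is a sketch of the data preparation these models appear to assume, with the SEs now computed from the logit-transformed bounds as recommended above (the exact column names are an assumption, chosen to match the model formulas):

#data prep assumed by models m2-m5 below (SEs from logit-transformed bounds)
mdata2 <- d %>%
  mutate(
    logodds_sens    = qlogis(sens),
    logodds_spec    = qlogis(spec),
    logodds_se_sens = (qlogis(sens_upper) - qlogis(sens_lower)) / (2 * 1.96),
    logodds_se_spec = (qlogis(spec_upper) - qlogis(spec_lower)) / (2 * 1.96)
  )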

#the models

m1 <- brm(
  bf(sens ~ 1 + (1|p|id), phi ~ 0 + offset(derived.phi_sens)) +
  bf(spec ~ 1 + (1|p|id), phi ~ 0 + offset(derived.phi_spec)),
  data = mdata,
  family = Beta(link_phi = "identity"),
  control = list(adapt_delta = 0.99),
  cores = 4,
  chains = 4,
  iter = 4000
)

m2 <- brm(
  bf(logodds_sens | se(logodds_se_sens) ~ 1 + (1|p|id)) +
  bf(logodds_spec | se(logodds_se_spec) ~ 1 + (1|p|id)) +
    set_rescor(FALSE),
  data = mdata2,
  family = "gaussian",
  control = list(adapt_delta = 0.99),
  cores = 4,
  chains = 4,
  iter = 4000
)

m3 <- brm(
  bf(logodds_sens | se(logodds_se_sens, sigma=TRUE) ~ 1 + (1|p|id),
     sigma ~ 1 + (1 |p| id)) +
  bf(logodds_spec | se(logodds_se_spec, sigma=TRUE) ~ 1 + (1|p|id),
     sigma ~ 1 + (1 |p| id)) +
    set_rescor(FALSE),
  data = mdata2,
  family = "gaussian",
  control = list(adapt_delta = 0.99),
  cores = 4,
  chains = 4,
  iter = 4000
)

m4 <- brm(
  bf(logodds_sens | se(logodds_se_sens) ~ 1 + (1|p|id)) +
  bf(logodds_spec | se(logodds_se_spec) ~ 1 + (1|p|id)) +
    set_rescor(TRUE),
  data = mdata2,
  family = "gaussian",
  control = list(adapt_delta = 0.99),
  cores = 4,
  chains = 4,
  iter = 4000
)

m5 <- brm(
  bf(logodds_sens | se(logodds_se_sens, sigma=TRUE) ~ 1 + (1|p|id),
     sigma ~ 1 + (1 |p| id)) +
  bf(logodds_spec | se(logodds_se_spec, sigma=TRUE) ~ 1 + (1|p|id),
     sigma ~ 1 + (1 |p| id)) +
    set_rescor(TRUE),
  data = mdata2,
  family = "gaussian",
  control = list(adapt_delta = 0.99),
  cores = 4,
  chains = 4,
  iter = 4000
)

Summarized in a figure

Hi, I haven’t used the set_rescor() argument, but from a quick look at the brms documentation and blogs on multivariate regression, it’s my impression that this option is used when you use the mvbind() syntax to link tuples of observations. I don’t know how this works with the syntax you’re using, where you provide the model with the standard errors of the observations. No idea what set_rescor() does in that case. Hopefully, this is an easy quick answer for @paul.buerkner?

For what it's worth, the estimates look more or less identical across the different models. The number of digits in the estimates is a bit misleading, suggesting a lot of precision. I would suggest rounding to at most 3 decimals (on the 0-1 scale).

I was talking about getting SE values on derived quantities from posterior draws, like general expectations. This is just what the SE values mean from the sampler.

For a meta-analysis model, you’re also providing SE values along with the measurements. These are what @petermacp is calculating from the end points. What I was suggesting is to transform probabilities to log odds with logit(u) = log(u / (1 - u)), transform the boundaries the same way, then get a standard error estimate by assuming normality on the unconstrained log odds values. It should be a bit less skewed than with the direct probability endpoints.

Ah OK @Bob_Carpenter, I think we mean the same thing then!

set_rescor either activates or deactivates estimating residual correlations between response variables in a multivariate model if their individual families allow (all gaussian or all student-t).

This is independent of mvbind. mvbind is just syntactic sugar: if you have multiple response variables with the same right-hand sides, you may use mvbind(y1, y2) ~ RHS instead of bf(y1 ~ RHS) + bf(y2 ~ RHS).
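
For example (y1, y2, and x here are placeholder names, just to make the equivalence concrete):

#these two multivariate model specifications are equivalent
f1 <- bf(mvbind(y1, y2) ~ x) + set_rescor(TRUE)
f2 <- bf(y1 ~ x) + bf(y2 ~ x) + set_rescor(TRUE)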
