Posterior Predictive Distribution in Bivariate Meta-Analysis

arthur-albuquerque · February 2, 2022, 9:05pm

I want to generate the posterior predictive distribution in a bivariate meta-analysis using brms. The model is described as (ref):

And here is the posterior predictive distribution:

As shown below, one needs to use brms::fcor() to include the within-study covariance-variance matrix in the brms model. However, I have never generated the posterior predictive distribution in a model with fcor(). I tried to apply the brms::posterior_predict() syntax I usually use for univariate meta-analysis models (no fcor()), but I get the following error:

Error: Dimensions of ‘M’ for FCOR terms must be equal to the number of observations.

I don’t understand it. Could anyone give me some light on this topic? See full code below.

Thanks!

# Install/load packages
pacman::p_load(brms,
               tidybayes,
               tidyverse)

pacman::p_load_gh("stan/cmdstanr",
                  "wviechtb/metafor") # for {metadat} and vcor()

# Load dataset with two correlated outcomes
dat = metadat::dat.berkey1998

### construct variance-covariance matrix assuming within-study correlation = 0.5
V = metafor::vcalc(vi, cluster=author, type=outcome, rho=0.5, data=dat)

# Specify formula
mf2 = 
  brms::bf(yi ~ 0 + outcome + (0 + outcome|author) + fcor(V))

# Fit model
m2 = brms::brm(
  formula = mf2,
  prior = 
    prior(constant(1), class = "sigma"),
  data = dat,
  data2 = list(V = V),
  family = gaussian,
  
  warmup = 5000, iter = 10000,
  chains = 4,
  seed = 123,
  backend = "cmdstanr"
)

## see ?brms::fcor for further details on formula syntax and data2 argument

## see https://bit.ly/34tHcUU for explananations on the "constant(1) prior"
## and formula syntax

m2

nd = data.frame(author = "new",
                outcome = c("AL", "PD"))

brms::posterior_predict(object = m2,
                        newdata = nd,
                        re_formula = NULL,
                        allow_new_levels = TRUE,
                        sample_new_levels = "gaussian")

arthur-albuquerque · February 2, 2022, 11:43pm

As Brenton commented on Twitter, posterior_predict() doesn’t seem to be the adequate function for my case. In fact, I want to generate the posterior predictive only accounting for the between-study variance-covariance matrix (“D”, as shown above), and further confirmed by Richard Riley here.

Thus, I want to “ignore” the data brought by brms::fcor(V) in the code above. Unfortunately, brms::posterior_epred() does not ignore it and yields the same error.

Not sure why though.

Stephen_Wild · February 2, 2022, 11:46pm

Based on the chat in the Twitter thread and the SAS code you posted, you can try extracting the posterior samples and use library(rmvtnorm) to do the posterior predictive check.

Try this code after you run your model.

library(mvtnorm)

samps <- posterior_samples(m2, pars = c("b_outcomePD", "b_outcomeAL",  "sd_author__outcomeAL", "sd_author__outcomePD",
                                        "cor_author__outcomeAL__outcomePD"))

sigma_a = samps$sd_author__outcomeAL
sigma_b = samps$sd_author__outcomePD
rho = samps$cor_author__outcomeAL__outcomePD

set.seed(214321)
Sigmas <- list()
preds <- matrix(NA, nrow(samps), 2)
for(i in 1:nrow(samps)){
  tmp_sd <- diag(c(sigma_a[i], sigma_b[i]))
  tmp_cor <- matrix(rho[i], 2, 2)
  diag(tmp_cor) <- 1
  Sigmas[[i]] <- tmp_sd %*% tmp_cor %*% tmp_sd
  preds[i,] <- rmvnorm(1, mean = c(samps$b_outcomeAL[i], samps$b_outcomePD[i]), sigma = Sigmas[[i]])
  }

EDIT: Set the seed.

arthur-albuquerque · February 3, 2022, 1:13am

Thank you very much, Stephen! Your code makes complete sense to me, and I cannot find any error. However, the predictive output is a little bit different from what I was expecting.

As shown below, the predictive draws for one of the outcome (AL) is wider (as expected), but its mean is quite different and there are very few draws close to 0. Not sure why.

Code:

# Posterior predictive

library(mvtnorm)

draws = tidybayes::tidy_draws(m2)
sigma_a = draws$sd_author__outcomeAL
sigma_b = draws$sd_author__outcomePD
rho = draws$cor_author__outcomeAL__outcomePD

Sigmas <- list()
preds <- matrix(NA, nrow(draws), 2)

for(i in 1:nrow(draws)){
  
  set.seed(123)
  
  tmp_sd <- diag(c(sigma_a[i], sigma_b[i]))
  
  tmp_cor <- matrix(rho[i], 2, 2)
  
  diag(tmp_cor) <- 1
  
  Sigmas[[i]] <- tmp_sd %*% tmp_cor %*% tmp_sd
  
  preds[i,] <- mvtnorm::rmvnorm(
    1,
    mean = c(draws$b_outcomeAL[i], draws$b_outcomePD[i]),
    sigma = Sigmas[[i]]
    )
  
}

preds = data.frame(preds)
colnames(preds) = c("b_outcomeAL", "b_outcomePD")

data.frame("Marginal Posterior Draws" = draws$b_outcomeAL,
           "Posterior Predictive" = preds$b_outcomeAL) |> 
  tidyr::pivot_longer(1:2) |> 
  ggplot() +
  aes(x = value, y = name, fill = name) +
  ggdist::stat_halfeye() +
  labs(y = NULL,
       title = "b_outcomeAL") +
  theme(legend.position = "none")

Stephen_Wild · February 3, 2022, 1:30am

I’m not sure what’s going on, but here is the full code I used:

library(tidyverse)
library(tidybayes)
library(cmdstanr)
library(brms)
library(metafor)


# Load dataset with two correlated outcomes
dat = metadat::dat.berkey1998

### construct variance-covariance matrix assuming within-study correlation = 0.5
V = vcalc(vi, cluster=author, type=outcome, rho=0.5, data=dat)

# Specify formula
mf2 = 
  brms::bf(yi ~ 0 + outcome + (0 + outcome|author) + fcor(V))

# Fit model
m2 = brms::brm(
  formula = mf2,
  prior = 
    prior(constant(1), class = "sigma"),
  data = dat,
  data2 = list(V = V),
  family = gaussian,
  
  warmup = 5000, iter = 10000,
  chains = 4,
  seed = 123,
  backend = "cmdstanr"
)

## see ?brms::fcor for further details on formula syntax and data2 argument

## see https://bit.ly/34tHcUU for explananations on the "constant(1) prior"
## and formula syntax

m2

nd = data.frame(author = "new",
                outcome = c("AL", "PD"))
ses <- sqrt(c(0.0075, 0.003))
rho <- matrix(0.6, 2, 2)
diag(rho) <- 1
new_vcov <- diag(ses) %*% rho %*% diag(ses)
V2 <- diag(0.0001, 2)

brms::posterior_predict(object = m2,
                        newdata = nd,
                        re_formula = NULL,
                        allow_new_levels = TRUE,
                        sample_new_levels = "gaussian",
                        newdata2 = list(V = V2))


library(mvtnorm)

samps <- posterior_samples(m2, pars = c("b_outcomePD", "b_outcomeAL",
                                        "sd_author__outcomeAL", "sd_author__outcomePD",
                                        "cor_author__outcomeAL__outcomePD"))

sigma_a = samps$sd_author__outcomeAL
sigma_b = samps$sd_author__outcomePD
rho = samps$cor_author__outcomeAL__outcomePD

set.seed(214321)
Sigmas <- list()
preds <- matrix(NA, nrow(samps), 2)
for(i in 1:nrow(samps)){
  tmp_sd <- diag(c(sigma_a[i], sigma_b[i]))
  tmp_cor <- matrix(rho[i], 2, 2)
  diag(tmp_cor) <- 1
  Sigmas[[i]] <- tmp_sd %*% tmp_cor %*% tmp_sd
  preds[i,] <- rmvnorm(1, mean = c(samps$b_outcomeAL[i], samps$b_outcomePD[i]),
                sigma = Sigmas[[i]])
  
}

colnames(preds) = c("b_outcomeAL", "b_outcomePD")

df <- data.frame(value = c(samps[,2], preds[,1], samps[,1], preds[,2]),
                 name = rep(c("outcomeAL", "outcomePD"), each = 20000 * 2),
                 group = rep(c("samps", "preds", "samps", "preds"), each = 20000))

df %>% ggplot() +
  aes(x = value, y = name, colour = group, group = group, fill = group) +
  stat_halfeye(alpha = 0.5)

It gives me this image:

Best I can suggest at the moment is restart R from a fresh session.

arthur-albuquerque · February 3, 2022, 1:43am

Dear Stephan, I am now able to reproduce your output. It seems that putting the set.seed() inside the loop (as I was doing) messes everything up.

Either way, your graph now seems very reasonable, and most likely is correct. Hopefully someone more experienced than me could chip in to further confirm it.

Thanks!!

Here is my full code inspired by yours, but using tidybayes and only plotting b_outcomeAL.

# Install/load packages
pacman::p_load(brms,
               tidybayes,
               tidyverse,
               mvtnorm)

pacman::p_load_gh("stan/cmdstanr",
                  "wviechtb/metafor") # for {metadat} and vcor()

# Load dataset with two correlated outcomes
dat = metadat::dat.berkey1998

### construct variance-covariance matrix assuming within-study correlation = 0.5
V = metafor::vcalc(vi, cluster=author, type=outcome, rho=0.5, data=dat)

# Specify formula
mf2 = 
  brms::bf(yi ~ 0 + outcome + (0 + outcome|author) + fcor(V))

# Fit model
m2 = brms::brm(
  formula = mf2,
  prior = 
    prior(constant(1), class = "sigma"),
  data = dat,
  data2 = list(V = V),
  family = gaussian,
  
  #warmup = 2000, iter = 5000,
  chains = 4,
  cores = 4,
  seed = 123,
  backend = "cmdstanr"
)

## see ?brms::fcor for further details on formula synthax and data2 argument

## see https://bit.ly/34tHcUU for explanations on the "constant(1) prior"
## and formula synthax

# Posterior predictive

draws = tidybayes::tidy_draws(m2)

sigma_a = draws$sd_author__outcomeAL
sigma_b = draws$sd_author__outcomePD
rho = draws$cor_author__outcomeAL__outcomePD


Sigmas = list()
preds = matrix(NA, nrow(draws), 2)

set.seed(214321)

for(i in 1:nrow(draws)){
  
  tmp_sd = diag(c(sigma_a[i], sigma_b[i]))
  
  tmp_cor = matrix(rho[i], 2, 2)
  
  diag(tmp_cor) = 1
  
  Sigmas[[i]] = tmp_sd %*% tmp_cor %*% tmp_sd
  
  preds[i,] = rmvnorm(1, mean = c(draws$b_outcomeAL[i], draws$b_outcomePD[i]),
                       sigma = Sigmas[[i]])
  
}

preds = data.frame(preds)
colnames(preds) = c("b_outcomeAL", "b_outcomePD")

data.frame("Marginal Posterior Draws" = draws$b_outcomeAL,
           "Posterior Predictive" = preds$b_outcomeAL) |> 
  tidyr::pivot_longer(1:2) |> 
  ggplot() +
  aes(x = value, y = name, fill = name) +
  ggdist::stat_halfeye() +
  labs(y = NULL,
       title = "b_outcomeAL") +
  theme(legend.position = "none")

mjskay · February 3, 2022, 3:53am

Saw this via twitter and I was wondering if posterior::rvar() might make it a bit easier to do. Here’s a translation of your loop-based solution into an rvar-based one (I verified the solutions are identical given the same seed):

# Install/load packages
pacman::p_load(brms,
               posterior,
               tidyverse,
               mvtnorm)

pacman::p_load_gh("stan/cmdstanr",
                  "wviechtb/metafor") # for {metadat} and vcor()

# Load dataset with two correlated outcomes
dat = metadat::dat.berkey1998

### construct variance-covariance matrix assuming within-study correlation = 0.5
V = metafor::vcalc(vi, cluster=author, type=outcome, rho=0.5, data=dat)

# Specify formula
mf2 = 
  brms::bf(yi ~ 0 + outcome + (0 + outcome|author) + fcor(V))

# Fit model
m2 = brms::brm(
  formula = mf2,
  prior = 
    prior(constant(1), class = "sigma"),
  data = dat,
  data2 = list(V = V),
  family = gaussian,
  
  #warmup = 2000, iter = 5000,
  chains = 4,
  cores = 4,
  seed = 123,
  backend = "cmdstanr"
)


# Posterior predictive

set.seed(214321)

draws = posterior::as_draws_rvars(m2)

# not all usages of diag() are implemented with rvar yet
# (it's on my list...) so we have to do it manually
sd = c(draws$sd_author__outcomeAL, 0, 0, draws$sd_author__outcomePD)
dim(sd) = c(2, 2)

rho = draws$cor_author__outcomeAL__outcomePD
cor = c(rvar(1), rho, rho, rvar(1))
dim(cor) = c(2, 2)

# matrix multiplication in rvars is named `%**%` because the
# base-R `%*%` operator has a baroque implementation that
# doesn't play nice with S3 generics
Sigma = sd %**% cor %**% sd

mu = c(draws$b_outcomeAL, draws$b_outcomePD)

# rdo() is a bit "magic" in that it detects rvars in the given
# expression and more or less creates the loop you wrote
# manually and then packages the results back up into an rvar
preds = rdo(rmvnorm(1, mu, Sigma))

data.frame(
    "Outcome" = c("AL", "PD"),
    "Marginal Posterior Draws" = c(draws$b_outcomeAL, draws$b_outcomePD),
    "Posterior Predictive" = c(preds)
  ) |>
  tidyr::pivot_longer(2:3) |> 
  ggplot() +
  aes(dist = value, y = Outcome, fill = name) +
  ggdist::stat_dist_slab(alpha = 0.5) +
  ggdist::stat_dist_pointinterval(
    aes(color = name), 
    position = position_dodge(width = 0.4, preserve = "single")
  ) +
  scale_color_brewer(palette = "Dark2", aesthetics = c("fill", "color"))

The resulting preds object is an rvar which is a row vector:

preds
## rvar<1000,4>[1,2] mean ± sd:
##      [,1]          [,2]         
## [1,] -0.36 ± 0.35   0.35 ± 0.24

Internally it is backed by an array whose first dimension is the draws:

str(draws_of(preds))
## num [1:4000, 1, 1:2] -0.101 -0.563 -0.369 -0.422 -0.59 ...
## - attr(*, "dimnames")=List of 3
##  ..$ : chr [1:4000] "1" "2" "3" "4" ...
##  ..$ : NULL
##  ..$ : NULL

You can put it into data frames, use most math operators / functions / etc, as if it were a normal R array (for the most part). This is what allows us to put the rvars into a data frame and then map them onto the dist aesthetic, which ggdist::stat_dist_... geoms understand.

In the next release of ggdist (coming very soon), this syntax will be a bit simpler. While currently you have to do something like this:

ggplot() +
  aes(dist = value, y = Outcome) +
  ggdist::stat_dist_slab()

In the future dist can be more explicitly xdist or ydist (instead of the x/y axis being inferred) and the stat_ and stat_dist families will be merged, so you will be able to write something like:

ggplot() +
  aes(xdist = value, y = Outcome) +
  ggdist::stat_slab()

arthur-albuquerque · February 3, 2022, 8:53pm

Hi Matthew, this is absolutely amazing. I had never used {posterior} before. Thank you very much.

Here is an extension of your code to plot the bivariate posterior predictive distribution, in case anyone is interested. Clean code thanks to {posterior}!

# Load 1 extra package for plot
pacman::p_load_gh("jamesotto852/ggdensity")

set.seed(214321)

preds = 
  posterior::rdo(mvtnorm::rmvnorm(1, mu, Sigma)) |> 
  posterior::as_draws_df() |> 
  posterior::rename_variables(AL = `x[1,1]`,
                              PD = `x[1,2]`)

preds |> 
  ggplot() +
  aes(x = AL, PD) +
  ggdensity::geom_hdr(
    aes(fill = after_stat(level)), 
    alpha = 1
  ) +
  scale_fill_manual(values=MetBrewer::met.brewer("OKeeffe2", 4)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_vline(xintercept = 0, linetype = 2) +
  coord_cartesian(x = c(-1.2, 1.2), y = c(-1.2, 1.2)) +
  labs(title = "Bivariate Posterior Predictive Distribution\n") +
  ggdist::theme_ggdist() +
  theme(plot.title.position = "plot")

mjskay · February 4, 2022, 3:19am

Nice!!

Topic		Replies	Views
Bivariate model for a meta analysis of diagnostic test accuracy Modeling specification , covariance , meta-analysis	8	1496	February 1, 2024
Multivariate multilevel distributional model + random correlations brms	10	2091	February 10, 2020
Computing a 95% prediction interval from brms model brms brms	3	703	November 18, 2023
Prior predictive check multivariate model brms	7	2570	August 5, 2020
Multivariate model interpreting residual correlation and group correlation brms interpret-results	14	3884	September 12, 2020

Posterior Predictive Distribution in Bivariate Meta-Analysis

Related topics