Bayesian random-effects meta-analysis with selection models (publication bias)

HaiyangJin · March 22, 2021, 12:31pm

Dear all

I would like to build a Bayesian random-effects meta-analysis model with selection models to take into account the publication bias. The example data is originally from RoBMA package. The preprocessed data can be download here: ExampleData.csv (1.8 KB).

First, I use library(brms) to create the meta-analysis model without the selection models:

# R code to read the data
library(tidyverse)
library(brms)
library(rstan)

df <- read_csv("ExampleData.csv")

# build two-level random-effects meta-analysis with brms
brmfit <- brm(d | se(se) ~ 1 + (1|name), data = df,
              chains = 4, cores = 4)

# obtain the corresponding stan code
stancode(brmfit)

Then, I tried to add the selection models (publication bias) components to the stan code obtained from brmfit. (The way I implemented selection models is based on my understanding of library(RoBMA) source codes). In this example, the assumption is that experiments whose p-value was larger than 0.05 were published with a probability of omega_1, whereas all experiments whose p-value was smaller than 0.05 were published (i.e., with the probability of omega_2 = 1). Therefore, when calculating the likelihood of observing an experiment effect size, the weighted normal distribution (i.e., the normal distribution with different probabilities for different intervals) should be used. The formula should be:g(x|\mu) = \frac{f(x|\mu)w(x)}{A}, where f(x|\mu) and g(x|\mu) are the PDF of x before and after publication selections, respectively. w(x) is the probability to publish the paper (based on p-value). A is a normalizing factor to make sure g(x|\mu) is a proper PDF.

The full stan code is as following (the parts I changed are marked as “bias-related”):

// generated with brms 2.15.0
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  vector<lower=0>[N] se;  // known sampling error
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  int prior_only;  // should the likelihood be ignored?
  // (bias-related) 
  int<lower=1> I[N]; // index for intervals based on p-value
}
transformed data {
  vector<lower=0>[N] se2 = square(se);
}
parameters {
  real Intercept;  // temporary intercept for centered predictors
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  vector[N_1] z_1[M_1];  // standardized group-level effects
  // (bias-related)
  simplex[2] theta; // 
}
transformed parameters {
  real<lower=0> sigma = 0;  // residual SD
  vector[2] omega;  // (bias-related) publication bias
  vector[N_1] r_1_1;  // actual group-level effects
  r_1_1 = (sd_1[1] * (z_1[1]));
  // (bias-related) calculate omega based on theta from dirichlet
  omega = cumulative_sum(theta);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = Intercept + rep_vector(0.0, N);
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
      // (bias-related)
      target += normal_lpdf(Y[n] | mu[n], se[n]);
      target += log(omega[I[n]]);
      // denominators
      target += - log_sum_exp(
        normal_lcdf(1.96 | mu[n], se[n]) + log(omega[1]),
        normal_lccdf(1.96 | mu[n], se[n]) + log(omega[2])
        );
    }
    // target += normal_lpdf(Y | mu, se);
  }
  // priors including constants
  target += student_t_lpdf(Intercept | 3, 0.5, 2.5);
  target += student_t_lpdf(sd_1 | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  target += std_normal_lpdf(z_1[1]);
  // (bias-related) 
  target += dirichlet_lpdf(theta | rep_vector(2, 2));
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept;
}

# r code to fit above stan code
data_ls <- standata(brmfit)  # obtain data list from brmfit
data_ls$I <- df$I  # which interval the study is at based on p-value (i.e., whether larger or smaller than 0.05)

stanfit_bias <- stan(file = 'stan_bias.stan',
                     data = data_ls,
                     chains = 4, cores = 4)

With the above stan code, the chains were not mixed.

I guess there is something wrong with the part of calculating the (log) likelihood of observing an effect:

...
 // (bias-related)
      target += normal_lpdf(Y[n] | mu[n], se[n]);
      target += log(omega[I[n]]);
      // denominators
      target += - log_sum_exp(
        normal_lcdf(1.96 | mu[n], se[n]) + log(omega[1]),
        normal_lccdf(1.96 | mu[n], se[n]) + log(omega[K])
        );
...

The first target +=... is equivalent to the likelihood for one experiment without taking into account publication bias (probability). The second target +=... is the log of publication probability for this experiment. The third target +=... is calculating the denominators to make sure the first two target +=... are giving the proper likelihood. It’s very likely I made mistakes here, but don’t know what exactly it is.

When I commented the “denominators” (i.e., the third target +=...), the model can converge. But I don’t think the estimates are correct.

Please let me know where the mistakes are and what I can try instead. (I just started to learn programming with Stan, so I may make any mistake, including very silly ones).

Thanks a lot in advance,
Haiyang

bbbales2 · March 24, 2021, 1:06pm

I think you should be using theta instead of omega. Since theta is a simplex, it’s probabilities.

There’s a bit of a weirdness in Stan right now where normal_lcdf is more stable to compute than normal_lccdf. So you can try:

real l1 = normal_lcdf(1.96 | mu[n], se[n]);
real l2 = log1m_exp(l1);
target += - log_sum_exp(l1 + log(theta[1]), l2 + log(theta[2]));

There is also a log_mix function (doc’d here) that simplifies this to:

target += -log_mix(theta[1], l1, l2);

So try that stuff first, if that goes nowhere, then it’s simulated data time. Simulate data from your model where you know the answer and then fit it back and see what results you’re getting. If you’re getting the wrong answer then, then we know more clearly there’s probably a bug somewhere. Since you’ll know the correct answer since you determined it, you can sequentially work through the model holding different portions to known correct constants to find the problem.

HaiyangJin · March 24, 2021, 5:28pm

Many thanks for the suggestion!

Here I use omega instead of theta is because I would like to make sure the second (last) probability is always 1 (Here omega = cumulative_sum(theta); I agree it is a bit weird here but it will make more sense when there are more than two values in theta).

With your suggestion and the Stan code from library(publipha), I updated the code for calculating the denominator as:

...
real l1;
real l2;
... 
      // (bias-related)
      target += normal_lpdf(Y[n] | mu[n], se[n]);
      target += log(omega[I[n]]);
      // denominators
      // target += - log_sum_exp(
      //  normal_lcdf(1.96 | mu[n], se[n]) + log(omega[1]),
      //  normal_lccdf(1.96 | mu[n], se[n]) + log(omega[2])
      //  );
      l1 = normal_lcdf(1.96 * se[n] | Intercept, sqrt(se[n]*se[n] + sd_1[1]*sd_1[1]));
      l2 = log1m_exp(l1);
      target += - log_sum_exp(l1 + log(omega[1]), l2 + log(omega[2]));
...

Now the model converges and the results look reasonable. It’s an example dataset and I don’t know the “true” values. I will simulate some data and check the results again.

BTW, I also tried to calculate l2 with normal_lccdf, the result was similar to that when using log1m_exp(l1). More specifically:

l2 = normal_lccdf(1.96 * se[n] | Intercept, sqrt(se[n]*se[n] + sd_1[1]*sd_1[1]));

Many thanks for your help!

bbbales2 · March 24, 2021, 9:42pm

Nice! Glad it’s working.

But if this is w(x), doesn’t it need to be a probability? To w[1] and w[2] need to sum to 1.0, so if w[2] is 1.0 this doesn’t work?

HaiyangJin · March 25, 2021, 5:07am

Yes, w(x) needs to be a probability, but they don’t need (should not) to sum to 1.0.

The assumption applied here is that when the p-value was smaller than 0.05, all the experiments will be published (that is to say the probability of publishing here is 100%, i.e., w[2]). When one experiment’s p-value was larger than 0.05, the probability of publishing was a fixed probability, which should be smaller than 100% w[1]. Their sum is not 1. Hope it makes sense.

I guess the probabilities that one experiment’s p-value was larger or smaller than 0.05 sum to 1.0, but it is not w(x) here.

Just another update. It seems that I found the bug in the original code:

...
      // (bias-related)
      target += normal_lpdf(Y[n] | mu[n], se[n]);
      target += log(omega[I[n]]);
      // denominators
      target += - log_sum_exp(
      normal_lcdf(1.96 * se[n] | mu[n], se[n]) + log(omega[1]),
      normal_lccdf(1.96 * se[n] | mu[n], se[n]) + log(omega[2])
      );
...

Once I use 1.96 * se[n] (instead of 1.96; I used the wrong value to calculate the CDF), the model can converge now. The results are similar to my last answer. I’m not sure if I fully understand the differences, but it seems that they are equivalent (this and the last answer).

HaiyangJin · April 8, 2021, 6:46am

BTW, I have uploaded the Stan codes I tried to implement selection models with multilevel models to github. Feel free to contact me for anything.

Topic		Replies	Views
Hierarchical Model for Publication Bias Modeling techniques , specification	22	369	March 21, 2025
Winner curse and publication bias Modeling rstan , techniques , specification	4	89	May 28, 2025
Heckman sample selection model (working but biased selection coefficients) Modeling	16	4107	October 22, 2017
Binomial-normal (meta-analysis) model in brms Modeling brms , meta-analysis	5	1641	February 11, 2022
Network meta-analysis random effect model for binary data Modeling meta-analysis	7	1691	June 23, 2020

Bayesian random-effects meta-analysis with selection models (publication bias)

Related topics