Bayesian random-effects meta-analysis with selection models (publication bias)

Dear all

I would like to build a Bayesian random-effects meta-analysis model with selection models to take into account the publication bias. The example data is originally from RoBMA package. The preprocessed data can be download here: ExampleData.csv (1.8 KB).

First, I use library(brms) to create the meta-analysis model without the selection models:

# R code to read the data
library(tidyverse)
library(brms)
library(rstan)

df <- read_csv("ExampleData.csv")

# build two-level random-effects meta-analysis with brms
brmfit <- brm(d | se(se) ~ 1 + (1|name), data = df,
              chains = 4, cores = 4)

# obtain the corresponding stan code
stancode(brmfit)

Then, I tried to add the selection models (publication bias) components to the stan code obtained from brmfit. (The way I implemented selection models is based on my understanding of library(RoBMA) source codes). In this example, the assumption is that experiments whose p-value was larger than 0.05 were published with a probability of omega_1, whereas all experiments whose p-value was smaller than 0.05 were published (i.e., with the probability of omega_2 = 1). Therefore, when calculating the likelihood of observing an experiment effect size, the weighted normal distribution (i.e., the normal distribution with different probabilities for different intervals) should be used. The formula should be:g(x|\mu) = \frac{f(x|\mu)w(x)}{A}, where f(x|\mu) and g(x|\mu) are the PDF of x before and after publication selections, respectively. w(x) is the probability to publish the paper (based on p-value). A is a normalizing factor to make sure g(x|\mu) is a proper PDF.

The full stan code is as following (the parts I changed are marked as “bias-related”):

// generated with brms 2.15.0
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  vector<lower=0>[N] se;  // known sampling error
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  int prior_only;  // should the likelihood be ignored?
  // (bias-related) 
  int<lower=1> I[N]; // index for intervals based on p-value
}
transformed data {
  vector<lower=0>[N] se2 = square(se);
}
parameters {
  real Intercept;  // temporary intercept for centered predictors
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  vector[N_1] z_1[M_1];  // standardized group-level effects
  // (bias-related)
  simplex[2] theta; // 
}
transformed parameters {
  real<lower=0> sigma = 0;  // residual SD
  vector[2] omega;  // (bias-related) publication bias
  vector[N_1] r_1_1;  // actual group-level effects
  r_1_1 = (sd_1[1] * (z_1[1]));
  // (bias-related) calculate omega based on theta from dirichlet
  omega = cumulative_sum(theta);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = Intercept + rep_vector(0.0, N);
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
      // (bias-related)
      target += normal_lpdf(Y[n] | mu[n], se[n]);
      target += log(omega[I[n]]);
      // denominators
      target += - log_sum_exp(
        normal_lcdf(1.96 | mu[n], se[n]) + log(omega[1]),
        normal_lccdf(1.96 | mu[n], se[n]) + log(omega[2])
        );
    }
    // target += normal_lpdf(Y | mu, se);
  }
  // priors including constants
  target += student_t_lpdf(Intercept | 3, 0.5, 2.5);
  target += student_t_lpdf(sd_1 | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  target += std_normal_lpdf(z_1[1]);
  // (bias-related) 
  target += dirichlet_lpdf(theta | rep_vector(2, 2));
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept;
}

# r code to fit above stan code
data_ls <- standata(brmfit)  # obtain data list from brmfit
data_ls$I <- df$I  # which interval the study is at based on p-value (i.e., whether larger or smaller than 0.05)

stanfit_bias <- stan(file = 'stan_bias.stan',
                     data = data_ls,
                     chains = 4, cores = 4)

With the above stan code, the chains were not mixed.

I guess there is something wrong with the part of calculating the (log) likelihood of observing an effect:

...
 // (bias-related)
      target += normal_lpdf(Y[n] | mu[n], se[n]);
      target += log(omega[I[n]]);
      // denominators
      target += - log_sum_exp(
        normal_lcdf(1.96 | mu[n], se[n]) + log(omega[1]),
        normal_lccdf(1.96 | mu[n], se[n]) + log(omega[K])
        );
...

The first target +=... is equivalent to the likelihood for one experiment without taking into account publication bias (probability). The second target +=... is the log of publication probability for this experiment. The third target +=... is calculating the denominators to make sure the first two target +=... are giving the proper likelihood. It’s very likely I made mistakes here, but don’t know what exactly it is.

When I commented the “denominators” (i.e., the third target +=...), the model can converge. But I don’t think the estimates are correct.

Please let me know where the mistakes are and what I can try instead. (I just started to learn programming with Stan, so I may make any mistake, including very silly ones).

Thanks a lot in advance,
Haiyang

I think you should be using theta instead of omega. Since theta is a simplex, it’s probabilities.

There’s a bit of a weirdness in Stan right now where normal_lcdf is more stable to compute than normal_lccdf. So you can try:

real l1 = normal_lcdf(1.96 | mu[n], se[n]);
real l2 = log1m_exp(l1);
target += - log_sum_exp(l1 + log(theta[1]), l2 + log(theta[2]));

There is also a log_mix function (doc’d here) that simplifies this to:

target += -log_mix(theta[1], l1, l2);

So try that stuff first, if that goes nowhere, then it’s simulated data time. Simulate data from your model where you know the answer and then fit it back and see what results you’re getting. If you’re getting the wrong answer then, then we know more clearly there’s probably a bug somewhere. Since you’ll know the correct answer since you determined it, you can sequentially work through the model holding different portions to known correct constants to find the problem.

2 Likes

Many thanks for the suggestion!

Here I use omega instead of theta is because I would like to make sure the second (last) probability is always 1 (Here omega = cumulative_sum(theta); I agree it is a bit weird here but it will make more sense when there are more than two values in theta).

With your suggestion and the Stan code from library(publipha), I updated the code for calculating the denominator as:

...
real l1;
real l2;
... 
      // (bias-related)
      target += normal_lpdf(Y[n] | mu[n], se[n]);
      target += log(omega[I[n]]);
      // denominators
      // target += - log_sum_exp(
      //  normal_lcdf(1.96 | mu[n], se[n]) + log(omega[1]),
      //  normal_lccdf(1.96 | mu[n], se[n]) + log(omega[2])
      //  );
      l1 = normal_lcdf(1.96 * se[n] | Intercept, sqrt(se[n]*se[n] + sd_1[1]*sd_1[1]));
      l2 = log1m_exp(l1);
      target += - log_sum_exp(l1 + log(omega[1]), l2 + log(omega[2]));
...

Now the model converges and the results look reasonable. It’s an example dataset and I don’t know the “true” values. I will simulate some data and check the results again.

BTW, I also tried to calculate l2 with normal_lccdf, the result was similar to that when using log1m_exp(l1). More specifically:

l2 = normal_lccdf(1.96 * se[n] | Intercept, sqrt(se[n]*se[n] + sd_1[1]*sd_1[1]));

Many thanks for your help!

Nice! Glad it’s working.

But if this is w(x), doesn’t it need to be a probability? To w[1] and w[2] need to sum to 1.0, so if w[2] is 1.0 this doesn’t work?

Yes, w(x) needs to be a probability, but they don’t need (should not) to sum to 1.0.

The assumption applied here is that when the p-value was smaller than 0.05, all the experiments will be published (that is to say the probability of publishing here is 100%, i.e., w[2]). When one experiment’s p-value was larger than 0.05, the probability of publishing was a fixed probability, which should be smaller than 100% w[1]. Their sum is not 1. Hope it makes sense.

I guess the probabilities that one experiment’s p-value was larger or smaller than 0.05 sum to 1.0, but it is not w(x) here.

Just another update. It seems that I found the bug in the original code:

...
      // (bias-related)
      target += normal_lpdf(Y[n] | mu[n], se[n]);
      target += log(omega[I[n]]);
      // denominators
      target += - log_sum_exp(
      normal_lcdf(1.96 * se[n] | mu[n], se[n]) + log(omega[1]),
      normal_lccdf(1.96 * se[n] | mu[n], se[n]) + log(omega[2])
      );
...

Once I use 1.96 * se[n] (instead of 1.96; I used the wrong value to calculate the CDF), the model can converge now. The results are similar to my last answer. I’m not sure if I fully understand the differences, but it seems that they are equivalent (this and the last answer).

2 Likes

BTW, I have uploaded the Stan codes I tried to implement selection models with multilevel models to github. Feel free to contact me for anything.

1 Like