Dear all,
I would like to build a Bayesian random-effects meta-analysis model with selection models to account for publication bias. The example data originally come from the RoBMA package; the preprocessed data can be downloaded here: ExampleData.csv (1.8 KB).
First, I use library(brms) to create the meta-analysis model without the selection models:
# R code to read the data
library(tidyverse)
library(brms)
library(rstan)
df <- read_csv("ExampleData.csv")
# build two-level random-effects meta-analysis with brms
brmfit <- brm(d | se(se) ~ 1 + (1|name), data = df,
              chains = 4, cores = 4)
# obtain the corresponding stan code
stancode(brmfit)
Then I tried to add the selection-model (publication bias) components to the Stan code obtained from brmfit. (The way I implemented the selection model is based on my understanding of the RoBMA source code.) In this example, the assumption is that experiments whose p-value was larger than 0.05 were published with probability omega_1, whereas all experiments whose p-value was smaller than 0.05 were published (i.e., with probability omega_2 = 1). Therefore, when calculating the likelihood of observing an experiment's effect size, a weighted normal distribution (i.e., a normal distribution with different publication probabilities over different intervals) should be used. The formula should be

g(x \mid \mu) = \frac{f(x \mid \mu)\, w(x)}{A},

where f(x \mid \mu) and g(x \mid \mu) are the PDFs of x before and after publication selection, respectively, w(x) is the probability of publishing the paper (based on its p-value), and A is a normalizing constant that makes g(x \mid \mu) a proper PDF.
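Concretely, with only two intervals and a cutoff c on the effect-size scale, my understanding is that the normalizing constant works out to

A = \omega_1 \Pr(x \le c \mid \mu) + \omega_2 \Pr(x > c \mid \mu) = \omega_1 \Phi\!\left(\frac{c - \mu}{se}\right) + \omega_2 \left(1 - \Phi\!\left(\frac{c - \mu}{se}\right)\right),

where \Phi is the standard normal CDF; this is what the "denominators" term in the Stan code below tries to compute, with c = 1.96.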
The full Stan code is as follows (the parts I changed are marked "bias-related"):
// generated with brms 2.15.0
functions {
}
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  vector<lower=0>[N] se;  // known sampling error
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  int prior_only;  // should the likelihood be ignored?
  // (bias-related)
  int<lower=1> I[N];  // index for intervals based on p-value
}
transformed data {
  vector<lower=0>[N] se2 = square(se);
}
parameters {
  real Intercept;  // temporary intercept for centered predictors
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  vector[N_1] z_1[M_1];  // standardized group-level effects
  // (bias-related)
  simplex[2] theta;  // Dirichlet-distributed weights used to build omega
}
transformed parameters {
  real<lower=0> sigma = 0;  // residual SD
  vector[2] omega;  // (bias-related) publication probabilities for the two p-value intervals
  vector[N_1] r_1_1;  // actual group-level effects
  r_1_1 = (sd_1[1] * (z_1[1]));
  // (bias-related) calculate omega based on theta from the Dirichlet prior
  omega = cumulative_sum(theta);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = Intercept + rep_vector(0.0, N);
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
      // (bias-related)
      target += normal_lpdf(Y[n] | mu[n], se[n]);
      target += log(omega[I[n]]);
      // denominators
      target += -log_sum_exp(
        normal_lcdf(1.96 | mu[n], se[n]) + log(omega[1]),
        normal_lccdf(1.96 | mu[n], se[n]) + log(omega[2])
      );
    }
    // target += normal_lpdf(Y | mu, se);
  }
  // priors including constants
  target += student_t_lpdf(Intercept | 3, 0.5, 2.5);
  target += student_t_lpdf(sd_1 | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  target += std_normal_lpdf(z_1[1]);
  // (bias-related)
  target += dirichlet_lpdf(theta | rep_vector(2, 2));
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept;
}
# R code to fit the above Stan code (saved as 'stan_bias.stan')
data_ls <- standata(brmfit)  # obtain the data list from brmfit
data_ls$I <- df$I  # interval index per study based on its p-value (i.e., larger or smaller than 0.05)
stanfit_bias <- stan(file = 'stan_bias.stan',
                     data = data_ls,
                     chains = 4, cores = 4)
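For reference, an interval index like df$I can be derived from d and se along these lines (just a sketch, assuming a two-sided z-test at alpha = 0.05):
# sketch: derive the interval index from d and se
# (assumes a two-sided z-test at alpha = 0.05)
df <- df %>%
  mutate(p = 2 * pnorm(-abs(d / se)),
         I = if_else(p > 0.05, 1L, 2L))  # 1: p > .05 (probability omega_1), 2: p < .05 (probability omega_2 = 1)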
With the above Stan code, the chains did not mix.
I suspect there is something wrong with the part that calculates the (log) likelihood of observing an effect:
...
      // (bias-related)
      target += normal_lpdf(Y[n] | mu[n], se[n]);
      target += log(omega[I[n]]);
      // denominators
      target += -log_sum_exp(
        normal_lcdf(1.96 | mu[n], se[n]) + log(omega[1]),
        normal_lccdf(1.96 | mu[n], se[n]) + log(omega[2])
      );
...
The first target += is the log likelihood of one experiment without taking publication bias into account. The second target += is the log of the publication probability for this experiment. The third target += computes the denominator (the normalizing constant A) so that the first two terms together give a proper likelihood. It is very likely that I made a mistake here, but I don't know exactly where.
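For reference, here is a small sanity check in R of how I think these three terms should combine for a single observation (made-up values for mu, se, and omega; cutoff 1.96 as in the Stan code). The resulting weighted density should integrate to 1:
# sanity check: g(x | mu) = f(x | mu) * w(x) / A should integrate to 1
mu <- 0.3; se_n <- 0.2                     # made-up values
omega <- c(0.4, 1)                         # omega_1 < omega_2 = 1
w <- function(x) ifelse(x <= 1.96, omega[1], omega[2])
A <- omega[1] * pnorm(1.96, mu, se_n) + omega[2] * (1 - pnorm(1.96, mu, se_n))
g <- function(x) dnorm(x, mu, se_n) * w(x) / A
integrate(g, -Inf, Inf)                    # ~1; log g(Y) = log f + log w - log A matches the three target += terms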
When I commented out the "denominators" (i.e., the third target +=), the model converges, but I don't think the estimates are correct.
Please let me know where the mistakes are and what I could try instead. (I have just started learning to program in Stan, so I may have made any kind of mistake, including very silly ones.)
Thanks a lot in advance,
Haiyang