Hi everyone,
I am trying to code a relatively simple model to detect publication bias. The model seems to run without issues (no error messages, R-hat = 1, …); however, when I test it on a simulated dataset it fails to detect a very simple publication-bias process.
This is the model I use: a hierarchical random-effects model in which the true effects come from a Student-t distribution. Publication bias is a function of the t-statistic: papers with a t-statistic below the significance threshold (1.96 in the model, 1.95 in the simulation) have a publication probability less than one. I simulated a dataset from the same process (code below). To my surprise, the model does NOT detect the publication bias, i.e. \delta \approx 1, and the estimate of \beta is unchanged relative to the same model without the publication-bias term. I am not sure what I am doing wrong; any advice?
data {
  int<lower=0> N;   // num observations (published studies)
  int<lower=0> K;   // num covariates
  vector[N] y;      // observed outcomes (published)
  vector[N] se;     // standard error estimates (published)
  matrix[N, K] X;   // covariate matrix
}
parameters {
  vector[N] theta;                    // latent true effect sizes
  real<lower=0, upper=1> sigma_unif;  // uniform variable, transformed to half-Cauchy below
  vector[K] beta;                     // coefficients for covariates
  real<lower=0, upper=1> delta;       // effect of publication bias (strength of bias)
}
transformed parameters {
  real<lower=0> sigma_theta = 2 * tan(sigma_unif * pi() / 2);  // half-Cauchy(0, 2) via inverse CDF
}
model {
  // Likelihood for observed (published) outcomes
  for (i in 1:N) {
    // Model the observed data conditional on publication
    target += normal_lpdf(y[i] | theta[i], se[i]);
    // Publication-bias adjustment: down-weight non-significant studies
    if (y[i] / se[i] < 1.96) {
      target += log(delta);  // probability of publication when the t-statistic is below 1.96
    }
  }
  // 2nd hierarchical layer
  theta ~ normal(X * beta, sigma_theta);
  // 3rd hierarchical layer (priors)
  beta ~ normal(0, 2);
  sigma_unif ~ uniform(0, 1);
  delta ~ beta(3, 7);  // prior on the publication-bias effect
}
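As a side check of the transformed-parameters block: if sigma_unif ~ Uniform(0, 1), then 2 * tan(sigma_unif * pi / 2) is a half-Cauchy(0, 2) draw via the inverse CDF, so its median should be 2 * tan(pi/4) = 2 and its quartiles roughly 0.83 and 4.83. A minimal sketch in R (not from the original post; the seed and sample size are arbitrary):

```r
# Sanity check: 2 * tan(U * pi/2) with U ~ Uniform(0, 1) is half-Cauchy(0, 2).
set.seed(1)
u <- runif(1e6)
sigma_theta <- 2 * tan(u * pi / 2)

# Theoretical quartiles of half-Cauchy(0, 2):
# 2 * tan(pi/8) ~ 0.83, 2 * tan(pi/4) = 2, 2 * tan(3*pi/8) ~ 4.83
quantile(sigma_theta, c(0.25, 0.5, 0.75))

# Compare against draws from |Cauchy(0, 2)| directly.
ref <- abs(rcauchy(1e6, location = 0, scale = 2))
quantile(ref, c(0.25, 0.5, 0.75))
```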
This is how I simulate the dataset:
library(data.table)

# Step 1: Define parameters
N_total <- 120 # Total studies before publication bias
beta_A <- 0.25 # True effect for A studies
beta_B <- 0.10 # True effect for B studies
tau <- 0.1 # Between-study scale (standard deviation)
df_t <- 5 # Degrees of freedom for Student's t (heavier tails if lower)
# Step 2: Assign covariates randomly
covariate <- sample(c("A", "B"), size = N_total, replace = TRUE)
# Create indicator variables
I_A <- ifelse(covariate == "A", 1, 0)
I_B <- ifelse(covariate == "B", 1, 0)
# Step 3: Simulate true treatment effects from a Student-t distribution
theta <- beta_A * I_A + beta_B * I_B + rt(N_total, df = df_t) * tau
# Step 4: Simulate standard errors
SE <- runif(N_total, min = 0.05, max = 0.2) # Study precision varies
# Step 5: Generate observed effects
treatment_effect <- theta + rnorm(N_total, mean = 0, sd = SE)
# Step 6: Compute t-statistics
t_stat <- treatment_effect / SE
# Step 7: Apply publication bias: studies with |t| < 1.95 are less likely to be published
publication_prob <- ifelse(abs(t_stat) < 1.95, 0.3, 1) # Lower prob if t-stat is small
selected <- sample(1:N_total, size = 90, prob = publication_prob, replace = FALSE)
# Step 8: Create final dataset
dt <- data.table(
treatment_effect = treatment_effect[selected],
standard_error = SE[selected],
covariate = covariate[selected]
)
dt[, A := ifelse(covariate == "A", 1, 0)]
dt[, B := ifelse(covariate == "B", 1, 0)]
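To confirm that this simulation really induces selection bias, here is a self-contained sketch of the same steps (simplified to a single effect of 0.2 instead of the A/B covariates, with a seed added for reproducibility; both are my assumptions, not from the post). It compares the share of significant studies and the mean effect before and after selection:

```r
# Rerun the simulation steps with a single effect, then apply the
# same |t| < 1.95 down-weighting used in Step 7 above.
set.seed(42)
N_total <- 120
SE <- runif(N_total, min = 0.05, max = 0.2)
theta <- 0.2 + rt(N_total, df = 5) * 0.1      # single-group stand-in for beta_A / beta_B
treatment_effect <- theta + rnorm(N_total, mean = 0, sd = SE)
t_stat <- treatment_effect / SE

publication_prob <- ifelse(abs(t_stat) < 1.95, 0.3, 1)
selected <- sample(seq_len(N_total), size = 90, prob = publication_prob, replace = FALSE)

mean(abs(t_stat) >= 1.95)            # significant share, all studies
mean(abs(t_stat[selected]) >= 1.95)  # significant share, published studies
mean(treatment_effect)               # mean effect, all studies
mean(treatment_effect[selected])     # mean effect, published studies
```

If the selection works as intended, the published subset should over-represent significant studies, so the second share should exceed the first.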