Generating data for meta-analysis

Hi STAN community! I am a total beginner to STAN trying to specify a model to generate data from. The idea is to generate data for a meta-analysis with specific effect and heterogeneity parameters that I can then reproduce using the rma function from the metafor package in R. Below is my model.

data{
int N; // number of studies
}

parameters{
real mu;
real <lower=.0001>tau;
}

model{
mu ~ normal(.5, .1);    // mean of true effects(d = .5)
tau ~ gamma(.5, 1);     // heterogeneity parameter(tau = .5)
}

generated quantities{
vector[N] theta;
vector[N] sigma;
vector[N] y;

for(i in 1:N){
theta[i] = normal_rng(mu, tau);  // drawn true population effect
sigma[i] = normal_rng(1, .1);   // population deviation

y[i] = normal_rng(theta[i], sigma[i]); // observed effects
}
}


My thinking is that this represent something like this:
\mu \sim N(0.5, 0.1)
\tau \sim \gamma(0.5, .1)

\theta_i \sim N(\mu, \tau)
\sigma_i \sim N(1, 1.1)

y_i \sim N(\theta_i, \sigma_i)

The issue is that when I simulate data from this model in R and try to estimate tau and mu the estimations are horribly off. For example, if I sample from this model for 5 studies with n = 300 like so:

n <- 300
iter <- 1000

set.seed(58)
sim_outcome <- sampling(model,
iter = iter,
warmup = iter - n,
chains = 1,
refresh = 0,
data = list(N = 5))

meta_data <- data.frame(extract(sim_outcome))
yi <- (unlist(lapply(meta_data[12:16],
FUN = mean)))

vi <- (unlist(lapply(meta_data[12:16],
FUN = var)))

rma(yi = yi, vi = vi, method = 'DL')


The rma results return a tau estimate of 0, which is confusing me. I have tried different variance estimators (e.g REML, ML) and also checked the model on big samples with many studies but it still does not work as I want (no heterogeneity). I can do this in base r without much issue, but I would really prefer to do it in STAN. Any guidance/help would be greatly appreciated!

Welcome, @tea_taul.

A couple of thoughts:

1. The sampler throws warning messages with your lower bound on tau of .0001. I get much better behavior with it set to zero, which is a typical way of bounding standard deviations. Iâm not sure why it does this. Maybe others can offer insight?
parameters{
real mu;
real <lower=0> tau;
}

1. Iâm confused why you take the mean and variance across your simulated draws.

(Note too thereâs a small indexing typo here. It should be 13:17, but this isnât the cause of the zero heterogeneity estimate.)
I think your Stan code is generating iter-n=700 datasets of 5 studies each. Each row of meta_data therefore has its own âtrueâ parameter values from which y.1 through y.5 are realized. Since each row has itâs own âtrueâ parameters, you are aggregating across this higher-level of heterogeneity that your rma model doesnât know about.

I may be missing something here, but I think you want something like this below. I think too that you need to square the sigma values that come out of your Stan simulation because vi is a variance:

# dataset 1
rma(yi = unlist(meta_data[1, 13:17]), vi = unlist(meta_data[1, 8:12])^2,
method = "DL")
meta_data[1, c("mu", "tau")]

# dataset 2
rma(yi = unlist(meta_data[2, 13:17]), vi = unlist(meta_data[2, 8:12])^2,
method = "DL")
meta_data[2, c("mu", "tau")]

# ... and so on ...


For each of these, I get what seem like reasonable estimates for mu and tau, but itâd be best to check this against a broader set of parameter values.

1 Like

Thanks the answer @wpetry! I think I have figured it out.

1. The reason why I chose a non zero lower bound was because the model did not initialize with a zero value for some reason - the updated model did not have that problem.

2. I used the same estimation procedure for the simulation I used in base R which worked fine. I realize from your reply that I misunderstood the output I received from STAN and I adjusted the generated quantities block to better represent the situation of âdrawing population effectsâ that I aimed to simulate. I think this was the cause behind why the indexing looked off to you as well - it made sense in how I interpreted the sampling.

Again, thanks for the help!

1 Like