A coauthor and I are thinking about an 8schools like case in which there are multiple experiments and you only have the mean and the standard deviation of the effect. The question is how to adjust for correlation between the schools whereby you have prior information on the correlation parameters but are not able to fold that information into a multilevel structure. Would the implementation below be appropriate?

data {
int<lower=0> J; // number of schools
vector[J] y; // estimated treatment effects
vector<lower=0>[J] sigma; // standard error of effect estimates
}
parameters {
real mu; // population treatment effect
real<lower=0> tau; // standard deviation in treatment effects
vector[J] eta; // unscaled deviation from mu by school
cholesky_factor_corr[J] L_Omega;
}
transformed parameters {
matrix[J, J] L_Sigma;
vector[J] theta = mu + tau * eta; // school treatment effects
L_Sigma = diag_pre_multiply(sigma, L_Omega);
}
model {
L_Omega ~ lkj_corr_cholesky(4);
target += normal_lpdf(eta | 0, 1); // prior log-density
target += multi_normal_cholesky_lpdf(y | theta, L_Sigma); // log-likelihood
}

You want to integrate eta out of that posterior distribution analytically and perhaps draw from its full-conditional distribution in the generated quantities block. One of the things that I don’t like about the 8 schools example is that we get caught up on talking about centered vs. non-centered parameterizations and in a lot of cases with meta-analysis or something similar, neither parameterization works that well in Stan. The right parameterization is to not have eta.

data {
int<lower=0> J; // number of schools
vector[J] y; // estimated treatment effects
vector<lower=0>[J] sigma; // standard error of effect estimates
}
transformed data {
vector[J] sigma2 = square(sigma);
}
parameters {
real mu; // population treatment effect
real<lower=0> tau; // standard deviation in treatment effects
cholesky_factor_corr[J] L_Omega;
}
model {
matrix[J,J] Omega = multiply_lower_tri_self_transpose(L_Omega);
real tau2 = square(tau);
for (j in 1:J) Omega[j,j] = 1 + tau2 + sigma2[j];
target += multi_normal_lpdf(y | rep_vector(mu, J), Omega);
target += lkj_corr_cholesky(L_Omega | 4); // 4 is usually too big
// prior on mu and tau
}

So I tried the model above with the DGP below but it appears that the model pools the estimates all the way to the mean. Is there an error or am I generally approaching this in the wrong way?

# libraries
library(rstan); library(MASS)
# sampling
chains <- 4
iter <- 2000
# general
T <- 2
N <- rpois(T, 1000)
## effects live here (you just can't see them)
# hyperparameter
center_mu <- 1.5; deviation_mu <- 0.5
center_sigma <- 0; deviation_sigma <- 2
mu_mu <- rnorm(1, center_mu, deviation_mu)
mu_sigma <- abs(rnorm(1, center_sigma, deviation_sigma))
# effect in test
mu <- rnorm(T, mu_mu, mu_sigma)
# covariance matrix
p <- qr.Q(qr(matrix(rnorm(T^2, 0.5), T)))
Sigma <- crossprod(p, p*(T:1))
# observed effects
effect.dist.mean <- rep(NA, T)
effect.dist.sd <- rep(NA, T)
beta <- MASS::mvrnorm(max(N), mu, Sigma) ## draw effect distribution
for (i in 1:T){
x <- rbinom(N[i], 1, 0.5)
alpha_mu <- rnorm(2, 2, 0.5)
sigma_y <- abs(rnorm(1, 0, 1))
alpha <- rnorm(N[i], alpha_mu, 0.2)
y <- rnorm(N[i], alpha + beta[1:N[i], i] * x, sigma_y)
summary <- summary(lm(y ~ 1 + x))
effect.dist.mean[i] <- summary$coefficients[2, 1]
effect.dist.sd[i] <- summary$coefficients[2, 2]
}
data <- list(
J = T,
y = effect.dist.mean,
sigma = effect.dist.sd
)
setwd("~/Dropbox/QuantMethods/S I M O N E")
m.mvn_wpriors <- rstan::stan_model("codeStan_wMVN_v4.stan")
fit <- rstan::sampling(m.mvn_wpriors, data = data)
fit_extract <- rstan::extract(fit)
print("effect sizes")
colMeans(fit_extract$theta_rep)
mu
print("hyper mu")
mean(fit_extract$mu)
mu_mu

I feel a bit stuck (…) since I also tried inputting the covariance matrix as known but even then it pooled all the way to the mean. Even adjusting tau didn’t change that result.