8schools but they are correlated

Merlin_Heidemanns · December 10, 2019, 6:57pm

A coauthor and I are thinking about an 8schools like case in which there are multiple experiments and you only have the mean and the standard deviation of the effect. The question is how to adjust for correlation between the schools whereby you have prior information on the correlation parameters but are not able to fold that information into a multilevel structure. Would the implementation below be appropriate?

data {
  int<lower=0> J;         // number of schools
  vector[J] y;              // estimated treatment effects
  vector<lower=0>[J] sigma; // standard error of effect estimates
}
parameters {
  real mu;                // population treatment effect
  real<lower=0> tau;      // standard deviation in treatment effects
  vector[J] eta;          // unscaled deviation from mu by school
  cholesky_factor_corr[J] L_Omega;
}
transformed parameters {
  matrix[J, J] L_Sigma;
  vector[J] theta = mu + tau * eta;        // school treatment effects
  L_Sigma = diag_pre_multiply(sigma, L_Omega);
}
model {
  L_Omega ~ lkj_corr_cholesky(4);
  target += normal_lpdf(eta | 0, 1);       // prior log-density
  target += multi_normal_cholesky_lpdf(y | theta, L_Sigma); // log-likelihood
}

bgoodri · December 10, 2019, 7:14pm

You want to integrate eta out of that posterior distribution analytically and perhaps draw from its full-conditional distribution in the generated quantities block. One of the things that I don’t like about the 8 schools example is that we get caught up on talking about centered vs. non-centered parameterizations and in a lot of cases with meta-analysis or something similar, neither parameterization works that well in Stan. The right parameterization is to not have eta.

data {
  int<lower=0> J;         // number of schools
  vector[J] y;              // estimated treatment effects
  vector<lower=0>[J] sigma; // standard error of effect estimates  
}
transformed data {
  vector[J] sigma2 = square(sigma);
}
parameters {
  real mu;                // population treatment effect
  real<lower=0> tau;      // standard deviation in treatment effects
  cholesky_factor_corr[J] L_Omega;
}
model {
  matrix[J,J]  Omega = multiply_lower_tri_self_transpose(L_Omega);
  real tau2 = square(tau);
  for (j in 1:J) Omega[j,j] = 1 + tau2 + sigma2[j];
  target += multi_normal_lpdf(y | rep_vector(mu, J), Omega);
  target += lkj_corr_cholesky(L_Omega | 4); // 4 is usually too big
  // prior on mu and tau
}

bbbales2 · December 10, 2019, 7:30pm

This model indeed does not have eta, but I do not believe it is the correct model :P.

Edit: Courtesy your edits my joke is obsolete

bgoodri · December 10, 2019, 7:38pm

Still funny though

Merlin_Heidemanns · December 10, 2019, 7:54pm

Thanks! I will try that. The help I received so far from other sources was not as helpful…

bgoodri · December 10, 2019, 8:19pm

Yeah, I don’t think BDA covers marginalizing eta out.

bbbales2 · December 10, 2019, 11:31pm

Raccoon cat. Send that photo to the Gelblogg.

Merlin_Heidemanns · December 11, 2019, 10:38pm

So I tried the model above with the DGP below but it appears that the model pools the estimates all the way to the mean. Is there an error or am I generally approaching this in the wrong way?

# libraries

library(rstan); library(MASS)

# sampling
chains <- 4
iter <- 2000

# general
T <- 2
N <- rpois(T, 1000)

## effects live here (you just can't see them)

# hyperparameter
center_mu <- 1.5; deviation_mu <- 0.5
center_sigma <- 0; deviation_sigma <- 2
mu_mu <- rnorm(1, center_mu, deviation_mu)
mu_sigma <- abs(rnorm(1, center_sigma, deviation_sigma))

# effect in test
mu <- rnorm(T, mu_mu, mu_sigma)

# covariance matrix
p <- qr.Q(qr(matrix(rnorm(T^2, 0.5), T)))
Sigma <- crossprod(p, p*(T:1))

# observed effects
effect.dist.mean <- rep(NA, T)
effect.dist.sd <- rep(NA, T)
beta <- MASS::mvrnorm(max(N), mu, Sigma)  ## draw effect distribution

for (i in 1:T){
  x <- rbinom(N[i], 1, 0.5)
  alpha_mu <- rnorm(2, 2, 0.5)
  sigma_y <- abs(rnorm(1, 0, 1))
  alpha <- rnorm(N[i], alpha_mu, 0.2)
  y <- rnorm(N[i], alpha + beta[1:N[i], i] * x, sigma_y)
  summary <- summary(lm(y ~ 1 + x))
  effect.dist.mean[i] <- summary$coefficients[2, 1]
  effect.dist.sd[i] <- summary$coefficients[2, 2]
}


data <- list(
  J = T,
  y = effect.dist.mean,
  sigma = effect.dist.sd
)


setwd("~/Dropbox/QuantMethods/S I M O N E")
m.mvn_wpriors <- rstan::stan_model("codeStan_wMVN_v4.stan")
fit <- rstan::sampling(m.mvn_wpriors, data = data)
fit_extract <- rstan::extract(fit)

print("effect sizes")
colMeans(fit_extract$theta_rep)
mu
print("hyper mu")
mean(fit_extract$mu)
mu_mu

I feel a bit stuck (…) since I also tried inputting the covariance matrix as known but even then it pooled all the way to the mean. Even adjusting tau didn’t change that result.

Topic		Replies	Views
Simple question on hierarchical non-centered parameterization Modeling	3	2027	October 16, 2017
Eight Schools example: Questions General specification	1	335	March 25, 2024
Meta-analysis with inference on measurement error variability Modeling specification , meta-analysis	7	1153	January 13, 2020
Sigma / correlation matrix and non-centered parameterization of random effects WITHOUT Cholesky factorization Modeling	4	693	January 31, 2022
Convergence Error (Multilevel Model) Modeling	4	471	October 9, 2020

8schools but they are correlated

Related topics