I have a set of treatment estimates from randomized trials that I’m trying to better model. All of the interventions, each belonging to one of several broad intervention categories, attempted to improve student attendance, with vary degrees of success. Each of the treatment estimates comes from a Poisson regression run in R, with the resulting (untransformed) estimates and covariance passed in as data to Stan. What I’m also trying to do, and what I suspect I might have done poorly, is to expand my Stan model so that it also takes in the mean absence rate from the control group in each trial (along with another covariance matrix for this). As various people have pointed out, if the control mean in an RCT is correlated with the treatment effect, and you build this into your model, maybe you can get better inference on the estimated true treatment effects and estimated true heterogeneity in effects.
One complicating factor is that many of the trials had more than one treatment arm and a single control arm. In which case there is of course only one control mean absence rate for that trial’s set of treatment estimates. So I try to have one lambda – what I call the true mean absence rate parameter – for each trial be correlated with a set of thetas – what I call the true treatment effect parameters for the treatment estimates. I also (believe) I set up the model to estimate just a single correlation between each lambda and set of thetas across all trials, along with a single mean and variance for all the lambdas. Whereas I allow the mean and variation in the thetas to depend on the intervention type.
This model compiles and samples – but the sampling is poor. I run 4 chains with 8000 iterations. There were 22 divergent transitions, which doesn’t seem too terrible. But I also get warnings about ~16000 transitions exceeding max treedepth, 1 chain with low BFMI, that the largest R-hat was NA, and that ESS was too low. This seems like just about every warning that Stan throws.
I’m wondering if I wrote this model poorly, especially the arrays YY, MU, and D, as referenced in the multi_normal() function where I iterate through the contents of YY, MU, and D.
If there isn’t anything glaringly wrong with the model, does anyone have suggestions for how to make the sampling proceed more smoothly?
Thank you!
data {
int<lower=1> J; // distinct intervention types
int<lower=1> T; // distinct trials, some multi-armed; each trial falls into some type j
int<lower=1> N; // number of total arms across all trials and intervention types
vector[N] y; // estimated treatment effects
cov_matrix[N] sigma_y; // covariance matrix of estimated treatment effects
vector[T] a; // estimated control mean absence rates
cov_matrix[T] sigma_a; // covariance matrix of mean absence rates
int<lower=1,upper=J> idx_study[N]; // intervention type index for that arm
int<lower=1,upper=T> idx_trial[N]; // trial index for that arm
real<lower=0> alpha_sd; // SD for prior on alpha
real<lower=0> sigma_delta_sd; // SD for prior on intervention effects
real<lower=0> sigma_epsilon_sd_sd; // SD for prior for heterogeneity in arm effects variances
}
parameters {
real alpha; // average affect across all treatment arms, trials, intervention types
real<lower=0> sigma_epsilon_sd; // SD of half normal distribution of sigma_epsilons
real<lower=0> sigma_delta; // SD of intervention type effect deviances
vector[J] delta_nc; // non-centered deltas
vector<lower=0>[J] sigma_epsilon_nc; // non-centered sigma epsilons
vector<lower=0>[T] lambda; // true parameters for estimated control mean abs rates
real<lower=0> mu; // mean parameter for the lambdas
real<lower=0> sigma_lambda; // SD of lambdas
corr_matrix[2] Rho; // matrix for correlation between thetas, lambdas
vector[N] theta; // true parameter generating observed treatment effects
}
transformed parameters {
vector[J] delta; // intervention type effects
vector<lower=0>[J] sigma_epsilon; // SD of arm effects, estimated for each intervention type
sigma_epsilon[idx_study] = sigma_epsilon_nc[idx_study]*sigma_epsilon_sd;
delta[idx_study] = delta_nc[idx_study]*sigma_delta;
}
model {
array[N] vector[2] YY;
array[N] vector[2] MU;
array[N] vector[2] D;
for (i in 1:N) YY[i] = [theta[i],lambda[idx_trial[i]]]';
for (i in 1:N) MU[i] = [alpha + delta[idx_study[i]],mu]';
for (i in 1:N) D[i] = [sigma_epsilon[idx_study[i]],sigma_lambda]';
alpha ~ normal(0,alpha_sd);
sigma_epsilon_nc ~ normal(0,1);
sigma_epsilon_sd ~ normal(0,sigma_epsilon_sd_sd); // sigma_epsilon_sd_sd is admittedly a bad name
delta_nc ~ normal(0,1);
sigma_delta ~ normal(0,sigma_delta_sd);
sigma_lambda ~ normal(0,0.05);
mu ~ normal(0.07, 0.03) T[0,];
Rho ~ lkj_corr(2);
for (i in 1:N) YY[i] ~ multi_normal(MU[i], quad_form_diag(Rho, D[i]));
a ~ multi_normal(lambda, sigma_a); // likelihood function for the observed control mean abs. rates
y ~ multi_normal(theta, sigma_y); // likelihood function for the observed treatment estimates
}