Hi, I am running a 3-level hierarchical model for a meta-analysis on data from several RCTs (the whole dataset is roughly 10 million observations). Since the dataset is so large, I really need to optimize the model as best I can: even on a server it takes a prohibitive amount of time to run. So I wanted to ask whether you see anything worth changing to speed up the code. I will first post a sketch of the theoretical model and then my code.

\begin{align}
& Y_{j,k} \mid \alpha_j, \theta_j, \Omega_j, \alpha, \theta, \Sigma, \Pi \sim \mathcal{MN}(\alpha_j + \theta_j T_j , \Omega_j) \\[10pt]
& \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_J \\ \theta_1 \\ \vdots \\ \theta_J \end{pmatrix} \mid \alpha, \theta, \Sigma, \Pi \sim \mathcal{MN} \left(\begin{pmatrix} \alpha \\ \theta \end{pmatrix}, \Sigma \right) \\[10pt]
& \begin{pmatrix} \alpha \\ \theta \end{pmatrix} \mid \Pi \sim \mathcal{MN} \left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \Pi \right)
\end{align}

As you can see from the code, I made some simplifying assumptions: the likelihood is heteroskedastic only with respect to treatment vs. control group, and intercepts and slopes are modelled independently of each other to reduce the number of parameters. On the other hand, there is a covariance structure within the slope parameters and within the intercepts. I used Cholesky decompositions for the slope and intercept var-covar matrices, and non-centered the parameterization with respect to the first hierarchical layer (i.e. \alpha and \theta), as I suspect there is little pooling going on in the site-specific Average Treatment Effects.
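As a numerical sanity check of the reparameterization I'm relying on (the fact that the tangent of a Uniform(0, pi/2) variable is half-Cauchy), here is a quick NumPy/SciPy sketch; the variable names are mine, not from the model:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Sample X ~ Uniform(0, pi/2) and transform: Y = tan(X).
x = rng.uniform(0.0, np.pi / 2, size=200_000)
y = np.tan(x)

# Compare empirical quantiles of Y against the half-Cauchy(0, 1) quantiles.
probs = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
empirical = np.quantile(y, probs)
theoretical = stats.halfcauchy.ppf(probs)  # scale = 1, i.e. tan(p * pi/2)

print(np.round(empirical, 2))
print(np.round(theoretical, 2))
```

The two printed quantile vectors should agree closely (e.g. both have a median of about 1, since tan(pi/4) = 1), which is why the bounded-uniform parameters in the code below can stand in for half-Cauchy scales.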

```stan
data {
int<lower=0> N; // num observations
int<lower=1> J; // num RCTs
// int<lower=0,upper=N> n[J]; // num observations in j-th RCT
array[N] int<lower=1,upper=J> g; // RCT (group) index for each observation
vector[N] y; // outcome
vector[N] w; // treatment assigned
}
parameters {
  real a; // common mean for the alpha's
  real t; // common mean for the theta's
  vector[J] z_alpha; // standardized deviations for the intercepts (non-centered)
  vector[J] z_theta; // standardized deviations for the slopes (non-centered)
  cholesky_factor_corr[J] L_Omega_theta;
  vector<lower=0,upper=pi()/2>[J] tau_unif_theta;
  cholesky_factor_corr[J] L_Omega_alpha;
  vector<lower=0,upper=pi()/2>[J] tau_unif_alpha;
  real<lower=0,upper=pi()/2> sigma_t_unif; // residual SD for the treated
  real<lower=0,upper=pi()/2> sigma_c_unif; // residual SD for the control
}
transformed parameters {
  // If X is uniform on (0, pi/2), then Y = tan(X) is half-Cauchy. It is
  // computationally more efficient for Stan to sample from a bounded uniform
  // and transform than to sample directly from a Cauchy with thick tails.
  real<lower=0> sigma_t = tan(sigma_t_unif);
  real<lower=0> sigma_c = tan(sigma_c_unif);
  vector<lower=0>[J] tau_theta = tan(tau_unif_theta);
  vector<lower=0>[J] tau_alpha = tan(tau_unif_alpha);
  // RCT-specific intercepts and ATE's. Each uses its own standard-normal
  // vector: sharing a single z between them would force the intercepts and
  // slopes to have identical standardized deviations.
  vector[J] alpha = a + diag_pre_multiply(tau_alpha, L_Omega_alpha) * z_alpha;
  vector[J] theta = t + diag_pre_multiply(tau_theta, L_Omega_theta) * z_theta;
}
model {
  a ~ normal(0, 1);
  t ~ normal(0, 1);
  L_Omega_alpha ~ lkj_corr_cholesky(1);
  L_Omega_theta ~ lkj_corr_cholesky(1);
  z_alpha ~ std_normal();
  z_theta ~ std_normal();
  // The *_unif parameters get implicit uniform priors from their declared
  // bounds, so explicit uniform(0, pi()/2) statements are redundant.
  //
  // The length-N mean and SD vectors are local variables here rather than
  // transformed parameters, so they are not written to the output on every
  // draw -- a large saving when N is in the millions. Multi-indexing replaces
  // the per-observation loop.
  {
    // likelihood mean: RCT-specific control mean (alpha) plus ATE (theta)
    vector[N] m = alpha[g] + theta[g] .* w;
    // heteroskedastic residual SD: treated and control get different scales
    vector[N] sigma = sigma_c + (sigma_t - sigma_c) * w;
    y ~ normal(m, sigma);
  }
}
```
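One more direction I have been wondering about for the runtime, since the likelihood mean and SD are constant within each (study, arm) cell: the normal log-likelihood should only depend on per-cell sufficient statistics (count, mean, and within-cell sum of squares), so the ~10M rows could in principle be collapsed to 2J triples before ever reaching Stan. A rough NumPy sketch of the collapse on toy data (all names and the toy setup are mine):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the real data: study index g, treatment w, outcome y.
N, J = 100_000, 5
g = rng.integers(0, J, size=N)               # 0-based here; Stan uses 1-based
w = rng.integers(0, 2, size=N).astype(float)
y = rng.normal(size=N)

# One cell per (study, arm): cell id in 0 .. 2J-1.
cell = g * 2 + w.astype(int)
n = np.bincount(cell, minlength=2 * J)                   # counts
s = np.bincount(cell, weights=y, minlength=2 * J)        # sums
ss = np.bincount(cell, weights=y ** 2, minlength=2 * J)  # sums of squares
mean = s / n
sse = ss - n * mean ** 2   # within-cell sum of squared deviations

# For a cell with model mean mu and SD sigma, the normal log-likelihood
# reduces (up to a constant) to
#   -n * log(sigma) - (sse + n * (mean - mu)^2) / (2 * sigma^2),
# so the model would only need the 2J triples (n, mean, sse), not N rows.
def cell_loglik(mu, sigma, n_k, mean_k, sse_k):
    return -n_k * np.log(sigma) - (sse_k + n_k * (mean_k - mu) ** 2) / (2 * sigma ** 2)

# Check the collapsed version against the row-level log-likelihood for one cell.
mu, sigma = 0.3, 1.2
k = 3
rows = y[cell == k]
full = np.sum(-np.log(sigma) - (rows - mu) ** 2 / (2 * sigma ** 2))
collapsed = cell_loglik(mu, sigma, n[k], mean[k], sse[k])
```

If this identity is right, the collapsed model's gradient cost would scale with J instead of N, which seems like the biggest possible win here; I would appreciate a sanity check on whether it carries over to the full hierarchical setup.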

I don’t have a CS background, so I bet my code could be improved in many ways. My main sources for this model were the Stan manual and this awesome resource by @betanalpha (Hierarchical Modeling). Thanks in advance for the help, this community is awesome.