I am relatively new to Stan, which I am using to estimate a mixed effects multivariate model with 3 levels of nested effects. I have some questions about the estimation and model diagnosis, and would appreciate any insights from this community.
FIrst, my model; I have 6 Ys; my observations are nested within patients, which are nested within teams:
data {
int<lower=1> K; // number of Ys (6)
int<lower=1> J; // number of Xs (2 for testing)
int<lower=0> N; // number of obs
int<lower=1> M; // number of patients
int<lower=1> T; // number of teams
vector[J] x[N]; // Xs
vector[K] y[N]; // Ys
int pp[N]; // list of patient IDs
int<lower=1> teamlu[M]; // team lookup
}
parameters {
matrix[K,J] betaX; // K ys, J xs, K x J = # betas
vector[K] beta0; // beta0 - intercept
// level one errors
cholesky_factor_corr[K] E_Omega;
vector<lower=0>[K] e_sigma;
// level 2 random effect
vector[K] u_tp[M];
vector<lower=0>[K] sigma_Utp;
corr_matrix[K] OmegaM;
// level 3 random effect
vector[K] u_t[T];
vector<lower=0>[K] sigma_Ut;
corr_matrix[K] OmegaT;
}
transformed parameters {
// varying intercepts
// level 2
vector[K] beta0_tp[M];
// level 3
vector[K] beta0_t[T];
vector[K] mu[N];
// level 3
for (t in 1:T) {
beta0_t[t] = beta0 + u_t[t];
}
// level 2
for (m in 1:M) {
beta0_tp[m] = beta0_t[teamlu[m]]+u_tp[m];
}
// mu for the model
for (n in 1:N) {
mu[n] = beta0_tp[pp[n]] + betaX*x[n];
}
}
model {
// patients
OmegaM ~ lkj_corr(K);
sigma_Utp ~ exponential(1);
// Teams
OmegaT ~ lkj_corr(K);
sigma_Ut ~ exponential(1);
// epsilons
matrix[K,K] E_Sigma;
E_Sigma = diag_pre_multiply(e_sigma,E_Omega);
E_Omega ~ lkj_corr_cholesky(K);
e_sigma ~ cauchy(0,5);
vector[K] nullK=rep_vector(0,K);
to_vector(betaX) ~ normal(0,5); // weak informative prior
beta0 ~ normal(0,5); // weak informative prior
u_t ~ multi_normal(nullK,quad_form_diag(OmegaT,sigma_Ut));
u_tp ~ multi_normal(nullK,quad_form_diag(OmegaM,sigma_Utp));
y ~ multi_normal_cholesky( mu, E_Sigma);
}
generated quantities {
matrix[K, K] Omega;
matrix[K, K] Sigma;
Omega = multiply_lower_tri_self_transpose(E_Omega);
Sigma = quad_form_diag(Omega, e_sigma);
}
I am estimating this model using -rstan-; my test sample has ~13000 obs, nested in ~3000 patients, nested in 15 teams. There are 32 cores so I use 600 iterations per chain, with 400 of those being warmup, leaving me with 200x32 = 6400 iterations for inference.
Three questions:
-
Least important question: Is there an obvious way to speed things up? This takes 8 hours to run on a 3.Ghz i9 with 32 cores and 128gb ram; which would be fine except I’m working with about 1/4 of the full dataset, and I have some additional features to add to the model. (As an aside - the processor has 24 actual cores but R’s detect.cores() finds the 32 logical processors, any advantage to using 24 instead of 32?)
-
Moderately important question: I get a warning that there is 1 divergent transition. I understand that divergent transitions indicate model misspecification, but is 1 enough to be concerned? Web searches turn up different answers to this question.
-
Most important question: I get these three warnings:
- The largest R-hat is NA, indicating chains have not mixed.
- Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
- Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
I understand these are largely the same issue, and all occur for E_Omega, the intermediate error matrix - here are the first few elements of the 6x6 matrix:
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
E_Omega[1,1] 1.00 NaN 0.00 1.00 1.00 1.00 1.00 1.00 NaN NaN
E_Omega[1,2] 0.00 NaN 0.00 0.00 0.00 0.00 0.00 0.00 NaN NaN
E_Omega[1,3] 0.00 NaN 0.00 0.00 0.00 0.00 0.00 0.00 NaN NaN
E_Omega[1,4] 0.00 NaN 0.00 0.00 0.00 0.00 0.00 0.00 NaN NaN
E_Omega[1,5] 0.00 NaN 0.00 0.00 0.00 0.00 0.00 0.00 NaN NaN
E_Omega[1,6] 0.00 NaN 0.00 0.00 0.00 0.00 0.00 0.00 NaN NaN
E_Omega[2,1] 0.33 0.00 0.02 0.31 0.32 0.33 0.33 0.42 20 2.28
E_Omega[2,2] 0.94 0.00 0.01 0.91 0.94 0.94 0.95 0.95 19 2.60
E_Omega[2,3] 0.00 NaN 0.00 0.00 0.00 0.00 0.00 0.00 NaN NaN
E_Omega[2,4] 0.00 NaN 0.00 0.00 0.00 0.00 0.00 0.00 NaN NaN
E_Omega[2,5] 0.00 NaN 0.00 0.00 0.00 0.00 0.00 0.00 NaN NaN
E_Omega[2,6] 0.00 NaN 0.00 0.00 0.00 0.00 0.00 0.00 NaN NaN
Since my interest is in the betas, do I need to worry about this? If I do need to worry about this, how can I best figure out what the problem is? I have used -pairs()- and some -bayesplot- utilities to look at what is going on, but this is a 6x6 matrix, and it difficult to sort out what is driving what with so many pairwise comparisons. I did simplify the model to have only 3 Y variables, thinking it would be easier to troubleshoot, but then I got no warnings of any sort (yay?). Do I just need to increase the iterations?
Thanks in advance for any insights.
Jeph