I am relatively new to Stan, which I am using to estimate a mixed effects multivariate model with 3 levels of nested effects. I have some questions about the estimation and model diagnosis, and would appreciate any insights from this community.
FIrst, my model; I have 6 Ys; my observations are nested within patients, which are nested within teams:
data {
int<lower=1> K; // number of Ys (6)
int<lower=1> J; // number of Xs (2 for testing)
int<lower=0> N; // number of obs
int<lower=1> M; // number of patients
int<lower=1> T; // number of teams
vector[J] x[N]; // Xs
vector[K] y[N]; // Ys
int pp[N]; // list of patient IDs
int<lower=1> teamlu[M]; // team lookup
}
parameters {
matrix[K,J] betaX; // K ys, J xs, K x J = # betas
vector[K] beta0; // beta0  intercept
// level one errors
cholesky_factor_corr[K] E_Omega;
vector<lower=0>[K] e_sigma;
// level 2 random effect
vector[K] u_tp[M];
vector<lower=0>[K] sigma_Utp;
corr_matrix[K] OmegaM;
// level 3 random effect
vector[K] u_t[T];
vector<lower=0>[K] sigma_Ut;
corr_matrix[K] OmegaT;
}
transformed parameters {
// varying intercepts
// level 2
vector[K] beta0_tp[M];
// level 3
vector[K] beta0_t[T];
vector[K] mu[N];
// level 3
for (t in 1:T) {
beta0_t[t] = beta0 + u_t[t];
}
// level 2
for (m in 1:M) {
beta0_tp[m] = beta0_t[teamlu[m]]+u_tp[m];
}
// mu for the model
for (n in 1:N) {
mu[n] = beta0_tp[pp[n]] + betaX*x[n];
}
}
model {
// patients
OmegaM ~ lkj_corr(K);
sigma_Utp ~ exponential(1);
// Teams
OmegaT ~ lkj_corr(K);
sigma_Ut ~ exponential(1);
// epsilons
matrix[K,K] E_Sigma;
E_Sigma = diag_pre_multiply(e_sigma,E_Omega);
E_Omega ~ lkj_corr_cholesky(K);
e_sigma ~ cauchy(0,5);
vector[K] nullK=rep_vector(0,K);
to_vector(betaX) ~ normal(0,5); // weak informative prior
beta0 ~ normal(0,5); // weak informative prior
u_t ~ multi_normal(nullK,quad_form_diag(OmegaT,sigma_Ut));
u_tp ~ multi_normal(nullK,quad_form_diag(OmegaM,sigma_Utp));
y ~ multi_normal_cholesky( mu, E_Sigma);
}
generated quantities {
matrix[K, K] Omega;
matrix[K, K] Sigma;
Omega = multiply_lower_tri_self_transpose(E_Omega);
Sigma = quad_form_diag(Omega, e_sigma);
}
I am estimating this model using rstan; my test sample has ~13000 obs, nested in ~3000 patients, nested in 15 teams. There are 32 cores so I use 600 iterations per chain, with 400 of those being warmup, leaving me with 200x32 = 6400 iterations for inference.
Three questions:

Least important question: Is there an obvious way to speed things up? This takes 8 hours to run on a 3.Ghz i9 with 32 cores and 128gb ram; which would be fine except I’m working with about 1/4 of the full dataset, and I have some additional features to add to the model. (As an aside  the processor has 24 actual cores but R’s detect.cores() finds the 32 logical processors, any advantage to using 24 instead of 32?)

Moderately important question: I get a warning that there is 1 divergent transition. I understand that divergent transitions indicate model misspecification, but is 1 enough to be concerned? Web searches turn up different answers to this question.

Most important question: I get these three warnings:
 The largest Rhat is NA, indicating chains have not mixed.
 Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
 Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
I understand these are largely the same issue, and all occur for E_Omega, the intermediate error matrix  here are the first few elements of the 6x6 matrix:
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
E_Omega[1,1] 1.00 NaN 0.00 1.00 1.00 1.00 1.00 1.00 NaN NaN
E_Omega[1,2] 0.00 NaN 0.00 0.00 0.00 0.00 0.00 0.00 NaN NaN
E_Omega[1,3] 0.00 NaN 0.00 0.00 0.00 0.00 0.00 0.00 NaN NaN
E_Omega[1,4] 0.00 NaN 0.00 0.00 0.00 0.00 0.00 0.00 NaN NaN
E_Omega[1,5] 0.00 NaN 0.00 0.00 0.00 0.00 0.00 0.00 NaN NaN
E_Omega[1,6] 0.00 NaN 0.00 0.00 0.00 0.00 0.00 0.00 NaN NaN
E_Omega[2,1] 0.33 0.00 0.02 0.31 0.32 0.33 0.33 0.42 20 2.28
E_Omega[2,2] 0.94 0.00 0.01 0.91 0.94 0.94 0.95 0.95 19 2.60
E_Omega[2,3] 0.00 NaN 0.00 0.00 0.00 0.00 0.00 0.00 NaN NaN
E_Omega[2,4] 0.00 NaN 0.00 0.00 0.00 0.00 0.00 0.00 NaN NaN
E_Omega[2,5] 0.00 NaN 0.00 0.00 0.00 0.00 0.00 0.00 NaN NaN
E_Omega[2,6] 0.00 NaN 0.00 0.00 0.00 0.00 0.00 0.00 NaN NaN
Since my interest is in the betas, do I need to worry about this? If I do need to worry about this, how can I best figure out what the problem is? I have used pairs() and some bayesplot utilities to look at what is going on, but this is a 6x6 matrix, and it difficult to sort out what is driving what with so many pairwise comparisons. I did simplify the model to have only 3 Y variables, thinking it would be easier to troubleshoot, but then I got no warnings of any sort (yay?). Do I just need to increase the iterations?
Thanks in advance for any insights.
Jeph