# Convergence issues for 3-level multivariate linear model

I am relatively new to Stan, which I am using to estimate a mixed effects multivariate model with 3 levels of nested effects. I have some questions about the estimation and model diagnosis, and would appreciate any insights from this community.

First, my model: I have 6 Y variables; my observations are nested within patients, which are nested within teams:

```stan
data {
int<lower=1> K;   // number of Ys  (6)
int<lower=1> J;   // number of Xs  (2 for testing)
int<lower=0> N;   // number of obs
int<lower=1> M;   // number of patients
int<lower=1> T;   // number of teams

vector[J] x[N];   // Xs
vector[K] y[N];   // Ys
int pp[N];        // list of patient IDs

int<lower=1> teamlu[M];   // team lookup

}
parameters {
matrix[K,J]  betaX;             // K ys, J xs, K x J = # betas
vector[K]  beta0;               // beta0 - intercept

// level one errors
cholesky_factor_corr[K] E_Omega;
vector<lower=0>[K] e_sigma;

// level 2 random effect
vector[K] u_tp[M];
vector<lower=0>[K] sigma_Utp;
corr_matrix[K] OmegaM;

// level 3 random effect
vector[K] u_t[T];
vector<lower=0>[K] sigma_Ut;
corr_matrix[K] OmegaT;
}
transformed parameters {
// varying intercepts
// level 2
vector[K] beta0_tp[M];
// level 3
vector[K] beta0_t[T];

vector[K] mu[N];
// level 3
for (t in 1:T) {
beta0_t[t] = beta0 + u_t[t];
}

// level 2
for (m in 1:M) {
beta0_tp[m] = beta0_t[teamlu[m]]+u_tp[m];
}

// mu for the model
for (n in 1:N) {
mu[n] = beta0_tp[pp[n]] + betaX*x[n];
}
}

model {
// local variables, declared before any statements
matrix[K,K] E_Sigma;
vector[K] nullK = rep_vector(0, K);

// patients
OmegaM ~ lkj_corr(K);
sigma_Utp ~ exponential(1);
u_tp ~ multi_normal(nullK, quad_form_diag(OmegaM, sigma_Utp));

// teams
OmegaT ~ lkj_corr(K);
sigma_Ut ~ exponential(1);
u_t ~ multi_normal(nullK, quad_form_diag(OmegaT, sigma_Ut));

// epsilons
E_Omega ~ lkj_corr_cholesky(K);
e_sigma ~ cauchy(0, 5);
E_Sigma = diag_pre_multiply(e_sigma, E_Omega);

to_vector(betaX) ~ normal(0, 5);  // weakly informative prior
beta0 ~ normal(0, 5);             // weakly informative prior

y ~ multi_normal_cholesky(mu, E_Sigma);
}

generated quantities {
matrix[K, K] Omega;
matrix[K, K] Sigma;
Omega = multiply_lower_tri_self_transpose(E_Omega);
Sigma = quad_form_diag(Omega, e_sigma);  // residual covariance
}
```
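One thing worth noting for context: with only 15 teams, the centered parameterization of the random effects above often produces divergences and poor mixing. A non-centered sketch of the team level (the patient level would be handled analogously; block fragments only, names matching the model above) would look something like:

```stan
parameters {
  // level 3 random effect, non-centered
  cholesky_factor_corr[K] L_OmegaT;  // Cholesky factor instead of corr_matrix
  vector<lower=0>[K] sigma_Ut;
  matrix[K, T] z_t;                  // standard-normal innovations
}
transformed parameters {
  // implies u_t ~ multi_normal(0, quad_form_diag(OmegaT, sigma_Ut))
  matrix[K, T] u_t = diag_pre_multiply(sigma_Ut, L_OmegaT) * z_t;
}
model {
  L_OmegaT ~ lkj_corr_cholesky(2);
  sigma_Ut ~ exponential(1);
  to_vector(z_t) ~ std_normal();
}
```

With this layout `u_t` becomes a matrix, so the loop over teams would use `col(u_t, t)` in place of `u_t[t]`.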

I am estimating this model using -rstan-; my test sample has ~13,000 observations, nested in ~3,000 patients, nested in 15 teams. I have 32 cores, so I run 32 chains of 600 iterations each, with 400 of those being warmup, leaving me with 200 x 32 = 6,400 iterations for inference.

Three questions:

1. Least important question: Is there an obvious way to speed things up? This takes 8 hours to run on a 3 GHz i9 with 32 cores and 128 GB of RAM, which would be fine except that I'm working with about 1/4 of the full dataset, and I have some additional features to add to the model. (As an aside: the processor has 24 physical cores, but R's parallel::detectCores() finds the 32 logical processors; is there any advantage to using 24 instead of 32?)

2. Moderately important question: I get a warning that there is 1 divergent transition. I understand that divergent transitions indicate model misspecification, but is 1 enough to be concerned? Web searches turn up different answers to this question.

3. Most important question: I get these three warnings:

- The largest R-hat is NA, indicating chains have not mixed.
- Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
- Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.

I understand these are largely the same issue, and all occur for E_Omega, the intermediate error matrix. Here are the first few elements of the 6x6 matrix:
```
             mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
E_Omega[1,1] 1.00     NaN 0.00 1.00 1.00 1.00 1.00  1.00   NaN  NaN
E_Omega[1,2] 0.00     NaN 0.00 0.00 0.00 0.00 0.00  0.00   NaN  NaN
E_Omega[1,3] 0.00     NaN 0.00 0.00 0.00 0.00 0.00  0.00   NaN  NaN
E_Omega[1,4] 0.00     NaN 0.00 0.00 0.00 0.00 0.00  0.00   NaN  NaN
E_Omega[1,5] 0.00     NaN 0.00 0.00 0.00 0.00 0.00  0.00   NaN  NaN
E_Omega[1,6] 0.00     NaN 0.00 0.00 0.00 0.00 0.00  0.00   NaN  NaN
E_Omega[2,1] 0.33    0.00 0.02 0.31 0.32 0.33 0.33  0.42    20 2.28
E_Omega[2,2] 0.94    0.00 0.01 0.91 0.94 0.94 0.95  0.95    19 2.60
E_Omega[2,3] 0.00     NaN 0.00 0.00 0.00 0.00 0.00  0.00   NaN  NaN
E_Omega[2,4] 0.00     NaN 0.00 0.00 0.00 0.00 0.00  0.00   NaN  NaN
E_Omega[2,5] 0.00     NaN 0.00 0.00 0.00 0.00 0.00  0.00   NaN  NaN
E_Omega[2,6] 0.00     NaN 0.00 0.00 0.00 0.00 0.00  0.00   NaN  NaN
```

Since my interest is in the betas, do I need to worry about this? If so, how can I best figure out what the problem is? I have used -pairs()- and some -bayesplot- utilities to look at what is going on, but this is a 6x6 matrix, and it is difficult to sort out what is driving what with so many pairwise comparisons. I did simplify the model to only 3 Y variables, thinking it would be easier to troubleshoot, but then I got no warnings of any sort (yay?). Do I just need to increase the iterations?

Thanks in advance for any insights.

Jeph

How about simplifying away the covariance matrix in the multivariate normal distribution?
Also, brms is capable of fitting three-level hierarchical models and multivariate models; surely using brms can avoid some potential coding errors.

Thanks for the suggestions. I started with -brms-, thinking it would at least provide a benchmark. I haven’t used it before, but I think this is the same model as mine:

```r
model <- bf(mvbind(y1, y2, y3, y4, y5, y6) ~ x1 + x2 + (1 | p | pat) + (1 | q | pat:team))
fit <- brm(model, data = df, chains = 24)
```

I used the -brms- defaults of 2000 iterations, with 1000 warmup. This produced 1095 divergent transitions and 3 additional warnings about R-hats being too large. Is the above specification incorrect?

Seems fine to me. Considering you have 24,000 post-warmup samples in total, 1095 divergent transitions is not a very shocking number. Maybe increasing the number of iterations and the warmup can resolve this warning.
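For reference, the usual -brms- knobs for this are more iterations plus a higher `adapt_delta`; a sketch (the specific values here are guesses to tune from, not recommendations):

```r
# hypothetical settings: raise adapt_delta toward 1 until divergences disappear
fit <- brm(model, data = df, chains = 4,
           iter = 4000, warmup = 2000,
           control = list(adapt_delta = 0.95, max_treedepth = 12))
```

Raising `adapt_delta` makes the sampler take smaller steps, which is slower but less prone to divergences.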