Convergence issues for 3-level multivariate linear model

I am relatively new to Stan, which I am using to estimate a mixed-effects multivariate model with 3 levels of nested effects. I have some questions about the estimation and the model diagnostics, and would appreciate any insights from this community.

First, my model: I have 6 Ys; my observations are nested within patients, which are nested within teams:

data {
  int<lower=1> K;   // number of Ys (6)
  int<lower=1> J;   // number of Xs (2 for testing)
  int<lower=0> N;   // number of obs
  int<lower=1> M;   // number of patients
  int<lower=1> T;   // number of teams

  vector[J] x[N];   // Xs
  vector[K] y[N];   // Ys
  int<lower=1, upper=M> pp[N];       // patient ID for each observation

  int<lower=1, upper=T> teamlu[M];   // team lookup: team ID for each patient
}
parameters {
  matrix[K,J]  betaX;             // K ys, J xs, K x J = # betas
  vector[K]  beta0;               // beta0 - intercept 
  
  // level one errors
  cholesky_factor_corr[K] E_Omega;            
  vector<lower=0>[K] e_sigma;    
  
  // level 2 random effect
  vector[K] u_tp[M];
  vector<lower=0>[K] sigma_Utp;
  corr_matrix[K] OmegaM;           
  
  // level 3 random effect
  vector[K] u_t[T];                
  vector<lower=0>[K] sigma_Ut;     
  corr_matrix[K] OmegaT;           
}
transformed parameters {
  // varying intercepts
  // level 2
  vector[K] beta0_tp[M];
  // level 3
  vector[K] beta0_t[T];
  
  vector[K] mu[N];
  // level 3
  for (t in 1:T) {
    beta0_t[t] = beta0 + u_t[t];
  }
  
  // level 2
  for (m in 1:M) {
    beta0_tp[m] = beta0_t[teamlu[m]]+u_tp[m];
  }
 
  // mu for the model
  for (n in 1:N) {
    mu[n] = beta0_tp[pp[n]] + betaX*x[n];
  }
}

model {
 // patients
 OmegaM ~ lkj_corr(K);
 sigma_Utp ~ exponential(1);

 // Teams
 OmegaT ~ lkj_corr(K);
 sigma_Ut ~ exponential(1);

 // epsilons
 matrix[K,K] E_Sigma;
 E_Sigma = diag_pre_multiply(e_sigma,E_Omega);
 E_Omega ~ lkj_corr_cholesky(K);            
 e_sigma ~ cauchy(0,5);
 
 vector[K] nullK = rep_vector(0, K);

 to_vector(betaX) ~ normal(0, 5);  // weakly informative prior
 beta0 ~ normal(0, 5);             // weakly informative prior

 u_t ~ multi_normal(nullK, quad_form_diag(OmegaT, sigma_Ut));
 u_tp ~ multi_normal(nullK, quad_form_diag(OmegaM, sigma_Utp));

 y ~ multi_normal_cholesky(mu, E_Sigma);
}

generated quantities {
  matrix[K, K] Omega;
  matrix[K, K] Sigma;
  Omega = multiply_lower_tri_self_transpose(E_Omega);
  Sigma = quad_form_diag(Omega, e_sigma); 
}

I am estimating this model using -rstan-; my test sample has ~13,000 obs, nested in ~3,000 patients, nested in 15 teams. The machine has 32 cores, so I run 32 chains of 600 iterations each, 400 of those being warmup, leaving me with 200 x 32 = 6,400 post-warmup iterations for inference.
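
For reference, here is a minimal sketch of how I am launching the run (the file name and the data-list name are placeholders, not my actual code):

library(rstan)
options(mc.cores = parallel::detectCores())  # run chains in parallel
rstan_options(auto_write = TRUE)             # cache the compiled model

fit <- stan(
  file   = "threelevel.stan",   # the model above, saved to a file
  data   = standat,             # list with K, J, N, M, T, x, y, pp, teamlu
  chains = 32,
  iter   = 600,
  warmup = 400
)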

Three questions:

  1. Least important question: Is there an obvious way to speed things up? This takes 8 hours to run on a 3 GHz i9 with 32 cores and 128 GB of RAM, which would be fine except that I'm working with about 1/4 of the full dataset and I have some additional features to add to the model. (As an aside: the processor has 24 physical cores, but R's parallel::detectCores() finds the 32 logical processors; is there any advantage to using 24 instead of 32?)

  2. Moderately important question: I get a warning that there is 1 divergent transition. I understand that divergent transitions can indicate model misspecification, but is a single one enough to be concerned about? Web searches turn up different answers to this question.

  3. Most important question: I get these three warnings:

  • The largest R-hat is NA, indicating chains have not mixed.
  • Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
  • Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
    I understand these are largely the same issue, and they all occur for E_Omega, the intermediate error matrix; here are the first few elements of the 6x6 matrix:
             mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
E_Omega[1,1] 1.00     NaN 0.00 1.00 1.00 1.00 1.00  1.00   NaN  NaN
E_Omega[1,2] 0.00     NaN 0.00 0.00 0.00 0.00 0.00  0.00   NaN  NaN
E_Omega[1,3] 0.00     NaN 0.00 0.00 0.00 0.00 0.00  0.00   NaN  NaN
E_Omega[1,4] 0.00     NaN 0.00 0.00 0.00 0.00 0.00  0.00   NaN  NaN
E_Omega[1,5] 0.00     NaN 0.00 0.00 0.00 0.00 0.00  0.00   NaN  NaN
E_Omega[1,6] 0.00     NaN 0.00 0.00 0.00 0.00 0.00  0.00   NaN  NaN
E_Omega[2,1] 0.33    0.00 0.02 0.31 0.32 0.33 0.33  0.42    20 2.28
E_Omega[2,2] 0.94    0.00 0.01 0.91 0.94 0.94 0.95  0.95    19 2.60
E_Omega[2,3] 0.00     NaN 0.00 0.00 0.00 0.00 0.00  0.00   NaN  NaN
E_Omega[2,4] 0.00     NaN 0.00 0.00 0.00 0.00 0.00  0.00   NaN  NaN
E_Omega[2,5] 0.00     NaN 0.00 0.00 0.00 0.00 0.00  0.00   NaN  NaN
E_Omega[2,6] 0.00     NaN 0.00 0.00 0.00 0.00 0.00  0.00   NaN  NaN

Since my interest is in the betas, do I need to worry about this? If I do need to worry about it, how can I best figure out what the problem is? I have used -pairs()- and some -bayesplot- utilities to look at what is going on, but this is a 6x6 matrix, and it is difficult to sort out what is driving what with so many pairwise comparisons. I did simplify the model to have only 3 Y variables, thinking it would be easier to troubleshoot, but then I got no warnings of any sort (yay?). Do I just need to increase the iterations?
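
For what it's worth, the kinds of calls I have been using look roughly like this, pulling out only a few parameters at a time (the particular elements shown are just examples):

pairs(fit, pars = c("E_Omega[2,1]", "E_Omega[3,1]", "E_Omega[3,2]", "lp__"))

library(bayesplot)
draws <- as.array(fit)                            # iterations x chains x parameters
mcmc_trace(draws, regex_pars = "^E_Omega\\[2,")   # trace plots for one row of E_Omega
mcmc_pairs(draws, pars = c("E_Omega[2,1]", "e_sigma[1]", "e_sigma[2]"))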

Thanks in advance for any insights.

Jeph

How about canceling the covariance matrix in the multivariate normal distribution?
Also, brms is capable of fitting three-level hierarchical models and multivariate models, and surely using brms can help you avoid some potential coding errors.

Thanks for the suggestions. I started with -brms-, thinking it would at least provide a benchmark. I haven’t used it before, but I think this is the same model as mine:

library(brms)
model <- bf(mvbind(y1, y2, y3, y4, y5, y6) ~ x1 + x2 + (1|p|pat) + (1|q|pat:team))
fit <- brm(model, data = df, chains = 24)

I used the -brms- defaults of 2000 iterations, with 1000 warmup. This produced 1095 divergent transitions, and 3 additional warnings about Rhats being too large. Is the above specification incorrect?

Seems fine to me. Considering you have 24,000 post-warmup samples in total, 1,095 divergent transitions is not a very shocking number. Maybe increasing the number of iterations and the length of the warmup can resolve this warning.
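
For example, something along these lines (the numbers are only illustrative):

fit <- brm(model, data = df, chains = 24,
           iter = 4000,     # up from the default 2000
           warmup = 2000)   # up from the default 1000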