Warning messages with `lkj_corr_cholesky` priors

I have a simple problem whereby I want to put a prior on the covariance matrix of multivariate normal, for that I will use a LKJ prior on the corresponding correlation matrix, and pre and post multiply by a vector of standard deviations, as discussed here and elsewhere. Specifically I will use the lkj_corr_cholesky prior the Cholesky factor of the correlation matrix.

Whenever I run this, even in simple toy problems, I get the same warnings: The largest R-hat is NA ..., and Bulk Effective Samples Size (ESS) is too low ..., and Tail Effective Samples Size (ESS) is too low ....

Here is an example of a simple model (adopted from code in this blog).

data {
  int<lower=1> N; 
  int<lower=1> J; 
  vector[J] y[N]; 
  vector[J] Zero;
}

parameters {
  cholesky_factor_corr[J] L_omega;  
  vector<lower=0>[J] sigma; 
}

model {
  y ~ multi_normal_cholesky(Zero, diag_pre_multiply(sigma, L_omega));
  sigma ~ cauchy(0, 5);
  L_omega ~ lkj_corr_cholesky(1.0);
}

Assuming that the above code is stored in a string in R named stancode, I can then run it with the following R code.

library(tidyverse)
library(rstan)

set.seed(10101)

J <- 3
N <- 100

Sigma <- rbind(c(1.58, 0.30, -0.40),
               c(0.30,  0.67, -1.04),
               c(-0.40, -1.04,  1.86)
)

Zero <- rep(0, J)

y <- mvtnorm::rmvnorm(N, Zero, Sigma)

M <- stan(model_code = stancode, 
          data = list(J = J, N = N, y = y, Zero = Zero))

The warnings I get here, which are typical of whenever I use the lkj_corr_cholesky are as follows:

Warning messages:
1: The largest R-hat is NA, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat 
2: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
3: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess 

Looking at the summary information for NAs, we can see the following, again which is typical.

summary(M)$summary %>%
  as_tibble(rownames = 'par') %>%
  filter_all(any_vars(is.na(.)))
# A tibble: 4 x 11
  par           mean se_mean    sd `2.5%` `25%` `50%` `75%` `97.5%` n_eff  Rhat
  <chr>        <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>
1 L_omega[1,1]     1     NaN     0      1     1     1     1       1   NaN   NaN
2 L_omega[1,2]     0     NaN     0      0     0     0     0       0   NaN   NaN
3 L_omega[1,3]     0     NaN     0      0     0     0     0       0   NaN   NaN
4 L_omega[2,3]     0     NaN     0      0     0     0     0       0   NaN   NaN

I presume that the warnings above can and should be ignored and are simply as a consequence of the fact that the Cholesky factor is a lower triangle matrix whose row vectors have unit length, and so the upper triangle is fixed at zero and the first element of the first row is fixed at 1.0. In other words, these elements have fixed values at 0 or 1 hence a sd of 0 etc.

However, the lkj_corr_cholesky prior is used routinely in Stan based multilevel models, such as those created by brms, and I never see these warnings.

More generally, assuming that these warnings are to be expected and are inconsequential, it feels like they are necessarily false alarms and so it seems like it would be better if they could be suppressed, though I don’t know of any way to do that using e.g. rstan::stan.

Hey!

Yes!

Maybe just something like

suppressWarnings(
  M <- stan(model_code = stancode, 
            data = list(J = J, N = N, y = y, Zero = Zero))
  )

will do?

1 Like

Thanks. I presume suppressWarnings would suppress all the other warnings too, i.e. those that I don’t want to ignore. On the other hand, as I said, brms::brm does not raise these warnings due to LKJ Cholesky priors, and does raise other warnings, so there must be some trick here. Just to clarify, if I do a brms::brm model that involves lkj_corr_cholesky as per normal, I get no warnings, but if extract the Stan code from brms and run that, then I get errors above.

For example

library(rstan)
library(brms)

standata <- make_standata(Reaction ~ Days + (Days|Subject), data = lme4::sleepstudy)
stancode <- make_stancode(Reaction ~ Days + (Days|Subject), data = lme4::sleepstudy)

# warnings like above occur here
M <- stan(model_code = stancode, data = standata)

# no warnings here, but same Stan model exactly
Mb <- brm(Reaction ~ Days + (Days|Subject), data = lme4::sleepstudy)

I would also look at rstan::monitor(M) to make sure the bulk and tail ESS for all of the things that are actually random variables are not too low.

1 Like

Thanks. ESS is fine.
Also, what’s interesting about rstan::monitor unlike rstan::summary is that I get no NaNs.

(From my toy model in original post).

Inference for the input samples (4 chains: each with iter = 2000; warmup = 0):

                 Q5    Q50    Q95   Mean  SD  Rhat Bulk_ESS Tail_ESS
L_omega[1,1]    1.0    1.0    1.0    1.0 0.0     1     4000     4000
L_omega[2,1]    0.1    0.2    0.4    0.2 0.1     1     2064     2150
L_omega[3,1]   -0.3   -0.2    0.0   -0.2 0.1     1     2011     2016
L_omega[1,2]    0.0    0.0    0.0    0.0 0.0     1     4000     4000
L_omega[2,2]    0.9    1.0    1.0    1.0 0.0     1     2051     2150
L_omega[3,2]   -0.9   -0.9   -0.9   -0.9 0.0     1     2163     2453
L_omega[1,3]    0.0    0.0    0.0    0.0 0.0     1     4000     4000
L_omega[2,3]    0.0    0.0    0.0    0.0 0.0     1     4000     4000
L_omega[3,3]    0.3    0.4    0.4    0.4 0.0     1     1986     2283
sigma[1]        1.2    1.4    1.5    1.4 0.1     1     3069     2811
sigma[2]        0.8    0.9    1.0    0.9 0.1     1     1751     1763
sigma[3]        1.3    1.5    1.7    1.5 0.1     1     1801     1916
lp__         -109.4 -105.8 -103.9 -106.1 1.7     1     1544     2439

The values are just fixed to match the number of post warmup iterations. But clearly, these values are fixed, so any inference on them is kinda pointless.

1 Like