Warning messages with `lkj_corr_cholesky` priors

mark-andrews · March 24, 2020, 1:42pm

I have a simple problem whereby I want to put a prior on the covariance matrix of multivariate normal, for that I will use a LKJ prior on the corresponding correlation matrix, and pre and post multiply by a vector of standard deviations, as discussed here and elsewhere. Specifically I will use the lkj_corr_cholesky prior the Cholesky factor of the correlation matrix.

Whenever I run this, even in simple toy problems, I get the same warnings: The largest R-hat is NA ..., and Bulk Effective Samples Size (ESS) is too low ..., and Tail Effective Samples Size (ESS) is too low ....

Here is an example of a simple model (adopted from code in this blog).

data {
  int<lower=1> N; 
  int<lower=1> J; 
  vector[J] y[N]; 
  vector[J] Zero;
}

parameters {
  cholesky_factor_corr[J] L_omega;  
  vector<lower=0>[J] sigma; 
}

model {
  y ~ multi_normal_cholesky(Zero, diag_pre_multiply(sigma, L_omega));
  sigma ~ cauchy(0, 5);
  L_omega ~ lkj_corr_cholesky(1.0);
}

Assuming that the above code is stored in a string in R named stancode, I can then run it with the following R code.

library(tidyverse)
library(rstan)

set.seed(10101)

J <- 3
N <- 100

Sigma <- rbind(c(1.58, 0.30, -0.40),
               c(0.30,  0.67, -1.04),
               c(-0.40, -1.04,  1.86)
)

Zero <- rep(0, J)

y <- mvtnorm::rmvnorm(N, Zero, Sigma)

M <- stan(model_code = stancode, 
          data = list(J = J, N = N, y = y, Zero = Zero))

The warnings I get here, which are typical of whenever I use the lkj_corr_cholesky are as follows:

Warning messages:
1: The largest R-hat is NA, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat 
2: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
3: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess

Looking at the summary information for NAs, we can see the following, again which is typical.

summary(M)$summary %>%
  as_tibble(rownames = 'par') %>%
  filter_all(any_vars(is.na(.)))

# A tibble: 4 x 11
  par           mean se_mean    sd `2.5%` `25%` `50%` `75%` `97.5%` n_eff  Rhat
  <chr>        <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>
1 L_omega[1,1]     1     NaN     0      1     1     1     1       1   NaN   NaN
2 L_omega[1,2]     0     NaN     0      0     0     0     0       0   NaN   NaN
3 L_omega[1,3]     0     NaN     0      0     0     0     0       0   NaN   NaN
4 L_omega[2,3]     0     NaN     0      0     0     0     0       0   NaN   NaN

I presume that the warnings above can and should be ignored and are simply as a consequence of the fact that the Cholesky factor is a lower triangle matrix whose row vectors have unit length, and so the upper triangle is fixed at zero and the first element of the first row is fixed at 1.0. In other words, these elements have fixed values at 0 or 1 hence a sd of 0 etc.

However, the lkj_corr_cholesky prior is used routinely in Stan based multilevel models, such as those created by brms, and I never see these warnings.

More generally, assuming that these warnings are to be expected and are inconsequential, it feels like they are necessarily false alarms and so it seems like it would be better if they could be suppressed, though I don’t know of any way to do that using e.g. rstan::stan.

Max_Mantei · March 24, 2020, 3:12pm

Hey!

Yes!

Maybe just something like

suppressWarnings(
  M <- stan(model_code = stancode, 
            data = list(J = J, N = N, y = y, Zero = Zero))
  )

will do?

mark-andrews · March 24, 2020, 4:37pm

Thanks. I presume suppressWarnings would suppress all the other warnings too, i.e. those that I don’t want to ignore. On the other hand, as I said, brms::brm does not raise these warnings due to LKJ Cholesky priors, and does raise other warnings, so there must be some trick here. Just to clarify, if I do a brms::brm model that involves lkj_corr_cholesky as per normal, I get no warnings, but if extract the Stan code from brms and run that, then I get errors above.

For example

library(rstan)
library(brms)

standata <- make_standata(Reaction ~ Days + (Days|Subject), data = lme4::sleepstudy)
stancode <- make_stancode(Reaction ~ Days + (Days|Subject), data = lme4::sleepstudy)

# warnings like above occur here
M <- stan(model_code = stancode, data = standata)

# no warnings here, but same Stan model exactly
Mb <- brm(Reaction ~ Days + (Days|Subject), data = lme4::sleepstudy)

bgoodri · March 24, 2020, 6:08pm

I would also look at rstan::monitor(M) to make sure the bulk and tail ESS for all of the things that are actually random variables are not too low.

mark-andrews · March 24, 2020, 6:17pm

Thanks. ESS is fine.
Also, what’s interesting about rstan::monitor unlike rstan::summary is that I get no NaNs.

(From my toy model in original post).

Inference for the input samples (4 chains: each with iter = 2000; warmup = 0):

                 Q5    Q50    Q95   Mean  SD  Rhat Bulk_ESS Tail_ESS
L_omega[1,1]    1.0    1.0    1.0    1.0 0.0     1     4000     4000
L_omega[2,1]    0.1    0.2    0.4    0.2 0.1     1     2064     2150
L_omega[3,1]   -0.3   -0.2    0.0   -0.2 0.1     1     2011     2016
L_omega[1,2]    0.0    0.0    0.0    0.0 0.0     1     4000     4000
L_omega[2,2]    0.9    1.0    1.0    1.0 0.0     1     2051     2150
L_omega[3,2]   -0.9   -0.9   -0.9   -0.9 0.0     1     2163     2453
L_omega[1,3]    0.0    0.0    0.0    0.0 0.0     1     4000     4000
L_omega[2,3]    0.0    0.0    0.0    0.0 0.0     1     4000     4000
L_omega[3,3]    0.3    0.4    0.4    0.4 0.0     1     1986     2283
sigma[1]        1.2    1.4    1.5    1.4 0.1     1     3069     2811
sigma[2]        0.8    0.9    1.0    0.9 0.1     1     1751     1763
sigma[3]        1.3    1.5    1.7    1.5 0.1     1     1801     1916
lp__         -109.4 -105.8 -103.9 -106.1 1.7     1     1544     2439

Max_Mantei · March 24, 2020, 6:41pm

The values are just fixed to match the number of post warmup iterations. But clearly, these values are fixed, so any inference on them is kinda pointless.

Topic		Replies	Views
Lkj_corr_lpdf: Correlation matrix is not positive definite General	9	971	July 12, 2023
matrix[K,K] vs cholesky_factor_cov[K] Modeling matrix	2	1141	May 26, 2021
Modeling Hierarchical Regression with LKJ Prior Modeling	5	1193	August 23, 2018
Lkj(1) vs no prior on correlation matrix? Modeling	4	986	January 22, 2019
Covariance prior via lkj_corr_cholesky Modeling fitting-issues , covariance	4	520	December 21, 2023

Warning messages with `lkj_corr_cholesky` priors

Related topics