Comparing brms models with unstr() covariance versus cosy() and ar() with loo

I am trying out residual covariance structures in brms. I have fit unstr(), cosy(), and ar(p = 1) models to a number of example datasets from the agridat package, and I consistently find that the unstr() models perform best in LOO cross-validation. This goes against my intuition, because those models seem badly overparameterized to me: their effective number of parameters is far higher than for the cosy() and ar() models.

I know that the regularizing effect of priors means overfitting is less of a problem in the Bayesian context, but I still find it hard to believe that the unstructured covariance models, which estimate a separate parameter for every off-diagonal element of the residual covariance matrix, perform that much better. Am I somehow not making a legitimate comparison with loo_compare(), or is my intuition really that far off? Thanks for any insight anyone can provide.

Example

Here is an example comparing models fit with brm() using loo_compare(). Note that it was necessary to include a slightly regularizing prior on the residual correlation matrix for the unstructured covariance model and to set init = 0. The result of loo_compare() below shows that the unstructured covariance model comes out on top despite having an effective number of parameters (p_loo) of almost 900. In addition, the posterior predictive check plot looks worse for the unstructured covariance model (see images at bottom).

Code

library(brms)
library(agridat)

data(broadbalk.wheat)
# Convert calendar year to years since the start of the series (1852)
broadbalk.wheat$year <- broadbalk.wheat$year - 1852

# Run chains in parallel and use the cmdstanr backend
options(mc.cores = 4, brms.backend = 'cmdstanr')

# Unstructured residual covariance: a separate correlation for each pair of years
broadbalk_unstructured <- brm(grain ~ year + unstr(time = year, gr = plot), 
                              prior = c(
                                prior(normal(0, 5), class = b),
                                prior(lkj_corr_cholesky(3), class = Lcortime)
                              ),
                              init = 0,
                              data = broadbalk.wheat, seed = 1041)
# Compound symmetry: a single shared correlation for all pairs of years
broadbalk_compoundsymm <- brm(grain ~ year + cosy(time = year, gr = plot), 
                              prior = c(
                                prior(normal(0, 5), class = b)
                              ),
                              data = broadbalk.wheat, seed = 1042)
# First-order autoregressive residual correlation
broadbalk_ar1 <- brm(grain ~ year + ar(time = year, gr = plot, p = 1), 
                     prior = c(
                       prior(normal(0, 5), class = b)
                     ),
                     data = broadbalk.wheat, seed = 1043)

# Compute and store the LOO criterion for each model
broadbalk_unstructured <- add_criterion(broadbalk_unstructured, 'loo')
broadbalk_compoundsymm <- add_criterion(broadbalk_compoundsymm, 'loo')
broadbalk_ar1 <- add_criterion(broadbalk_ar1, 'loo')

print(loo_compare(broadbalk_unstructured, broadbalk_compoundsymm, broadbalk_ar1), simplify = FALSE)
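
As an aside, get_prior() (standard brms) can be used to list the parameter classes that priors can be set on for the unstr() model; this is one way to find the Lcortime class used above, though the exact class names may differ between brms versions:

get_prior(grain ~ year + unstr(time = year, gr = plot),
          data = broadbalk.wheat)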

Result

                       elpd_diff se_diff elpd_loo se_elpd_loo p_loo   se_p_loo looic   se_looic
broadbalk_unstructured     0.0       0.0   408.7      9.0       889.3     8.0   -817.5    18.0 
broadbalk_compoundsymm -1506.2      24.7 -1097.5     26.3         2.2     0.1   2195.0    52.5 
broadbalk_ar1          -1655.4      24.8 -1246.7     25.2         3.9     0.2   2493.4    50.4 

Posterior predictive check plot: unstructured model

PP check plot: cosy model

PP check plot: AR1 model


Did you get Pareto-k warnings? Can you show the loo output separately for the unstructured model (including the Pareto-k diagnostic)?

Thanks for the response. Yes, I do get Pareto-k warnings for the unstructured model. The other two models report that all Pareto-k estimates are below 0.7.

Here is the full LOO output for the unstructured model.

> loo(broadbalk_unstructured)

Computed from 4000 by 1258 log-likelihood matrix.

         Estimate   SE
elpd_loo    408.7  9.0
p_loo       889.3  8.0
looic      -817.5 18.0
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 0.9]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     255   20.3%   105     
   (0.7, 1]   (bad)      924   73.4%   <NA>    
   (1, Inf)   (very bad)  79    6.3%   <NA>    
See help('pareto-k-diagnostic') for details.

So I guess this means the LOO estimate is unreliable for this model? That does not really surprise me, since I am only fitting this model, which I know is not really appropriate, for pedagogical reasons. But the LOO results are counterintuitive and hard to explain, at least to me, so I would be interested in a good explanation. Thanks!

By the way, I tried to set moment_match = TRUE as recommended, but after it recompiled the model with rstan I got an error:

broadbalk_unstructured <- add_criterion(broadbalk_unstructured, "loo",overwrite = TRUE, moment_match = TRUE)
Recompiling the model with 'rstan'
Recompilation done
Error : Exception: variable does not exist; processing stage=parameter initialization; variable name=Lcortime; base type=double (in 'string', line 385, column 2 to column 44)
Error: Moment matching failed. Perhaps you did not set 'save_pars = save_pars(all = TRUE)' when fitting your model? If you are running moment matching on another machine than the one used to fit the model, you may need to set recompile = TRUE.

I’ve had this issue before with brms models fit with cmdstanr. The suggestion to set recompile = TRUE does not seem helpful because it looks like it already tried to recompile.
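
For reference, the error message's own suggestion would mean refitting with save_pars = save_pars(all = TRUE) so that all parameter draws are stored for moment matching. A sketch of what that refit would look like (I have not verified that it avoids the recompilation error with the cmdstanr backend):

# Refit the unstructured model, this time saving all parameter draws
broadbalk_unstructured <- brm(grain ~ year + unstr(time = year, gr = plot),
                              prior = c(
                                prior(normal(0, 5), class = b),
                                prior(lkj_corr_cholesky(3), class = Lcortime)
                              ),
                              init = 0,
                              save_pars = save_pars(all = TRUE),
                              data = broadbalk.wheat, seed = 1041)
# Recompute LOO with moment matching enabled
broadbalk_unstructured <- add_criterion(broadbalk_unstructured, 'loo',
                                        moment_match = TRUE, overwrite = TRUE)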

You have to separate how the data are divided (LOO) from the computational method used to estimate it (in this case PSIS). See more in CV-FAQ: 3 What are the parts of cross-validation?. It is likely that LOO itself is a reasonable choice here, but in this case it is clear that the importance-sampling computation is failing, as indicated by the Pareto-k diagnostic.

With about 80% of the Pareto-k values bad or very bad, the correct conclusion is that the elpd_loo estimate is not useful and should not be used in model comparison.
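
For example, you can check which observations have problematic k values using helpers from the loo package (a rough sketch using the fit from above):

library(loo)

loo_unstr <- loo(broadbalk_unstructured)

# Indices of observations with Pareto k above the 0.7 threshold
bad_obs <- pareto_k_ids(loo_unstr, threshold = 0.7)

# The k values themselves, to see how far above the threshold they fall
summary(pareto_k_values(loo_unstr)[bad_obs])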

Your unstructured model has one parameter for each observation. When you remove the only observation providing likelihood information for that parameter, the leave-one-out posterior is so different from the full posterior that importance sampling fails, and in this case it is likely that moment matching fails too. This is similar to the varying-intercept model in Section 4 of the Roaches case study. It is likely that you need the integrated-LOO approach discussed in Section 5 of that case study. Integrated-LOO is not yet supported in brms, but will be at some point in the future; meanwhile, you can implement it in Stan code or in R. Alternatively, you can use K-fold-CV.
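
As a rough sketch of the K-fold route (assuming you want folds that leave out whole plots at a time), fold indices can be built with kfold_split_grouped() from the loo package and passed to brms::kfold() via its folds argument; kfold() also accepts folds = 'grouped' with a group variable directly:

library(loo)

# Assign each plot's observations to one of 5 folds
fold_ids <- kfold_split_grouped(K = 5, x = broadbalk.wheat$plot)

# K-fold CV for the unstructured model using those folds
broadbalk_unstr_kfold <- kfold(broadbalk_unstructured, folds = fold_ids)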

We also plan to add more warnings and pointers to the CV-FAQ in the loo_compare() output, so that, e.g., in your example case it would be clear that the comparison is not sensible.


This makes a lot of sense! Thanks for the helpful explanation.

As it turns out, the K-fold cross-validation still gives very similar results to the LOO comparison in this case. So maybe the unstructured covariance model, despite having so many parameters, really is flexible enough that it works here:

broadbalk_kfold <- kfold(broadbalk_unstructured, broadbalk_compoundsymm, broadbalk_ar1,
                         folds = 'grouped', group = 'plot', K = 5, compare = TRUE)

Result:

Output of model 'broadbalk_unstructured':

Based on 5-fold cross-validation.

           Estimate   SE
elpd_kfold   -669.6 16.6
p_kfold      1967.6 16.1
kfoldic      1339.2 33.3

Output of model 'broadbalk_compoundsymm':

Based on 5-fold cross-validation.

           Estimate   SE
elpd_kfold  -1105.6 26.5
p_kfold        10.3  1.6
kfoldic      2211.1 53.0

Output of model 'broadbalk_ar1':

Based on 5-fold cross-validation.

           Estimate   SE
elpd_kfold  -1280.1 26.4
p_kfold        37.3  3.5
kfoldic      2560.3 52.8

Model comparisons:
                       elpd_diff se_diff
broadbalk_unstructured    0.0       0.0 
broadbalk_compoundsymm -436.0      27.6 
broadbalk_ar1          -610.5      29.8