Pareto k values are too high when running model comparison

Hi there,

I am trying to run a model comparison using the brms function loo_model_weights(). My models are GAMs, and I have tried both stacking and pseudo-BMA, getting the same errors in the output for both.

Warning messages:
1: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
2: In log(z) : NaNs produced
3: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details
4: In log(z) : NaNs produced
5: In log(z) : NaNs produced

However, this output does not tell me which observations the problem comes from or what the actual Pareto k values are. Does anybody know how to deal with this?
Thanks in advance,


Can you run just, e.g.

(loo1 <- loo(fit1))

and post the full printed output?
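
If you also want to see which observations are problematic and what the actual Pareto k values are, the loo package has diagnostic helpers for that. A sketch, assuming loo1 is the object returned by the call above:

library(loo)

k_vals  <- pareto_k_values(loo1)                # one Pareto k value per observation
bad_obs <- pareto_k_ids(loo1, threshold = 0.7)  # indices of observations with k > 0.7
k_vals[bad_obs]                                 # the problematic k values themselves
plot(loo1)                                      # plot k against observation index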

Hi avehtari, thanks for your reply. This is what I get:

Warning message:
Found 18 observations with a pareto_k > 0.7 in model 'fit'. With this many problematic observations, it may be more appropriate to use 'kfold' with argument 'K = 10' to perform 10-fold cross-validation rather than LOO.

Warning message:
system call failed: Cannot allocate memory

However, I previously tried to run a k-fold and I got the following error:

Fitting model 1 out of 10
Start sampling
Error: passing unknown arguments: groups.
Execution halted

I don’t understand why this passing unknown arguments: groups error is thrown when performing k-fold but not when I try stacking or any other method.

You are running out of memory. Try with cores=1

Was that all that was printed?

You need to show the code you are using, so that we have a chance to understand what is going on.

Yes, that’s all that was printed.
Now I ran
loo1 <- brms::loo(fit,cores=1)
and I got the same warnings and errors as above:
Warning message:
Found 18 observations with a pareto_k > 0.7 in model 'fit'. With this many problematic observations, it may be more appropriate to use 'kfold' with argument 'K = 10' to perform 10-fold cross-validation rather than LOO.

Warning message:
system call failed: Cannot allocate memory

For the K-fold, I ran this:
kfold_full_model <- brms::kfold(fit,groups = 'Stimulus', k=10)

Let me know if you need anything else and thanks for the help!

Oh, that’s strange. I think there should be an error message instead of a warning if the usual output is not shown.

Please run
(loo1 <- brms::loo(fit,cores=1))
so that you also see the usual output. You should see something like

Computed from 4000 by 262 log-likelihood matrix

         Estimate     SE
elpd_loo  -6238.4  725.8
p_loo       277.4   67.1
looic     12476.8 1451.6
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     241   92.0%   149       
 (0.5, 0.7]   (ok)         9    3.4%   48        
   (0.7, 1]   (bad)        4    1.5%   12        
   (1, Inf)   (very bad)   8    3.1%   1         
See help('pareto-k-diagnostic') for details.

If you don’t get this output, then report the number of post-warmup iterations, the number of chains, and the number of observations.

Please report your OS version, sessionInfo(), and how much memory you have.

Based on ?brms::kfold, there is no argument groups, but there is an argument group.
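
For reference, the corrected call would look something like this (a sketch based on your original call; I’m assuming 'Stimulus' is the grouping variable you want, and note that the number-of-folds argument in brms::kfold is spelled K, uppercase):

kfold_full_model <- brms::kfold(fit, group = 'Stimulus', K = 10)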

This is the output after running (loo1 <- brms::loo(fit,cores=1)):

Computed from 16000 by 10128 log-likelihood matrix

         Estimate    SE
elpd_loo -17748.9 276.4
p_loo      1777.4  40.7
looic     35497.7 552.8
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     10021 98.9%   545
 (0.5, 0.7]   (ok)          89  0.9%   191
   (0.7, 1]   (bad)         17  0.2%   20
   (1, Inf)   (very bad)     1  0.0%   7
See help('pareto-k-diagnostic') for details.
Warning message:
Found 18 observations with a pareto_k > 0.7 in model 'fit'. With this many problematic observations, it may be more appropriate to use 'kfold' with argument 'K = 10' to perform 10-fold cross-validation rather than LOO.
Error: cannot allocate vector of size 1.2 Gb
Execution halted
Warning message:
system call failed: Cannot allocate memory


The OS version (I’m running R on a server):
~$ hostnamectl
Operating System: Ubuntu 18.04.1 LTS
Kernel: Linux 4.15.0-34-generic
Architecture: x86-64


Here’s the output of sessionInfo():
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.5.1


And the memory on my server:
~$ free -m
              total        used        free      shared  buff/cache   available
Mem:            30G        191M         30G        688K        122M         30G
Swap:            0B          0B          0B

Great, there is other output and not just the warning.

Did you use single backticks around that output in your post? Three backticks should keep the columns aligned for easier reading.

Computed from 16000 by 10128 log-likelihood matrix

That’s more draws and observations than usual, but it should not be a problem with 30GB of memory. We can also now see Error: cannot allocate vector of size 1.2 Gb, which is an amount that should clearly be fine with 30GB of memory. I checked that ?brms::loo does not mention a cores argument, so it is possible that it’s ignored. You could try running

options(mc.cores = 1)

before running loo. If you still get a memory error, then we can ask @paul.buerkner.

I expected to see the brms and loo versions in sessionInfo as well. Can you report those versions, too?

I would think that it is quite likely a problem even with 30 GB, depending on what kind of operations are required and what else you have in memory. Also, the error Error: cannot allocate vector of size 1.2 Gb only tells you how many bytes beyond the available memory could not be allocated. That is, on top of the memory already in use, another 1.2 Gb could not be allocated.

You could try setting pointwise = TRUE in loo to reduce memory requirements.
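
For example (a sketch, using the same fit object and a single core as suggested above):

options(mc.cores = 1)
(loo1 <- brms::loo(fit, pointwise = TRUE))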


I have been running this for 3 days now and it’s still running without errors:

options(mc.cores = 1)
(loo1 <- brms::loo(fit,cores=1,pointwise=TRUE))

If the cores=1 argument inside loo is ignored, then I’m guessing it’s still harmless, but is it normal and expected that it’s taking so long?

Considering you have 10128 observations and GAMs, it can take some time, but without knowing more about your model, I can’t say whether this is a long time or not. The more you tell about your model and data, the easier it is to give concrete suggestions.

Generic suggestions