I am comparing two models with slightly different parameterizations using loo and am wondering how to interpret the results when the model with higher ELPD also has more elevated Pareto k values.
Output of model 'Model_1':
Computed from 4000 by 6299 log-likelihood matrix
Estimate SE
elpd_loo -14585.8 84.9
p_loo 2215.2 29.9
looic 29171.6 169.8
------
Monte Carlo SE of elpd_loo is NA.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 6231 98.9% 41
(0.5, 0.7] (ok) 63 1.0% 27
(0.7, 1] (bad) 5 0.1% 30
(1, Inf) (very bad) 0 0.0% <NA>
See help('pareto-k-diagnostic') for details.
Output of model 'Model_2':
Computed from 4000 by 6299 log-likelihood matrix
Estimate SE
elpd_loo -14131.4 87.7
p_loo 2740.7 32.6
looic 28262.8 175.5
------
Monte Carlo SE of elpd_loo is NA.
Pareto k diagnostic values:
Count Pct. Min. n_eff
(-Inf, 0.5] (good) 6113 97.0% 69
(0.5, 0.7] (ok) 171 2.7% 33
(0.7, 1] (bad) 14 0.2% 25
(1, Inf) (very bad) 1 0.0% 23
See help('pareto-k-diagnostic') for details.
Model comparisons:
elpd_diff se_diff
Model_2 0.0 0.0
Model_1 -454.4 39.7
Warning messages:
1: Found 5 observations with a pareto_k > 0.7 in model 'Model_1'. It is recommended to set 'moment_match = TRUE' in order to perform moment matching for problematic observations.
2: Found 15 observations with a pareto_k > 0.7 in model 'Model_2'. It is recommended to set 'moment_match = TRUE' in order to perform moment matching for problematic observations.
Does this suggest both models are mis-specified and neither should be used? Or is Model_2 still a better fit despite the high k’s?
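The warnings above point to the moment_match argument; a minimal sketch of acting on them, assuming the models were fit with brms (fit1 and fit2 are hypothetical names for the two fits, and moment matching in brms generally requires the models to have been fitted with save_pars = save_pars(all = TRUE)):

```r
library(brms)

# Re-run loo with moment matching so the importance-sampling approximation
# is corrected for the observations flagged with pareto_k > 0.7.
loo1 <- loo(fit1, moment_match = TRUE)  # fit1, fit2 are hypothetical fit objects
loo2 <- loo(fit2, moment_match = TRUE)

# Repeat the comparison with the corrected loo objects.
loo_compare(loo1, loo2)
```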
@avehtari could provide a better answer than I, but in general high Pareto k values in either model mean that you shouldn’t trust LOO’s model comparison. But it is possible for either or both models to be properly specified despite high Pareto k values. You might find this post helpful:
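If it helps, the loo package also has helpers for locating the problematic observations; a small sketch, reusing the hypothetical loo1 object from above:

```r
library(loo)

# Indices of observations with Pareto k above 0.7 in Model_1.
bad_idx <- pareto_k_ids(loo1, threshold = 0.7)
bad_idx

# Their k values, e.g. to inspect which data points are hard to predict.
pareto_k_values(loo1)[bad_idx]
```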
I don’t know tidybayes well enough to answer authoritatively. If you are saving any transformed parameters or generated quantities in your model, make sure not to count them.
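As a rough sketch of what I mean, assuming a brms or rstan fit and using tidybayes::get_variables (the exclusion pattern below is only illustrative; adapt it to whatever transformed parameters or generated quantities you actually save):

```r
library(tidybayes)

vars <- get_variables(fit1)  # all saved variable names (fit1 is hypothetical)

# Drop Stan internals and anything that is a generated quantity or
# transformed parameter rather than a sampled parameter.
params <- vars[!grepl("^(lp__|log_lik|y_rep)", vars)]
length(params)
```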
Without yet knowing the number of parameters, just from the fact that p_loo is 35-43% of the number of observations we can infer that the models are likely very flexible. If p_loo is higher than the number of parameters, then the model is certainly misspecified; but if p_loo is lower than the number of parameters, I would expect that both models are simply flexible, which could mean, e.g., a hierarchical model with group-specific parameters but not many observations for some groups. If p_loo is less than the number of parameters and other model-checking diagnostics don't indicate bad misspecification, then the difference between the models is so big that you can say Model_2 has better predictive performance.
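As a concrete illustration of that comparison, p_loo can be read off the loo object's estimates matrix and set against the parameter count (loo1 and n_params are hypothetical placeholders here):

```r
# p_loo for Model_1, from the estimates matrix printed above.
p_loo_1 <- loo1$estimates["p_loo", "Estimate"]

n_params <- length(params)  # parameter count from the sketch above (hypothetical)

# p_loo well below the number of parameters is consistent with a flexible
# hierarchical model; p_loo above it would point to misspecification.
p_loo_1 < n_params
```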
Using the output from get_variables, it looks like the models are estimating ~31,000 parameters.
They are hierarchical models of 250 subjects measured over time, with several categorical and continuous predictors per side (left and right), with varying slopes and intercepts as well as the nu and sigma parameters of a student_t distribution.
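For concreteness, a hypothetical brms specification along those lines might look like the sketch below (all variable names are invented and the real models may differ substantially):

```r
library(brms)

# Varying intercepts and slopes by subject, with the distributional
# parameters sigma and nu of the Student-t likelihood also modelled.
f <- bf(
  outcome ~ time * side + group + (1 + time | subject),
  sigma   ~ side + (1 | subject),
  nu      ~ 1
)

fit <- brm(f, data = dat, family = student(), chains = 4, cores = 4)  # dat is hypothetical
```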