Question about large Pareto k value

Hi,

I got two problematic k values when using the loo package. See the output below:

> r_eff_ML <- relative_eff(exp(GMM_ML_fit_4c$draws("log_lik")), cores = 16)
> loo_waic_ML<-waic(GMM_ML_fit_4c$draws("log_lik"))
Warning message:

12 (3.0%) p_waic estimates greater than 0.4. We recommend trying loo instead. 
> loo_ML <- loo(GMM_ML_fit_4c$draws("log_lik"), r_eff = r_eff_ML, cores = 16)
Warning message:
Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
 
> print(loo_waic_ML)

Computed from 6000 by 405 log-likelihood matrix

          Estimate   SE
elpd_waic  -1545.6 40.6
p_waic        32.5  5.4
waic        3091.3 81.1

12 (3.0%) p_waic estimates greater than 0.4. We recommend trying loo instead. 
> print(loo_ML)

Computed from 6000 by 405 log-likelihood matrix

         Estimate   SE
elpd_loo  -1545.6 40.5
p_loo        32.5  5.3
looic      3091.2 80.9
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     403   99.5%   3         
 (0.5, 0.7]   (ok)         0    0.0%   <NA>      
   (0.7, 1]   (bad)        2    0.5%   61        
   (1, Inf)   (very bad)   0    0.0%   <NA>      
See help('pareto-k-diagnostic') for details.

See the PSIS diagnostic plot below:

Because the number of problematic observations is small, I tried to “refit the model once for each of these problematic observations”, as recommended in this vignette (Using the loo package (version >= 2.0.0)). My code is as follows:

> if (any(pareto_k_values(loo_ML) > 0.7)) {
+     loo_ML <- loo(GMM_ML_fit_4c$draws("log_lik"), save_psis = TRUE, k_threshold = 0.7)
+ }
Warning messages:
1: Relative effective sample sizes ('r_eff' argument) not specified.
For models fit with MCMC, the reported PSIS effective sample sizes and 
MCSE estimates will be over-optimistic. 
2: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
 
> print(loo_ML)

Computed from 6000 by 405 log-likelihood matrix

         Estimate   SE
elpd_loo  -1545.6 40.5
p_loo        32.4  5.3
looic      3091.1 80.9
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count Pct.    Min. n_eff
(-Inf, 0.5]   (good)     403   99.5%   821       
 (0.5, 0.7]   (ok)         1    0.2%   211       
   (0.7, 1]   (bad)        1    0.2%   531       
   (1, Inf)   (very bad)   0    0.0%   <NA>      
See help('pareto-k-diagnostic') for details.

The question is why I still got two problematic k values. Did I do something wrong? Does anyone have suggestions?

Thank you so much for helping me!

Best,
Doria

You should use the reloo = TRUE option for loo here. This will refit your model once per high Pareto-k observation, leaving that observation out. This is what is meant by refitting the model.
You then have to replace the corresponding pointwise entries in the initial loo object with those from the refitted models.

I think the brms loo function has a reloo option for brmsfit objects that you could look at.

How do you create GMM_ML_fit_4c? You call loo() as if the fit is not from rstanarm or brms. Looking now at the documentation you pointed to, I see that it’s not clear enough that the k_threshold option is supported only for rstanarm objects. @scholz suggested using the reloo option, but that is supported only for brms objects. For models created with rstanarm and brms, there is enough information for the refits to be created automatically. If you have written your model directly in Stan code and fit it with cmdstanr, then these options don’t work, but unfortunately they also don’t produce an error. Thus, although you passed the k_threshold option, no refits were done, and there are still 2 high khats.

They seem to be very close to the threshold, and one of them might even fall below 0.7 under repeated sampling (khat is estimated from the posterior draws, and refitting is likely to change the estimate), so the elpd_loo estimate would probably not change much if you did the refits. It’s also a good sign that p_loo is relatively small compared to the number of observations. You could also look at the actual observation values to see whether it makes sense that they are more influential than the others.
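For a cmdstanr fit the refits have to be done by hand: for each high-khat observation, refit the model with that observation held out and replace its pointwise elpd in the loo object. A minimal sketch of the replacement mechanics, with simulated draws standing in for both the original log-likelihood matrix and the refit step (the matrix sizes and the simulated `log_lik_i` are illustrative, not from this thread):

```r
library(loo)

set.seed(123)
# Stand-in for a draws-by-observations log-likelihood matrix from cmdstanr.
log_lik <- matrix(rnorm(2000 * 50, mean = -2, sd = 0.7), nrow = 2000, ncol = 50)
loo_full <- loo(log_lik)

# Observations flagged by PSIS (may be empty for well-behaved simulated data).
bad <- which(pareto_k_values(loo_full) > 0.7)
for (i in bad) {
  # Real workflow: refit the model with observation i held out, then extract
  # the draws of log p(y_i | theta) from that refit. Simulated here.
  log_lik_i <- rnorm(2000, mean = -2, sd = 0.7)
  # Exact elpd for observation i: log of the mean likelihood over the draws.
  elpd_i <- log(mean(exp(log_lik_i)))
  loo_full$pointwise[i, "elpd_loo"] <- elpd_i
}

# Recompute the totals from the (partially replaced) pointwise values.
n <- nrow(loo_full$pointwise)
loo_full$estimates["elpd_loo", "Estimate"] <- sum(loo_full$pointwise[, "elpd_loo"])
loo_full$estimates["elpd_loo", "SE"] <- sqrt(n * var(loo_full$pointwise[, "elpd_loo"]))
```

This only sketches the bookkeeping; the expensive part is the per-observation refit, which rstanarm (k_threshold) and brms (reloo) automate but cmdstanr cannot.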

Thank you for answering my question.

Yes, I tried turning on the reloo option, and as @avehtari explained, that option only works for brms. Therefore, the results did not change much when I used reloo.

Thank you so much for your time.

GMM_ML_fit_4c is a cmdstanr fit object created with the following code:

  GMM_ML_fit_4c <- GMM_ML_mod$sample(
    data = GMMs_data_list(GMM_dat = GMM_dat, K=4),
    seed = sample.int(1000, 1),
    chains = 5,
    parallel_chains = 5,
    iter_sampling = 1000,
    refresh = 1000)

This is good to hear.

Yes, the p_loo is smaller than the number of observations (N=405), but it is greater than the number of parameters (p=24). Does that mean my model is misspecified?

Yes, these two observations are more influential. See the screenshot below:

My question now is, what should I do next?

Thank you again for your answer!

Best,
Doria

Probably

Can you share the model code? I may be able to say more then.

Hi Vehtari,

Thank you so much for helping me! I’m not ready to share the code.

I intended to do class enumeration in a Growth Mixture Model (GMM). I wrote the GMM with the marginal likelihood (marginalized over both the latent class variable and the random effects) and pass that marginal likelihood to the loo() function. I fit 1-, 2-, 3-, and 4-class GMMs and compare their elpd_waic and elpd_loo. No high Pareto k values were found for the 1-, 2-, and 3-class GMMs, but the 4-class GMM has the two high k values shown in the first post. Moreover, the 4-class GMM is the “best”-fitting model. See below the loo() results for the 3- and 4-class GMMs:

3-class GMM solution:


4-class GMM solution:

Could you say more from the information provided? Thank you again for your help!

Best,
Doria