Hello,

I am fitting binomial models with brms that yield Pareto k values > 0.5. I then tried the zero_inflated_binomial and beta_binomial families. The beta_binomial model yielded fewer Pareto k values > 0.5 (the zero_inflated_binomial model did not), but when I compared the models with kfold, as the package suggested, the models with more bad Pareto k values had lower kfoldic values. Can I trust models with so many Pareto k warnings? Should I trust the kfold comparison even though it favors the models with many more Pareto k warnings? Also, none of the loo_pit diagnostic plots look particularly good to me, and I would appreciate some help interpreting them. I've given summaries of my 3 models below.

The data consists of 1547 observations of X successes out of 50 or 100 total trials. Each model has 499-500 parameters. I can give more info on the models if it helps, but I'm hoping for general advice on how to deal with these high Pareto k values. I tried following the advice on other threads (here and here), but different response distributions didn't help much.

**Binomial**

```
loo(mBinomial)
#> Computed from 5700 by 1547 log-likelihood matrix
#>          Estimate    SE
#> elpd_loo  -4026.2  62.0
#> p_loo       545.5  24.3
#> looic      8052.5 124.1
#> ------
#> Monte Carlo SE of elpd_loo is NA.
#> Pareto k diagnostic values:
#>                          Count  Pct.   Min. n_eff
#> (-Inf, 0.5]   (good)      1351  87.3%  430
#>  (0.5, 0.7]   (ok)         157  10.1%   63
#>    (0.7, 1]   (bad)         30   1.9%   18
#>    (1, Inf)   (very bad)     9   0.6%    3
#> See help('pareto-k-diagnostic') for details.
kfold(mBinomial)
#> Based on 10-fold cross-validation
#>            Estimate    SE
#> elpd_kfold  -4058.8  61.6
#> p_kfold          NA    NA
#> kfoldic      8117.6 123.2
```

**Zero_inflated_binomial**

```
loo(mzBinomial)
#> Computed from 5700 by 1547 log-likelihood matrix
#>          Estimate    SE
#> elpd_loo  -4026.2  61.6
#> p_loo       541.2  24.8
#> looic      8052.3 123.2
#> ------
#> Monte Carlo SE of elpd_loo is NA.
#> Pareto k diagnostic values:
#>                          Count  Pct.   Min. n_eff
#> (-Inf, 0.5]   (good)      1321  85.4%  289
#>  (0.5, 0.7]   (ok)         174  11.2%  138
#>    (0.7, 1]   (bad)         45   2.9%   20
#>    (1, Inf)   (very bad)     7   0.5%    3
#> See help('pareto-k-diagnostic') for details.
kfold(mzBinomial)
#> Based on 10-fold cross-validation
#>            Estimate    SE
#> elpd_kfold  -4019.6  57.7
#> p_kfold          NA    NA
#> kfoldic      8039.2 115.4
```

**Beta-Binomial**

```
loo(mbBinomial)
#> Computed from 5700 by 1547 log-likelihood matrix
#>          Estimate    SE
#> elpd_loo  -3892.7  43.2
#> p_loo       320.2  12.6
#> looic      7785.4  86.3
#> ------
#> Monte Carlo SE of elpd_loo is NA.
#> Pareto k diagnostic values:
#>                          Count  Pct.   Min. n_eff
#> (-Inf, 0.5]   (good)      1430  92.4%  358
#>  (0.5, 0.7]   (ok)         103   6.7%  110
#>    (0.7, 1]   (bad)         13   0.8%   27
#>    (1, Inf)   (very bad)     1   0.1%   13
#> See help('pareto-k-diagnostic') for details.
kfold(mbBinomial)
#> Based on 10-fold cross-validation
#>            Estimate    SE
#> elpd_kfold  -4251.4  47.0
#> p_kfold          NA    NA
#> kfoldic      8502.8  94.0
```
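To make the comparison easier to read, here is a small base-R sketch that puts the three models side by side. It uses only the estimates copied from the outputs above; the data frame `res` and its column names are made up for illustration, and `n_bad_k` is simply the sum of the "bad" and "very bad" counts. It cannot give a standard error for the differences, since that needs the pointwise values (which, as far as I understand, is what `loo_compare()` works from).

```r
# Estimates copied by hand from the loo() and kfold() outputs above.
res <- data.frame(
  model      = c("binomial", "zero_inflated_binomial", "beta_binomial"),
  elpd_loo   = c(-4026.2, -4026.2, -3892.7),
  elpd_kfold = c(-4058.8, -4019.6, -4251.4),
  kfoldic    = c(8117.6, 8039.2, 8502.8),
  n_bad_k    = c(30 + 9, 45 + 7, 13 + 1)  # Pareto k > 0.7
)

# kfoldic is -2 * elpd_kfold, so both columns carry the same information.
res$ic_check <- -2 * res$elpd_kfold

# Difference from the best (highest) elpd_kfold; 0 marks the preferred model.
res$kfold_diff <- res$elpd_kfold - max(res$elpd_kfold)

# Gap between the PSIS-LOO and 10-fold estimates for the same model.
res$loo_minus_kfold <- res$elpd_loo - res$elpd_kfold

print(res)
```

The last column is what puzzles me most: for the beta-binomial model the two estimates differ by several hundred, while for the other two models the gap is comparatively small.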

Thank you very much,

Sam