Dear all, I have two competing models fitted to data sampled from healthcare facilities.

The models are binomial and try to model vaccine effectiveness for some disease.

For each healthcare facility (~400 in total) I have the number of cases and the denominators.

I’m evaluating two competing models: one (mod1) uses a random intercept for the inter-facility variation, and the other (mod2) uses a beta-binomial likelihood to account for the facility-level overdispersion.

Apparently, model 1 performs better:

```
> brms::loo_compare(brms::loo(mod1), brms::loo(mod2))
     elpd_diff se_diff
mod1   0.0      0.0
mod2 -89.5     63.2
```

But the first model has a lot of observations with high Pareto k:

```
> brms::loo(mod1)

Computed from 15000 by 746 log-likelihood matrix.

         Estimate    SE
elpd_loo  -1808.8  71.7
p_loo       498.0  36.0
looic      3617.7 143.4
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 2.3]).

Pareto k diagnostic values:
                        Count Pct.   Min. ESS
(-Inf, 0.7]  (good)     384   51.5%  130
 (0.7, 1]    (bad)      306   41.0%  <NA>
 (1, Inf)    (very bad)  56    7.5%  <NA>
```

while the second model has almost none:

```
> brms::loo(mod2)

Computed from 15000 by 746 log-likelihood matrix.

         Estimate    SE
elpd_loo  -1898.4  42.6
p_loo        34.0   2.1
looic      3796.7  85.2
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 1.7]).

Pareto k diagnostic values:
                        Count Pct.   Min. ESS
(-Inf, 0.7]  (good)     745   99.9%  3525
 (0.7, 1]    (bad)        1    0.1%  <NA>
 (1, Inf)    (very bad)   0    0.0%  <NA>
```

Could it be that the first model is too sensitive to individual observations (p_loo is 498 with only 746 observations), i.e. that its better performance is due to overfitting?
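One option I’m considering before trusting the comparison: moment matching or exact re-fits for the problematic observations (a sketch; it assumes mod1 can be refit, since `moment_match = TRUE` requires all draws to be saved with `save_pars`):

```
# Refit mod1 with all parameter draws saved;
# save_pars(all = TRUE) is required for moment matching in brms
mod1_full <- update(mod1, save_pars = brms::save_pars(all = TRUE))

# PSIS-LOO with moment matching for observations with high Pareto k
loo1_mm <- brms::loo(mod1_full, moment_match = TRUE)

# If many k values stay above 0.7, fall back to exact leave-one-out refits
# (slow: one model refit per flagged observation)
# loo1_exact <- brms::loo(mod1_full, reloo = TRUE)

brms::loo_compare(loo1_mm, brms::loo(mod2))
```

Would that make the elpd comparison trustworthy, or is the sheer number of bad k values already telling me something about mod1?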

The results are similar:

```
# model 1
Hypothesis Tests for class b:
                 Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 ((1-plogis(Vaccin... > 0     0.64       0.1     0.46     0.79        Inf         1    *

# model 2
Hypothesis Tests for class b:
                 Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 ((1-plogis(Vaccin... > 0     0.75      0.13     0.52     0.92      14999         1    *
```

with the second model showing a higher vaccine effectiveness estimate, but also a wider credible interval (0.52–0.92 vs 0.46–0.79) and a higher Est.Error.
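Beyond LOO, I was also planning to compare posterior predictive fit directly, e.g. (a sketch; the plot types come from bayesplot via brms, and `ndraws = 100` is an arbitrary choice):

```
# Does each model reproduce the observed distribution of case counts?
brms::pp_check(mod1, type = "dens_overlay", ndraws = 100)
brms::pp_check(mod2, type = "dens_overlay", ndraws = 100)

# Observation-level predictive intervals against the observed counts
brms::pp_check(mod1, type = "intervals", ndraws = 100)
brms::pp_check(mod2, type = "intervals", ndraws = 100)
```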

What reasoning should guide my choice between the two models?