Hello all,

I have been making my first foray into using brms for mixed-effects models and model comparisons, but I have run into a theoretical problem that I can’t quite wrap my head around. I initially used `bayes_factor` to compare different models, but after repeated convergence warnings (even after substantially increasing the number of posterior samples), someone recommended that I use ELPD instead. Sure enough, with `loo_compare` I get no warnings, but, quite unexpectedly, it contradicts the Bayes factor results. Of course, since the Bayes factor comparison gave warnings, it might simply have been wrong. However, I am not convinced of this, because the Bayes factors were consistently enormous and could be replicated through a BIC-based Bayes factor approximation using a non-Bayesian `lmer` model. Furthermore, not all of the models gave warnings, and even the ones without warnings contradicted the ELPD results.

Thus, another explanation may be that the Bayes factor comparison and the ELPD comparison simply give different answers here. I understand that the two measure different things, so this is theoretically possible. However, I do not fully understand what the implications would be for my results. Under what circumstances would one expect the two measures to give opposite results? Intuitively, it feels like this may be related to overfitting: a model that overfits more might perform better on the Bayes factor comparison, but not necessarily on the ELPD comparison. Am I thinking in the right direction? Does anyone perhaps have a simple example of a scenario in which this can occur?

Since this is more of a theoretical question, I figured that the exact technical details may not be crucial. If you have a purely theoretical answer, I’m more than interested in hearing it, and you wouldn’t need to bother with the specifics of my model. However, I’ll try to mention some details that may be relevant below. If you need more information to properly answer this question, do let me know!

In short, I am analysing data from a behavioural experiment in which there are different cue types (`cue`). For each stimulus (`item`) within the different cue types, I have obtained descriptive scores through a norming experiment (`norming`). I am trying to answer whether these norming scores can explain a difference in reaction times (`RT`) that we found, or whether the cue type is required (and possibly even sufficient) to explain this effect.

The relevant parts of the model specifications:

```
cue_formula <- brmsformula(
  RT ~ cue * matching + (...) + (cue + matching | participant) + (cue + matching | item),
  family = gaussian(link = 'log')
)

norming_formula <- brmsformula(
  RT ~ norming * matching + (...) + (norming + matching | participant) + (norming + matching | item),
  family = gaussian(link = 'log')
)

combined_formula <- brmsformula(
  RT ~ cue * norming * matching + (...) + (cue + norming + matching | participant) + (cue + norming + matching | item),
  family = gaussian(link = 'log')
)
```

Priors (weakly informative; the intercept prior is based on RT values from an earlier study):

```
priors <- c(
  prior(normal(6.5, 0.5), class = "Intercept"),
  prior(normal(0, 1), class = "b"),
  prior(cauchy(0, 5), class = "sd")
)
```

Models run with:

```
norming_model <- brm(
  norming_formula, data = data, prior = priors,
  warmup = 5000, iter = 105000, chains = 10, cores = 10,
  save_all_pars = TRUE
)
cue_model <- brm(
  cue_formula, data = data, prior = priors,
  warmup = 5000, iter = 105000, chains = 10, cores = 10,
  save_all_pars = TRUE
)
combined_model <- brm(
  combined_formula, data = data, prior = priors,
  warmup = 5000, iter = 105000, chains = 10, cores = 10,
  save_all_pars = TRUE
)
```
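
One side note on the calls above: if I remember correctly, newer brms versions deprecate `save_all_pars = TRUE` in favour of `save_pars = save_pars(all = TRUE)`, which is what lets `bayes_factor` access all the draws it needs for bridge sampling. An equivalent call with the newer argument (assuming the same formula, data, and prior objects as above) would be:

```
# Same fit as above, but with the newer save_pars interface;
# save_pars(all = TRUE) retains the draws that bayes_factor() requires.
cue_model <- brm(
  cue_formula, data = data, prior = priors,
  warmup = 5000, iter = 105000, chains = 10, cores = 10,
  save_pars = save_pars(all = TRUE)
)
```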

Bayes factor comparisons (note: these use different variable names for the same models):

```
bayes_factor(mc_combined, mc_cue_only)
bayes_factor(mc_norming_only, mc_combined)
```

Example results:

```
Estimated Bayes factor in favor of mc_combined over mc_cue_only: 10369895906583968793856311296.00000
Estimated Bayes factor in favor of mc_norming_only over mc_combined: 519246339874.26733
```

Warning message (which I get on some, but not all, of the models):

```
Warning message:
logml could not be estimated within maxiter, rerunning with adjusted starting value.
Estimate might be more variable than usual.
```

ELPD-LOO comparisons:

```
mc_cue_only <- add_criterion(mc_cue_only, "loo", ndraws=5000, cores=12)
mc_norming_only <- add_criterion(mc_norming_only, "loo", ndraws=5000, cores=12)
mc_combined <- add_criterion(mc_combined, "loo", ndraws=5000, cores=12)
loo_compare(mc_cue_only, mc_norming_only, mc_combined)
```

Example output:

```
                elpd_diff se_diff
mc_combined        0.0      0.0
mc_cue_only       -7.2     21.0
mc_norming_only -112.5     24.7
```

Setup information (not everything was run on the same system, so versions may differ slightly):

- Operating System: Windows 10 (19043.1466)
- R version: 4.1.1
- brms Version: 2.16.1