Hi,

**EDIT:** Looks like Aki posted while I was writing - he's definitely more knowledgeable about the topic than I am, so his advice should take precedence over mine.

This would IMHO depend a lot on what your priors for the "useless" parameters look like. I did a quick check, and it seems that empirically this is not completely accurate:

```
library(rstanarm)
set.seed(32156855)
# pure-noise data: neither x1 nor x2 has any relationship to y
dd <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))
fit1 <- stan_glm(y ~ 1, data = dd)
fit2 <- stan_glm(y ~ 1 + x1, data = dd)
fit3 <- stan_glm(y ~ 1 + x1 + x2, data = dd)
loo1 <- loo(fit1, cores = 1)
loo2 <- loo(fit2, cores = 1)
loo3 <- loo(fit3, cores = 1)
loo_compare(loo1, loo2, loo3)
# elpd_diff se_diff
# fit1 0.0 0.0
# fit2 -0.7 1.1
# fit3 -1.5 1.6
```

The values above are pretty typical across multiple seeds, so `loo` definitely can get worse by more than 0.5 per parameter. The difference can be made smaller by using tight priors centered on 0 for the coefficients (a sketch follows the next block). We can also make the difference arbitrarily worse by badly misspecifying the priors for the parameters:

```
set.seed(235488)
dd <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))
# badly misspecified prior: concentrated at 1, while the true coefficients are 0
prior <- normal(location = 1, scale = 0.1)
fit1 <- stan_glm(y ~ 1, data = dd, prior = prior)
fit2 <- stan_glm(y ~ 1 + x1, data = dd, prior = prior)
fit3 <- stan_glm(y ~ 1 + x1 + x2, data = dd, prior = prior)
loo1 <- loo(fit1)
loo2 <- loo(fit2)
loo3 <- loo(fit3)
loo_compare(loo1, loo2, loo3)
# elpd_diff se_diff
# fit1 0.0 0.0
# fit2 -3.2 1.2
# fit3 -8.8 1.9
```
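For comparison, here is a quick sketch of the tight-prior case mentioned above (this run is my addition, not part of the original comparison, and exact numbers will vary by seed):

```
set.seed(32156855)
dd <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))
# tight prior centered on 0: the useless coefficients are kept near 0
tight <- normal(location = 0, scale = 0.1)
fit1t <- stan_glm(y ~ 1, data = dd)
fit2t <- stan_glm(y ~ 1 + x1, data = dd, prior = tight)
fit3t <- stan_glm(y ~ 1 + x1 + x2, data = dd, prior = tight)
loo_compare(loo(fit1t), loo(fit2t), loo(fit3t))
# the elpd differences should now be noticeably closer to 0
# than in the first example (exact values will vary)
```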

So I suspect that, especially if your models are not very simple and you are not using tight priors, the guarantees against a large increase might be weaker than you expect.

I would repeat what was said in the original thread and generally caution against making binary decisions based on thresholds. If using a one-tailed versus a two-tailed normal approximation makes a difference, a simple interpretation is that you don't have enough data to make a clear decision. Remember that it is almost certain that all your models are at least slightly misspecified. So if your decision is sensitive to minor changes in the values of `elpd_diff / se_diff`, you are at very high risk of being misled. Looking at the result of `loo_model_weights` might also be informative about whether a clear decision is warranted (e.g. whether one model has weight close to 1).
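As a small illustration (my addition; the `elpd_diff` and `se_diff` values are taken from the first example above, and `loo1`/`loo2`/`loo3` are the objects from the first block):

```
# normal approximation for fit3 vs. fit1, using the numbers
# from the first example above
elpd_diff <- -1.5
se_diff <- 1.6
z <- elpd_diff / se_diff
pnorm(z)            # one-tailed tail probability
2 * pnorm(-abs(z))  # two-tailed version; if the two lead to different
                    # decisions, the data are likely not informative enough

# stacking weights for the three models, reusing loo1/loo2/loo3
# from the first example
loo_model_weights(list(fit1 = loo1, fit2 = loo2, fit3 = loo3))
```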

Also note that `loo` captures how good we would expect the model to be at predicting new data. That's a different goal than verifying whether "there is an effect/interaction", which would in many contexts IMHO be an ill-posed question; examining the posterior for the coefficient in the larger model might be more relevant to many scientific questions. My current thinking on the topic and some possible alternatives are in *Hypothesis testing, model selection, model comparison - some thoughts*.
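For instance (a minimal sketch of my own, continuing the first example; `posterior_interval` and `as.matrix` are standard rstanarm accessors), one could look at the coefficient for `x2` in the largest model directly:

```
# 95% posterior interval for the coefficient of x2 in the largest model
posterior_interval(fit3, prob = 0.95, pars = "x2")
# posterior probability that the coefficient is positive
draws_x2 <- as.matrix(fit3, pars = "x2")
mean(draws_x2 > 0)
```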

I'll also add that I don't think frequent bumping of unanswered topics is productive. We unfortunately usually have a backlog of questions that go unanswered for several days, and activity may even take a question off the radar for some people who specifically look for unanswered questions. Today I answered several questions that had been left without a reaction for longer than yours. (If a topic is abandoned for longer than about a week, bumping may be sensible.)