Hi,
EDIT: Looks like Aki posted while I was writing - he’s definitely more knowledgeable about the topic than I am, so his advice should take precedence over mine.
This would IMHO depend a lot on what your priors for the “useless” parameters look like. I did a quick check and it seems that empirically this is not completely accurate:
library(rstanarm)
set.seed(32156855)
# 10 observations of pure noise - y is unrelated to x1 and x2
dd <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))
fit1 <- stan_glm(y ~ 1, data = dd)            # intercept only
fit2 <- stan_glm(y ~ 1 + x1, data = dd)       # adds one irrelevant predictor
fit3 <- stan_glm(y ~ 1 + x1 + x2, data = dd)  # adds two irrelevant predictors
loo1 <- loo(fit1, cores = 1)
loo2 <- loo(fit2, cores = 1)
loo3 <- loo(fit3, cores = 1)
loo_compare(loo1, loo2, loo3)
# elpd_diff se_diff
# fit1 0.0 0.0
# fit2 -0.7 1.1
# fit3 -1.5 1.6
The values above are pretty typical across multiple seeds, so loo can definitely get worse by more than 0.5 per parameter. The difference can be made smaller by using tight priors centered on 0 for the coefficients (a sketch of this is shown after the second example below). We can also make the difference arbitrarily worse by badly misspecifying the priors for the parameters:
set.seed(235488)
dd <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))
# badly misspecified prior: coefficients are pushed towards 1, while the true effects are 0
prior <- normal(location = 1, scale = 0.1)
fit1 <- stan_glm(y ~ 1, data = dd, prior = prior)
fit2 <- stan_glm(y ~ 1 + x1, data = dd, prior = prior)
fit3 <- stan_glm(y ~ 1 + x1 + x2, data = dd, prior = prior)
loo1 <- loo(fit1)
loo2 <- loo(fit2)
loo3 <- loo(fit3)
loo_compare(loo1, loo2, loo3)
# elpd_diff se_diff
# fit1 0.0 0.0
# fit2 -3.2 1.2
# fit3 -8.8 1.9
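And to sketch the opposite direction (tight priors centered on 0): with the same simulated data, something like the code below should pull the elpd differences much closer to 0. I haven’t included its output here, and the object names (prior_tight, fit2_tight, fit3_tight) are just for illustration.

# strong shrinkage towards the true coefficient value of 0
prior_tight <- normal(location = 0, scale = 0.1)
fit2_tight <- stan_glm(y ~ 1 + x1, data = dd, prior = prior_tight)
fit3_tight <- stan_glm(y ~ 1 + x1 + x2, data = dd, prior = prior_tight)
# loo1 (intercept-only model) can be reused, as the coefficient prior does not affect it
loo_compare(loo1, loo(fit2_tight), loo(fit3_tight))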
So I suspect that, especially if your models are not very simple and you are not using tight priors, the guarantees against a large increase might be weaker than you expect.
I would repeat what was said in the original thread and generally caution against making binary decisions based on thresholds. If using a one-tailed versus a two-tailed normal approximation makes a difference, a simple interpretation is that you don’t have enough data to make a clear decision. Remember that it is almost certain that all your models are at least slightly misspecified. So if your decision is sensitive to minor changes in the values of elpd_diff / se_diff, you are at very high risk of being misled. Looking at the result of loo_model_weights might also be informative as to whether a clear decision is warranted (e.g. whether one model has weight close to 1).
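E.g. something like the minimal sketch below, which takes a list of the loo objects computed above and computes stacking weights by default:

# stacking weights across the three candidate models
loo_model_weights(list(loo1, loo2, loo3))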
Also note that loo captures how good we would expect the model to be at predicting new data. That’s a different goal than verifying whether “there is an effect/interaction”, which would in many contexts IMHO be an ill-posed question; examining the posterior for the coefficient in the larger model might be more relevant to many scientific questions. My current thinking on the topic and some possible alternatives are at Hypothesis testing, model selection, model comparison - some thoughts
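For example, with the rstanarm fits above, one quick way to look at the coefficient itself would be something like the sketch below (the choice of x2 and of a 95% interval is arbitrary):

# central posterior interval for the coefficient of x2 in the largest model
posterior_interval(fit3, prob = 0.95, pars = "x2")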
I’ll also add that I don’t think frequent bumping of unanswered topics is productive - we unfortunately usually have a backlog of questions that go unanswered for several days, and activity may even take the question off the radar for some people who specifically look for unanswered questions. Today I answered several questions that had been left without a reaction for longer than yours. (If a topic has been left without a response for roughly longer than a week, then bumping may be sensible.)