Sivula, Magnusson, Matamoros & Vehtari (uncertainty in elpd_loo comparison): newbie questions

blokeman · July 17, 2023, 6:58am

I’m trying to understand key aspects of this paper without a stats/math degree, so I hope pedagogically minded members can chip in on my dumb questions. Here’s the first one:

Figure 4 (p. 17) shows a different scale for the first and second column, which represent estimated vs. true elpd_diff, respectively. Both the caption and the y-axis label suggest that this axis represents the mean of elpd_diff divided by its SD (“relative mean”) over simulated iterations. But it doesn’t make sense to me that true vs estimated elpd_diff could differ 30-fold. What’s the reason for the scale difference? Is it because in a simulation setting, true elpd_diff can be exactly calculated and therefore has no SD to divide by?

But on the same page, the text reads “When \beta_\Delta \neq 0, the relative mean of both |elpd| and |elpd_loo| grows infinitely.” What am I missing? Is it just a detail assumed to be so self-evident that the authors prefer to save space by not spelling it out?

Also, am I interpreting the top-left panel of the plot correctly, i.e. that when true elpd_diff is 0, its estimator will always subtly favor the simpler model even when the sample size is large?

jsocolar · July 17, 2023, 2:48pm

It’s because the standard deviation in the estimator is much larger than the standard deviation in the true value. There’s still variation in the true value because the elpd depends on the dataset that gets drawn.
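A minimal numerical sketch of why the two columns can sit on such different scales. The numbers are hypothetical and chosen only for illustration (they are not taken from the paper), and the two quantities are drawn independently for simplicity: if the true value and the estimator share the same mean but the estimator's SD is 30 times larger, the "relative mean" (mean divided by SD) is about 30 times smaller.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values for illustration only (not from the paper).
common_mean = 2.0    # same average elpd_diff for both quantities
sd_true = 0.5        # spread of the true elpd_diff across simulated data sets
sd_estimate = 15.0   # spread of the LOO estimate across data sets (30x larger)

n_sims = 100_000
true_diff = rng.normal(common_mean, sd_true, size=n_sims)
loo_diff = rng.normal(common_mean, sd_estimate, size=n_sims)


def relative_mean(x):
    """Mean divided by standard deviation over simulated data sets."""
    return x.mean() / x.std(ddof=1)


rel_true = relative_mean(true_diff)
rel_loo = relative_mean(loo_diff)

# The ratio of the two relative means is roughly sd_estimate / sd_true.
print(rel_true, rel_loo, rel_true / rel_loo)
```

The true value still has a nonzero SD here because it varies from one simulated data set to the next, so dividing by it is well defined.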

This whole thing is in a section about the asymptotic behavior as n gets large. The indefinite growth is what happens in the large-n limit.

The true elpd_diff isn’t zero, but I think you’re getting burned by the scaling of the y-axis.

blokeman · July 17, 2023, 3:32pm

Thanks!

Oh? You mean the metrics are equal for the blue line, but the gap looks smaller on the left because of the smaller scale?

jsocolar · July 17, 2023, 3:37pm

If I understand which line you’re wondering about, the gap looks smaller on the right, not the left.

The gaps on the left and right aren’t literally equal, because of the different SDs. But that zero-coefficient line should be positive on the right-hand side. The true model has the coefficient set to zero, so the model that constrains the coefficient to zero should on average provide better out-of-sample prediction than a model that does not constrain the coefficient to zero.
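A rough way to see this numerically is sketched below. This is not the paper's experiment: it swaps in plug-in least-squares estimates with a Gaussian predictive density instead of full Bayesian posteriors, and the sample sizes, coefficients and helper name `log_pred_density` are assumptions made up for the illustration. The data-generating process has a zero coefficient on the extra covariate, and the per-observation out-of-sample log density of the simpler model is compared with that of the larger one over many simulated data sets.

```python
import numpy as np

rng = np.random.default_rng(1)


def log_pred_density(X_train, y_train, X_test, y_test):
    """Summed Gaussian log density on test data using plug-in OLS estimates."""
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    resid = y_train - X_train @ beta
    sigma2 = resid @ resid / len(y_train)  # maximum-likelihood variance estimate
    err = y_test - X_test @ beta
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2) - 0.5 * err**2 / sigma2)


# Hypothetical settings for illustration only.
n_train, n_test, n_reps = 32, 2000, 500
n_total = n_train + n_test
train, test = slice(0, n_train), slice(n_train, None)

diffs = np.empty(n_reps)
for r in range(n_reps):
    # Columns: intercept, shared covariate, extra covariate.
    X = np.column_stack([np.ones(n_total), rng.normal(size=(n_total, 2))])
    # True coefficient on the extra covariate (column 2) is exactly zero.
    y = 1.0 + 1.0 * X[:, 1] + rng.normal(size=n_total)

    simple = log_pred_density(X[train, :2], y[train], X[test, :2], y[test])
    full = log_pred_density(X[train], y[train], X[test], y[test])
    diffs[r] = (simple - full) / n_test  # per-observation difference

# Positive on average: the constrained model predicts slightly better.
print(diffs.mean(), diffs.std(ddof=1))
```

In any single replication the larger model can still come out ahead; the claim is only about the average over data sets.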

blokeman · July 17, 2023, 3:40pm

Whoops right, a slip of the pen on my part.

Makes sense, thanks!

blokeman · July 17, 2023, 4:51pm

Figure 13 on p. 27: What do the lowercase letters in the panel titles mean? Experiment 1b, 1c etc.

The descriptions of the six experimental settings on pp. 22–25 don’t seem to mention any “subtypes” of the settings. And the description of the figure in the main text just says that it “compares the normal uncertainty approximation for data size n = 128, with a non-shared covariate effect \beta_{\Delta} = 0.5.” I’ve thus far failed to find any gloss or explanation for what the small letters mean.

jsocolar · July 17, 2023, 4:59pm

Looks like a mislabeling of the figure, but maybe @avehtari knows better.

avehtari · July 25, 2023, 12:00pm

Yes. The labelling of the experiments was simplified at some point to be 1-6, but we forgot to update this figure. Thanks for mentioning this! Ping @mans_magnusson

blokeman · July 25, 2023, 3:35pm

Since we got into the business of pointing out possible errata, I was somewhat confused by the fact that the rightmost definitions of \text{elpd}(\text{M}_k|y) seem to differ between Equation 1 (p. 3) and the Notation table (p. 5). The former uses \tilde{y}_i, the latter y_i.

It also looks to me like Equation 1 writes p_{\text{M}_{k}} when it means p_k. Overall, there’s a lot of vacillation all through the article on whether the _\text{M} in p_{\text{M}_{k}} is present or absent. Page 5 suggests to me that it should be absent throughout. But perhaps there’s a meaning difference that I’ve simply missed.

blokeman · August 3, 2023, 1:52pm

I’m now trying to understand the difference between \text{elpd}(\text{M}_a, \text{M}_b|~y), whose estimation is the main topic of the article, and \text{e-elpd}(\text{M}_a, \text{M}_b), which is discussed only briefly. Here are follow-up questions:

Does \text{elpd}(\text{M}_a, \text{M}_b|~y) being “conditional on y” mean that it is conditional on the respective posteriors of \text{M}_a and \text{M}_b as estimated from this sample? That is to say, is \text{elpd}(\text{M}_a, \text{M}_b|~y) always specific to two particular fits rather than just two particular models?

And if that’s the case, then isn’t \widehat{\text{elpd}}_{\text{LOO}}(\text{M}_a,\text{M}_b|~y) a somewhat unsatisfactory estimator of \text{elpd}(\text{M}_a, \text{M}_b|~y), given that it simulates resampling, refitting and retesting and is therefore not conditional on a particular fit? Isn’t it in fact true that \widehat{\text{elpd}}_{\text{LOO}}(\text{M}_a,\text{M}_b|~y) makes more sense as an estimator of \text{e-elpd}(\text{M}_a, \text{M}_b) than of \text{elpd}(\text{M}_a, \text{M}_b|~y), given that its calculation involves lots of refitting?

avehtari · August 4, 2023, 2:13pm

As the word “sample” is overloaded, can you clarify whether you mean, e.g., a data sample or a posterior sample?

Specific to two particular models and one particular data set. It’s not clear what you mean by “fit”, but in the paper we assume that the computation is exact or close enough that we are not considering the extra variation from potential use of stochastic inference.

It’s not clear what you mean with these terms, but in LOO the training data sets in each fold are as close as possible to y and each other, and thus it simulates conditioning on y. There are also approaches where each fold is independent of each other which would simulate the case of conditioning on random data, but then each fold is also very different from y (at least has to be much smaller).

blokeman · August 4, 2023, 2:58pm

Sorry about being unclear. I’ve now looked at relevant sections of the article some more, and here’s what I gather (trying to be clear this time):

1. \text{elpd}(\text{M}_a, \text{M}_b~|~y) compares the future utility of \text{M}_a and \text{M}_b's fits to the sample at hand. Meaning, after being estimated from this particular dataset, how do the two models compare on their usefulness in predicting future observations from the same DGM.
2. \text{e-elpd}(\text{M}_a, \text{M}_b) compares the present and future utility of \text{M}_a and \text{M}_b in general, with limited interest in the usefulness of their posteriors as estimated from this particular dataset (which is only one of innumerable potential realizations of y).
3. \text{elpd}_\text{LOO}(\text{M}_a, \text{M}_b~|~y) is an unbiased estimator of both quantities, but it seems more natural as an estimator of \text{e-elpd}(\text{M}_a, \text{M}_b) because it involves repeated re-evaluations of the posterior “as if we were fitting the model to a new sample”.
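For reference, the three quantities can be written out side by side. This is a sketch of the definitions as I read them from the paper, using the \tilde{y}_i form of Equation 1 mentioned earlier in the thread; the symbol p_\text{true} for the data-generating distribution is assumed here and should be checked against the paper's notation table.

```latex
% Conditional on the observed data y (one particular data set):
\text{elpd}(\text{M}_k \mid y)
  = \sum_{i=1}^{n} \int p_\text{true}(\tilde{y}_i)\,
    \log p_k(\tilde{y}_i \mid y)\, d\tilde{y}_i

% Expectation over all possible data sets y:
\text{e-elpd}(\text{M}_k)
  = \mathrm{E}_{y}\!\left[\text{elpd}(\text{M}_k \mid y)\right]

% LOO-CV estimator, refitting to y_{-i} (all observations except the i-th):
\widehat{\text{elpd}}_\text{LOO}(\text{M}_k \mid y)
  = \sum_{i=1}^{n} \log p_k(y_i \mid y_{-i})

% Model comparison takes differences, e.g.
\text{elpd}(\text{M}_a, \text{M}_b \mid y)
  = \text{elpd}(\text{M}_a \mid y) - \text{elpd}(\text{M}_b \mid y)
```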

avehtari · August 8, 2023, 5:04pm

With 1. and 2. I agree, but I don’t agree with 3., because the y_{-i} are close to y and to each other. If you wanted repeated re-evaluations of the posterior “as if we were fitting the model to a new sample”, you would want to condition each repetition on data sets that are as independent of each other as possible (minimum overlap). You could do this, for example, by dividing the data into K folds and, unlike in the usual K-fold-CV, using only the kth fold for fitting; you would then have K independent “training” data sets. Of course, the test sets then have high overlap, leading to other complications.
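The overlap argument can be made concrete with index sets alone, without fitting any model. The sketch below uses a hypothetical n = 12 and K = 4 and compares how many observations two training sets share under LOO, under the usual K-fold-CV, and under the "fit on the kth fold only" variant described above; all variable names are made up for the illustration.

```python
import numpy as np

n, K = 12, 4  # hypothetical sizes for illustration
rng = np.random.default_rng(0)
all_idx = np.arange(n)
folds = np.array_split(rng.permutation(n), K)  # K disjoint folds of size n / K

# LOO: training set i is every observation except i.
loo_train = [np.setdiff1d(all_idx, [i]) for i in range(n)]

# Usual K-fold-CV: train on everything outside fold k, test on fold k.
kfold_train = [np.setdiff1d(all_idx, fold) for fold in folds]

# Variant: train on fold k only, test on everything else.
single_fold_train = [np.sort(fold) for fold in folds]
single_fold_test = kfold_train


def overlap(a, b):
    """Number of observations shared by two index sets."""
    return len(np.intersect1d(a, b))


print("LOO training overlap:        ", overlap(loo_train[0], loo_train[1]))
print("K-fold training overlap:     ", overlap(kfold_train[0], kfold_train[1]))
print("single-fold training overlap:", overlap(single_fold_train[0], single_fold_train[1]))
print("single-fold test overlap:    ", overlap(single_fold_test[0], single_fold_test[1]))
```

With these sizes, two LOO training sets share 10 of 12 observations and two K-fold training sets share 6, while the single-fold training sets share none; the price is that their test sets now share 6 observations.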
