Hi everyone,

First of all, many thanks for the great Stan ecosystem, which we use extensively in our research.

Here, I have a question about the loo package: I was wondering how the standard error (SE) of the elpd_loo estimate is computed. Following eq. (22) of the paper https://arxiv.org/pdf/1507.04544.pdf, the error of the estimate should be about $\sqrt{\sum_i \mathrm{Var}(\widehat{\mathrm{elpd}}_{\mathrm{loo},i})}$. What I have trouble with is linking this to eq. (23), which makes two strong assumptions:

- The estimation errors of the individual observation likelihoods, i.e. for each $i$, are equal (and independent).
- The estimation variance can be obtained from the spread of the estimates across observations, i.e. as $V_{i=1}^n \widehat{\mathrm{elpd}}_{\mathrm{loo},i}$ in the notation of the paper.
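Spelled out, my reading of how these two assumptions take the first expression to the second is as follows, where $\sigma^2$ is the common per-observation variance posited by the first assumption (the symbol $\sigma^2$ is my own shorthand, not the paper's):

$$
\sqrt{\sum_{i=1}^n \mathrm{Var}\big(\widehat{\mathrm{elpd}}_{\mathrm{loo},i}\big)}
\;\overset{\text{(1)}}{=}\; \sqrt{n\,\sigma^2}
\;\overset{\text{(2)}}{\approx}\; \sqrt{n\, V_{i=1}^n \widehat{\mathrm{elpd}}_{\mathrm{loo},i}}
$$

Step (1) uses equal variances for all $i$, and step (2) replaces $\sigma^2$ by the sample variance of the pointwise estimates across observations.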

Especially the second point does not make sense to me. Just consider two data points, $i = 1, 2$, where the first point is predicted much better than the second, i.e. $\log p(y_1 \mid y_2) \gg \log p(y_2 \mid y_1)$. Further, suppose both predictive log-likelihoods can be estimated with high precision. Then the precision of the total predictive log-likelihood, which is just the sum over all observations, should be high as well. Yet the large spread across data points gives rise to a large SE according to the formula in the paper and in the loo package.
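To make the two-point example concrete, here is a small Python sketch with made-up numbers (the pointwise values and their Monte Carlo variances are hypothetical, not output of any real model). The "spread" SE mirrors eq. (23), which as far as I understand is also what the loo package computes, namely sqrt(n * var(pointwise)); the "per-term" SE sums the assumed per-observation estimation variances as in eq. (22).

```python
import math
import statistics

# Hypothetical pointwise LOO estimates: observation 1 is predicted
# much better than observation 2.
elpd_pointwise = [-1.0, -10.0]

# Hypothetical estimation variances of each pointwise term (both tiny,
# i.e. each term is assumed to be estimated with high precision).
mc_var_pointwise = [1e-4, 1e-4]

n = len(elpd_pointwise)
elpd_loo = sum(elpd_pointwise)

# SE in the style of eq. (23): sqrt(n * sample variance across observations).
# statistics.variance uses the n - 1 denominator, like R's var().
se_spread = math.sqrt(n * statistics.variance(elpd_pointwise))

# SE from the per-observation estimation variances (independence assumed).
se_from_terms = math.sqrt(sum(mc_var_pointwise))

print(f"elpd_loo      = {elpd_loo:.2f}")
print(f"se (spread)   = {se_spread:.4f}")
print(f"se (per-term) = {se_from_terms:.4f}")
```

With these numbers the spread-based SE is 9.0, while the SE built from the per-term variances is about 0.014, which is exactly the discrepancy I am puzzled by.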

Any thoughts on this would be highly appreciated.

With best regards,

Nils Bertschinger