Relevance of the standard error in the "loo" package

This may be considered as an extension of a previous thread (link below) but I am starting it as a new one.

Aki pointed out that the standard error is that which occurs when one takes a new sample from the population. My initial reaction to that was “How frequentist!” Long ago I heard that the difference is that a frequentist views the data as random and the (population) parameters as fixed. A Bayesian does just the reverse: the data are fixed, and the parameters are random.

I don’t think there is a “population” for my particular problem. I am using data from the previous 10 years to predict quantities for the next 9 years. (No scolding here - we actuaries think this is important.) We may be trying this with different datasets, but if there is something one can call a “population” it is different for each dataset. For cases like this, why is the loo standard error relevant?
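For concreteness, here is a minimal sketch of how a standard error of this kind is typically obtained from the pointwise elpd contributions: the estimate is the sum of the n pointwise values, and the standard error is sqrt(n * sample variance of those values). This is written in Python rather than R, the function name `elpd_and_se` and the numbers are made up for illustration, and it is my understanding of the calculation rather than code taken from the loo package.

```python
import math

def elpd_and_se(pointwise_elpd):
    """Sum of pointwise elpd values and the standard error of that sum.

    Assumption: the SE treats the n pointwise values as a sample,
    so SE = sqrt(n * sample variance of the pointwise values).
    """
    n = len(pointwise_elpd)
    total = sum(pointwise_elpd)
    mean = total / n
    # Unbiased sample variance (divides by n - 1).
    var = sum((v - mean) ** 2 for v in pointwise_elpd) / (n - 1)
    return total, math.sqrt(n * var)

# Hypothetical pointwise elpd values for five observations.
pointwise = [-1.2, -0.8, -2.5, -1.0, -1.5]
elpd, se = elpd_and_se(pointwise)
print(elpd, se)
```

The variance here is taken over observations, not over posterior draws, which is why the question of what "a new sample" means matters for interpreting it.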

Now for the practical question. If I really believe that the standard error put out by the loo package is irrelevant, should I pay attention to the high “k” warnings that the loo package spits out?

Here is the link to the previous thread.

That thread was so long that it was a good idea to start a new one. Your questions are relevant.

Sorry for being sloppy, so that it could be interpreted like that.

See the Bayesian interpretation in A survey of Bayesian predictive methods for model assessment, selection and comparison, and be careful not to mix the evaluation of frequency properties with the frequentist approach, which accepts only the frequency interpretation of probabilities.

See the discussion in A survey of Bayesian predictive methods for model assessment, selection and comparison, especially in section 4.3.

I think that in the case you have now described, loo is the wrong choice, and then the question about the loo uncertainty is irrelevant.

As long as you care about prediction tasks where loo is relevant, you need to care about the k warnings. You may have cases where you would use loo to approximate the predictive performance in your prediction task, but would compute the related uncertainties differently. In the M-open case the common problem is that even if you assume x is fixed, y is not fixed, but you can’t separate them from the pair (x,y). If you trust your model, you may also go beyond the M-open approach.

I realized I should have been more careful here. M-* doesn’t directly specify what we assume about the future data distribution p(\tilde{x},\tilde{y}) for unknown \tilde{x}, or p(x',\tilde{y}) for known future x'. So you could also do something else than what is discussed in, and in that sense your question is more about whether to use explicit model for the future data distribution or not. It is just common that in M-open the explicit model for the future data distribution is not used (since we don’t trust any of our models).

Sorry for the slow reply. I spent some time looking at your survey paper. My current interest is in the case where we know all the xs and some of the ys. I am interested in getting a predictive distribution of the unknown ys. Although we actuaries have some good intuition on what models are appropriate, I have yet to find any actuaries who do not subscribe to the Box dictum. So I think M-Open is appropriate for my case.

I want to make the standard Bayesian assumption that the data are fixed, and we want to find the predictive distribution of the ys. I want to compare alternative models using the looic/elpd statistic produced by the loo package. It appears to me that the main error in making the comparison is the sampling error from the MCMC draws from the posterior distribution, and that we can ignore the warnings about the k parameter, as that pertains to a future sample from a population. Please let me know if you disagree with any of this!

I have been playing around with calculating the sample error of the raw importance ratios and using a standard Taylor series approximation I can get a clean expression for the variance of log(p_i|y_{-i}). The problem is that I need to get the variance of the sum of those log(…)s. As the underlying variable is the sample, we should expect those log(…) to be correlated. It is easy to get an upper bound by assuming perfect correlation. But right now it looks like one would have to resort to bootstrapping on the sample to get the sample error for the elpd statistic. If there is a closed form approximation for this, could you please point me to it.
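To make the two steps in the paragraph above concrete, here is a small sketch of the first-order Taylor (delta method) approximation Var(log X) ≈ Var(X) / E[X]^2, and of the standard deviation of a sum under two reference assumptions: independence and perfect positive correlation. The function names and numbers are hypothetical, and this is not a closed-form answer for the correlated case, only the two easy reference points.

```python
import math

def delta_method_var_log(mean_x, var_x):
    """First-order Taylor approximation: Var(log X) ~= Var(X) / E[X]^2."""
    return var_x / mean_x ** 2

def sd_of_sum_reference_points(sds):
    """SD of a sum of terms with the given SDs under two assumptions.

    Returns (sd if terms are independent, sd if perfectly correlated).
    The perfectly correlated value is an upper bound on the SD of the sum.
    """
    independent = math.sqrt(sum(s * s for s in sds))
    perfectly_correlated = sum(sds)
    return independent, perfectly_correlated

# Hypothetical per-observation standard deviations of the log terms.
sds = [0.3, 0.4, 1.2]
sd_indep, sd_upper = sd_of_sum_reference_points(sds)
var_log = delta_method_var_log(mean_x=2.0, var_x=0.04)
print(sd_indep, sd_upper, var_log)
```

The gap between the two values gives a rough sense of how much the unknown correlation could matter before resorting to a bootstrap.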

MCMC error is the smallest. If the k values are high, the importance sampling error can be considerable. If you use PSIS-LOO in the loo package for any purpose, you shouldn’t ignore the k diagnostic.
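A rough illustration of why the importance sampling error can dominate: for observation i, the raw leave-one-out importance ratio for posterior draw s is proportional to 1 / p(y_i | theta_s), so a single draw with a very small likelihood can carry almost all of the weight. The sketch below uses plain self-normalized importance sampling and the usual effective sample size 1 / sum(w^2). It is not Pareto smoothing and it does not compute the k diagnostic; the function names and log-likelihood values are made up.

```python
import math

def raw_loo_weights(log_lik_draws):
    """Self-normalized raw LOO weights for one observation.

    log_lik_draws: log p(y_i | theta_s) for each posterior draw s.
    The raw ratio is proportional to 1 / p(y_i | theta_s).
    """
    log_r = [-ll for ll in log_lik_draws]
    m = max(log_r)  # subtract the max for numerical stability
    unnorm = [math.exp(v - m) for v in log_r]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def effective_sample_size(weights):
    """ESS of normalized weights: 1 / sum(w^2)."""
    return 1.0 / sum(w * w for w in weights)

# Hypothetical log-likelihoods over four posterior draws.
well_behaved = [-1.0, -1.1, -0.9, -1.05]
one_outlier = [-1.0, -1.1, -0.9, -9.0]

ess_good = effective_sample_size(raw_loo_weights(well_behaved))
ess_bad = effective_sample_size(raw_loo_weights(one_outlier))
print(ess_good, ess_bad)
```

In the second case the estimate effectively rests on one draw, no matter how many MCMC draws were taken, which is the situation a high k is meant to flag.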

Your x is fixed, and you model p(y|x), but since you don’t trust your model you are avoiding the use of an explicit model for the future data, and you can’t avoid the fact that in cross-validation the uncertainty of y and x cannot be separated. For time series you could also consider, e.g., one-step-ahead predictions.
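As a sketch of what one-step-ahead evaluation could look like for time-ordered data: fit on the first t observations, predict observation t, then extend the training window by one and repeat. The helper `one_step_ahead_splits` and the parameter `min_train` are hypothetical names for illustration; the model fitting itself is left out.

```python
def one_step_ahead_splits(n_obs, min_train):
    """Yield (train_indices, test_index) pairs with an expanding window.

    Each split trains on observations 0..t-1 and tests on observation t,
    so no future data is used to predict the past.
    """
    for t in range(min_train, n_obs):
        yield list(range(t)), t

# Six time-ordered observations, at least three used for the first fit.
splits = list(one_step_ahead_splits(n_obs=6, min_train=3))
for train_idx, test_idx in splits:
    print(train_idx, "->", test_idx)
```

Summing the log predictive densities of the held-out points over these splits gives a predictive score that respects the time ordering, unlike leave-one-out.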

I don’t recommend looking at the variance of the raw ratios, as it can be infinite. It’s better to work on the smoothed ratios, as shown in [1507.02646] Pareto Smoothed Importance Sampling.

How to compute this so that both MCMC and IS error are taken into account is shown in [1507.02646] Pareto Smoothed Importance Sampling, and loo 2.0 will compute these, too.

A strong statement – thank you.