Reconciling fully Bayesian uncertainty quantification with LOO cross-validation

Suppose I’ve measured a set of N students, each described by a vector of student-property covariates X_i and a performance score Y_i. I fit the linear model y ~ X \beta with full Bayes to extract a posterior distribution over the coefficient vector \beta.

Now, for new students drawn from the same population, I want to quantify uncertainty in my predictions of their performance. I compute \hat{y}_j = X_{N+1} \beta_j for draws \beta_j from the posterior over \beta, then draw \tilde{y}_j \sim D(\hat{y}_j) from some residual distribution centered on \hat{y}_j, which together produce a posterior predictive distribution for the new performance score.
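For concreteness, here is a small numpy sketch of that procedure. The posterior draws below are simulated toy numbers standing in for draws from an actual fit, and I'm assuming normal residuals for D:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws: beta_draws has shape (n_draws, p) and
# sigma_draws has shape (n_draws,). In practice these would come from
# the fitted model, not from this toy generator.
n_draws, p = 4000, 3
beta_draws = rng.normal(loc=[1.0, -0.5, 2.0], scale=0.1, size=(n_draws, p))
sigma_draws = np.abs(rng.normal(1.0, 0.05, size=n_draws))

x_new = np.array([1.0, 0.3, -1.2])  # covariates X_{N+1} for a new student

# One predictive draw per posterior draw: y_tilde_j ~ Normal(x_new' beta_j, sigma_j)
mu_draws = beta_draws @ x_new
y_tilde = rng.normal(mu_draws, sigma_draws)  # posterior predictive sample

# Central 95% posterior predictive interval
lo, hi = np.quantile(y_tilde, [0.025, 0.975])
```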

I also run LOO cross-validation on this model, and I find a major drop in the ELPD summed over the held-out points compared to when I fit and evaluate against the entire dataset.
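To pin down what I'm comparing: the in-sample quantity is the log pointwise predictive density (lppd), the log of the posterior-averaged likelihood at each observed point. A numpy-only sketch with a toy pointwise log-likelihood matrix (all numbers simulated):

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_obs = 2000, 50

# Hypothetical pointwise log-likelihood log_lik[j, i] = log p(y_i | beta_j, sigma_j),
# evaluated at posterior draws. Here filled with toy Normal(0, 1.2) log-densities.
sigma = 1.2
z = rng.normal(size=(n_draws, n_obs))
log_lik = -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * (z / sigma) ** 2

# In-sample ELPD estimate (lppd): for each point, the log of the average
# density over posterior draws (log-sum-exp for numerical stability).
m = log_lik.max(axis=0)
lppd = float(np.sum(m + np.log(np.mean(np.exp(log_lik - m), axis=0))))

# LOO-ELPD would instead evaluate each point i under a posterior fit WITHOUT
# point i (or a PSIS-LOO approximation of that); the drop I describe is
# lppd minus that LOO-ELPD.
```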

My question is: does that drop in the cross-validated loss function tell me something about my posterior predictive distributions? That I shouldn’t trust them, or that they’re too narrow and new data will likely fall outside of them? Or is cross-validation purely a model-selection technique in predictive settings, with no role to play in inferential uncertainty quantification?

My intuition is that when the likelihood reflects the true data-generating process, my posterior predictive distributions are tautologically correct up to my choice of prior; when the true data-generating process is unknown, cross-validation is telling me my model will likely fail for new data. But I can’t formalize that.

In addition, in the case of homoskedastic normal residuals, which produces the better credible interval: an interval centered at E[\beta] X_{N+1} with width set by the cross-validated MSE, or the central interval of the posterior predictive density?
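To make the comparison concrete, here is a toy numpy sketch of the two intervals, with simulated data and flat-prior OLS standing in for the Bayesian fit (so E[\beta] is the OLS estimate, and the classical predictive interval approximates the posterior predictive one). It uses the exact closed-form LOO residuals for linear regression, e_i / (1 - h_ii):

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 200, 3

# Simulated students: design matrix X and scores y with unit-variance noise
X = np.column_stack([np.ones(N), rng.normal(size=(N, p - 1))])
beta_true = np.array([1.0, -0.5, 2.0])
y = X @ beta_true + rng.normal(scale=1.0, size=N)

# OLS fit (stand-in for the posterior mean E[beta] under a flat prior)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# Exact LOO residuals for OLS via the hat matrix: e_i / (1 - h_ii)
H = X @ np.linalg.solve(X.T @ X, X.T)
loo_resid = resid / (1 - np.diag(H))
cv_mse = float(np.mean(loo_resid**2))

x_new = np.array([1.0, 0.3, -1.2])
center = float(x_new @ beta_hat)

# Interval 1: center +/- 1.96 * sqrt(cross-validated MSE)
cv_int = (center - 1.96 * np.sqrt(cv_mse), center + 1.96 * np.sqrt(cv_mse))

# Interval 2: classical predictive interval, which the flat-prior posterior
# predictive approaches for large N; note the extra x' (X'X)^{-1} x term
# for parameter uncertainty.
s2 = float(np.sum(resid**2) / (N - p))
pred_var = s2 * (1 + x_new @ np.linalg.solve(X.T @ X, x_new))
pp_int = (center - 1.96 * np.sqrt(pred_var), center + 1.96 * np.sqrt(pred_var))
```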


Sorry for not getting to you earlier. I am not an expert on cross-validation, but since nobody else answered, I will give it a try.

I think your intuition is largely correct, but I also can’t formalize it very precisely. What I would say is that cross-validation lets you estimate the error on unobserved data. So your model might not “fail”; it may just have a larger predictive error than the in-sample predictive error, i.e. it may “degrade” rather than “fail completely” (this obviously depends on the actual numerical values).

Did you check out the Cross Validation FAQ by Aki? It might have some additional answers.

I think the two intervals represent two different prediction tasks. In one, you assume your data are representative and future data will be the “same” (in some sense) as the data you’ve observed. When using the cross-validated MSE, I think you relax that assumption a bit, but you only get an approximate answer. I’m not sure how to make this more rigorous.

Best of luck with your model!
