Discuss overfitting / model validation in Bayesian context


I have posted a question on stats.stackexchange.com at the URL below and would appreciate some discussion on the topic there. Thanks.


Overfitting and model validation in frequentist inference are framed in terms of the frequentist properties of given decisions (which point or interval estimator to report, etc.). Consequently these concepts don’t directly carry over to a Bayesian context, because Bayesian inference doesn’t enforce any particular decision-making process. In order to formally transfer these ideas to a Bayesian context you have to establish which decisions you want to make and how you will use the posterior to inform those decisions, and then you can start asking questions about how well you do.
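A minimal sketch of that last step, using made-up numbers: given posterior draws for a parameter, pick the decision (here, a point estimate) by minimizing posterior expected loss. The draws and candidate actions below are illustrative assumptions, not output from any real model.

```python
import random
import statistics

random.seed(0)

# Hypothetical posterior draws for a scalar parameter theta
# (a stand-in for real MCMC output from a fitted model).
posterior = [random.gauss(1.0, 0.5) for _ in range(10_000)]

def expected_loss(action, draws, loss):
    """Posterior expected loss of a decision, estimated from draws."""
    return sum(loss(action, theta) for theta in draws) / len(draws)

squared = lambda a, theta: (a - theta) ** 2

# Among these candidate point estimates, the posterior mean minimizes
# the expected squared-error loss (as it must, analytically).
candidates = [0.5, 0.9, statistics.mean(posterior), 1.5]
best = min(candidates, key=lambda a: expected_loss(a, posterior, squared))
print(best == statistics.mean(posterior))  # True
```

Swapping in absolute-error loss would instead favor the posterior median; the point is that "how well you do" is only defined once the loss is.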

For more detail on this perspective see https://arxiv.org/abs/1803.08393.


Hi - I want to write a formal proof of Section 4.2.2 of this paper. Please help me expand this idea. I like the discussion, but I want a more rigorous mathematical definition. I was considering the mathematical definition of the null hypothesis, \theta = 0. This refers to a set of model parameters, correct?

From elementary mathematical analysis, we know that if x \in R is drawn from a continuous distribution and k \in R, then prob(x = k) = 0. (I.e., R is an uncountably infinite set, so there are infinitely many points that k could equal, so it’s not at all probable that x equals any one point.)

So if I’m correct about the above point (\theta = 0, as you define it, is the set of regression coefficients/parameters such that every \theta_i = 0), then this is obviously improbable, correct? (I.e., if prob(\theta_i = 0) = 0 for any single i, then the event that all \theta_i = 0 must also have probability zero.)


I’m not sure what question you’re asking – yes, point events have zero probability under measures that are absolutely continuous with respect to the Lebesgue measure. This also means that all point events are equally improbable.

The null hypothesis is typically defined by setting some set of phenomenological parameters to zero. This may be only a subset of the parameters, leaving other nuisance parameters unconstrained. In this setting the set of parameters in the null hypothesis will have measure zero with respect to any reasonable measure on the full parameter space.
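As a quick numerical sanity check of the measure-zero point (a minimal sketch, not tied to any particular model): draws from a continuous distribution essentially never land exactly on a given point, so the empirical probability of the event x = 0 is zero.

```python
import random

random.seed(1)

# Draw from a continuous (normal) distribution and count exact hits
# on the single point 0.0; the event {x = 0} has Lebesgue measure zero,
# so the empirical frequency of exact hits is zero.
n = 1_000_000
hits = sum(random.gauss(0.0, 1.0) == 0.0 for _ in range(n))
print(hits / n)
```

The same holds for any fixed point k, and for the lower-dimensional set where a subset of parameters is pinned to zero inside a larger continuous parameter space.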

This is irrelevant in frequentist statistics (you can’t put probability measures on the parameter space) but does cause issues in Bayesian inference: a point null model will never be preferred unless you put an atom of prior mass on it, i.e., infinite prior density at that point. This is one of the reasons why Bayes factors can get messed up.
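To make that concrete, here is a minimal conjugate sketch (a single observation y ~ N(theta, 1), alternative theta ~ N(0, tau^2); the numbers are illustrative assumptions): with a purely continuous prior the point theta = 0 carries zero posterior probability, so any preference for the point null has to come from an explicit atom of prior mass placed on it.

```python
import math

def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

y = 0.2      # illustrative single observation, y ~ N(theta, 1)
tau = 1.0    # prior scale under the alternative, theta ~ N(0, tau^2)

m0 = normal_pdf(y, 0.0, 1.0)                      # marginal likelihood of H0: theta = 0
m1 = normal_pdf(y, 0.0, math.sqrt(1.0 + tau**2))  # marginal of H1: y ~ N(0, 1 + tau^2)

bf01 = m0 / m1  # Bayes factor in favor of the point null

p0 = 0.5  # atom of prior mass placed on theta = 0
post_null = p0 * m0 / (p0 * m0 + (1.0 - p0) * m1)

# With no atom (p0 = 0) the posterior probability of the point null is
# identically zero, no matter how well theta = 0 fits the data.
print(bf01, post_null)
```

Here the data sit close to zero, so the Bayes factor favors the null; but that only translates into nonzero posterior probability for H0 because p0 > 0.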


I asked Andrew Gelman about this, and his answer ended up as a post on his own blog. This was back when we were just getting started with the Cook-Gelman-Rubin diagnostics (renamed SBC, simulation-based calibration, for the arXiv paper).

There are some thoughtful comments.