For Probabilistic Prediction, Full Bayes is Better than Point Estimators

I just wrote a case study that isn’t particularly Stan-related, but uses Stan.

  • Bob Carpenter. 2019. DRAFT: For Probabilistic Prediction, Full Bayes is Better than Point Estimators.
  • bayes-versus.pdf (364.3 KB)
  • Source code [GitHub]

Comments most welcome (especially if you know how to fix knitr’s table rendering in pdf).

Here’s the abstract:

A probabilistic prediction takes the form of a distribution over possible outcomes. With proper scoring rules such as log loss or squared error, it is possible to evaluate such a probabilistic prediction against a true outcome. This short note provides a simulation-based evaluation of full Bayesian inference, where we average over our estimation uncertainty, and two forms of point estimation, one that uses the posterior mode (maximum a posteriori) and one that uses the posterior mean (as is typical with variational inference). The example we consider is a simple Bayesian logistic regression with potentially correlated predictors and weakly informative priors. To make a long story short, full Bayes has lower expected log loss and squared error than either of the point estimators.
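
To make this concrete, here is a minimal R sketch (not the case-study code; the "posterior draws" below are simulated purely for illustration) of the difference between the full-Bayes prediction, which averages the event probability over posterior draws, and the plug-in prediction from the posterior mean:

```r
# Minimal sketch: full-Bayes predictive probability vs. posterior-mean plug-in.
# The posterior draws here are simulated stand-ins, not draws from a real fit.
set.seed(1234)
S <- 4000                                  # number of posterior draws
beta_draws <- cbind(rnorm(S, 0.5, 0.4),    # S x 2 matrix of coefficient draws
                    rnorm(S, -1.0, 0.6))   # (intercept, slope)
x_new <- c(1, 0.3)                         # new input: intercept plus one predictor

# Full Bayes: average the event probability over the posterior draws.
p_bayes <- mean(plogis(beta_draws %*% x_new))

# Point estimate: plug the posterior mean into the inverse logit.
p_point <- plogis(sum(colMeans(beta_draws) * x_new))

# Because the inverse logit is nonlinear, these generally differ (Jensen's
# inequality); the case study evaluates which fares better under log loss
# and squared error.
c(full_bayes = p_bayes, plug_in = p_point)
```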

There’s also a bit on evaluating proper scoring rules.
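
In case it helps to see them written out, here's a small R sketch of the two per-outcome scoring rules (the function names are mine, not from the case study); lower is better for both:

```r
# Per-outcome versions of the two proper scoring rules, for a binary outcome
# y in {0, 1} and a predicted event probability p.
log_loss <- function(y, p) -(y * log(p) + (1 - y) * log(1 - p))
sq_error <- function(y, p) (y - p)^2       # quadratic (Brier) score

log_loss(1, 0.8)   # 0.223: outcome 1 predicted with probability 0.8
sq_error(1, 0.8)   # 0.04
```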

I should’ve done this ages ago. I’ve done things like this in my repeated binary trial case study, but that was in the context of binomials and it was buried among a lot of other stuff. I committed the pdf and html to the repo, so if you want the html, it’s there.


This is great. I think you should also show that we can estimate the ELPD of held-out data just fine using loo when full Bayes is used, so there is an additional gain to be had from conditioning on all the available data. @avehtari has also added some loo functionality for VB to the upcoming RStan release.
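
For anyone who wants to try that, here's a rough sketch of what the loo check could look like, assuming an S x N log-likelihood matrix computed from posterior draws (independent simulated draws stand in for a real Stan fit here):

```r
# Sketch of estimating ELPD with PSIS-LOO from an S x N log-likelihood matrix
# whose entry [s, n] is log p(y[n] | theta[s]); simulated draws stand in for
# a real posterior here.
library(loo)

set.seed(1234)
N <- 200
S <- 4000
x <- rnorm(N)
y <- rbinom(N, 1, plogis(0.5 - x))          # simulated binary outcomes
beta_draws <- cbind(rnorm(S, 0.5, 0.1),     # stand-in "posterior" draws
                    rnorm(S, -1.0, 0.1))
eta <- beta_draws %*% rbind(1, x)           # S x N matrix of linear predictors
log_lik <- matrix(dbinom(rep(y, each = S), size = 1,
                         prob = plogis(as.vector(eta)), log = TRUE),
                  nrow = S, ncol = N)

# Relative effective sample sizes (the draws here are independent, so ~1).
r_eff <- relative_eff(exp(log_lik), chain_id = rep(1:4, each = S / 4))
loo(log_lik, r_eff = r_eff)                 # PSIS-LOO estimate of ELPD
```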


Thanks for this new case study. These case studies are extremely useful!

This reminds me of a paper by Marc Lavielle and Benjamin Ribba (https://rd.springer.com/article/10.1007%2Fs11095-016-2020-3, or see https://hal.archives-ouvertes.fr/hal-01365532/document).

In a non-Bayesian setting, they show that rather than maximizing each individual’s conditional distribution of the model parameters, it is preferable to sample from it randomly, which yields values better spread out over the marginal distribution of the individual parameters.


How do you sample over parameters in a non-Bayesian setting? (I think they use empirical Bayes, which means point estimating some parameters and then sampling over others.)

We don’t want marginal parameter distributions; we want joint ones for full Bayesian inference.

I was also focusing on probability estimation, not on Type I error rates.