I’m agreeing with you here :-)
What drove me crazy in ML was not only the fact that they were convolved, but that a single operating point (say, 17 iterations when 1–100 were tried) was cherry-picked (under cross-validation) and then used as the bold-face my-system-wins point in NIPS-like papers.
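Here’s a quick simulation of why that cherry-picking is a problem (my own toy sketch, not from any particular paper): if 100 operating points all have the same true accuracy and you report the one with the best cross-validation score, the reported number is optimistically biased relative to what that same winner does on fresh data.

```python
# Toy simulation of cherry-picking an operating point under cross-validation.
# Assumes all 100 operating points have identical true accuracy; only the
# cross-validation estimates differ by noise.
import numpy as np

rng = np.random.default_rng(0)
n_points = 100        # e.g., iteration counts 1..100 that were tried
true_accuracy = 0.80  # every operating point is really this good
cv_noise_sd = 0.02    # noise in each cross-validation estimate

selected, fresh = [], []
for _ in range(1000):
    cv_scores = true_accuracy + rng.normal(0.0, cv_noise_sd, size=n_points)
    best = np.argmax(cv_scores)        # cherry-pick the winner
    selected.append(cv_scores[best])   # the bold-face number that gets reported
    fresh.append(true_accuracy + rng.normal(0.0, cv_noise_sd))  # same winner, fresh data

print("reported (selected) score:", round(np.mean(selected), 3))  # ~0.85, inflated
print("fresh-data score:         ", round(np.mean(fresh), 3))     # ~0.80, honest
```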
You wouldn’t. There’s no way to test the amount of uncertainty in a posterior for a real problem—it’s all “subjective” in the “subjective Bayes” sense (and by that, I mean it’s about our knowledge at any given time, not that it’s somehow touchy-feely belief stuff). You can test on held-out data, and that’s what I’d recommend. It does seem to be the gold standard in all of this system evaluation stuff. It’s just that I’d want to measure calibration of uncertainty, not the point estimates.
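By calibration of uncertainty I mean something you can actually compute from held-out data. A minimal sketch (all the data here are simulated just to show the mechanics): take posterior predictive draws for the held-out items and check whether nominal central intervals cover the observed outcomes at the advertised rate.

```python
# Sketch of checking interval calibration on held-out data: do nominal
# central posterior-predictive intervals cover held-out outcomes at the
# advertised rate?
import numpy as np

def empirical_coverage(pred_draws, y_heldout, prob):
    """pred_draws: (n_draws, n_items) posterior predictive samples;
    y_heldout: (n_items,) held-out observations; prob: nominal coverage."""
    lo = np.quantile(pred_draws, (1 - prob) / 2, axis=0)
    hi = np.quantile(pred_draws, 1 - (1 - prob) / 2, axis=0)
    return np.mean((y_heldout >= lo) & (y_heldout <= hi))

rng = np.random.default_rng(1)
y_heldout = rng.normal(0.0, 1.0, size=500)            # pretend held-out data
pred_draws = rng.normal(0.0, 1.0, size=(4000, 500))   # pretend predictive draws
for p in (0.5, 0.8, 0.9):
    print(f"{p:.0%} interval coverage:",
          round(empirical_coverage(pred_draws, y_heldout, p), 3))
```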
I’m just saying it could happen that inferences with an MLE would be fine, but inferences with fat tails in Bayes would blow up. I don’t have an example in mind and it’s not something that keeps me up at night because the models I work with tend to be much better behaved than this.
I think what would be really helpful would be examples where Bayes gives clearly better inferences than taking point estimates. One example is in small-count binomial models—if you take point estimates, they’re going to underestimate uncertainty, which can be seen pretty clearly with real-data calibrations (I go over an example in my repeated binary trials case study). Are there other clear examples like this we should be pointing people to?
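Here’s roughly the shape of that binomial point as a simulation (my own toy version, not the case study itself): with only five trials per unit, intervals built around the point estimate badly undercover, while the full posterior interval under a uniform prior stays near nominal.

```python
# Toy version of the small-count binomial example: Wald intervals around the
# MLE undercover with five trials, while central posterior intervals under a
# uniform Beta(1, 1) prior stay near nominal.  All numbers are made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_trials, theta, nominal = 5, 0.2, 0.90
z = stats.norm.ppf(1 - (1 - nominal) / 2)

wald_cover = bayes_cover = 0
n_sims = 20000
for _ in range(n_sims):
    y = rng.binomial(n_trials, theta)
    # Wald interval around the point estimate y / n
    p_hat = y / n_trials
    se = np.sqrt(p_hat * (1 - p_hat) / n_trials)
    wald_cover += (p_hat - z * se) <= theta <= (p_hat + z * se)
    # central 90% posterior interval, Beta(1 + y, 1 + n - y)
    lo, hi = stats.beta.ppf([0.05, 0.95], 1 + y, 1 + n_trials - y)
    bayes_cover += lo <= theta <= hi

print("Wald coverage: ", wald_cover / n_sims)   # well below 0.90
print("Bayes coverage:", bayes_cover / n_sims)  # close to 0.90
```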
The problem we’ll have is that we have to take on the ML people on their own turf—we need to make better point predictions. If we don’t do that, they won’t care.