Variational Bayes versus MAP for prediction

Not that I know of, but Andrew’s recruiting willing participants to try to evaluate just this question. We’ll have max marginal likelihood plus (importance adjusted?) Laplace approximations as one contender.

The main problem we’ve had with ADVI is convergence, or just getting the wrong answer. That’s wrong not in the sense that the algorithm’s buggy, but in that the ADVI mean isn’t very close to the actual posterior mean, with the gap measured in units of true posterior standard deviations. Andrew et al. are finding that it helps enormously to have all the parameters on the unit scale. They’re also finding that even when the estimates of the hierarchical parameters are wrong, the posterior predictive distribution can still be quite reasonable.
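To make "close as measured in posterior standard deviations" concrete, here is a minimal sketch of that check. It assumes you have reference draws you trust (say, from a long MCMC run) and an approximate mean from ADVI; the function name and the numbers are made up for illustration and aren't from any library.

```python
import statistics

def standardized_error(approx_mean, reference_draws):
    """Gap between an approximate posterior mean and a reference
    posterior mean, in units of the reference posterior sd.

    reference_draws: draws treated as the "true" posterior for one
    scalar parameter (an assumption; e.g. long-run MCMC output).
    """
    ref_mean = statistics.mean(reference_draws)
    ref_sd = statistics.stdev(reference_draws)  # sample sd of the draws
    return (approx_mean - ref_mean) / ref_sd

# Made-up numbers purely for illustration.
draws = [1.0, 2.0, 3.0, 4.0, 5.0]
z = standardized_error(4.0, draws)
print(round(z, 3))
```

A value near zero means the approximate mean sits well inside the posterior; values around 1 or more in absolute value mean it is off by a full posterior standard deviation or worse.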

The other issue is uncertainty quantification. With MLE/MML and Laplace, you just use the inverse of the negative Hessian of the log density at the mode as the estimated posterior covariance. In mean-field ADVI, the posterior covariance is assumed to be diagonal; we’ve had a hard time estimating the dense (full-rank) form.
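Here is a rough sketch of the Laplace covariance on a toy problem, not how Stan does it internally. It assumes a two-parameter log density that is exactly Gaussian with a known mode at the origin, uses a finite-difference Hessian, and inverts the 2x2 matrix by hand. All names are hypothetical. The last lines show what a diagonal approximation loses; the 1/precision variances are the known mean-field optimum for a Gaussian target under KL(q || p), not a claim about ADVI's behavior in general.

```python
def log_density(theta):
    """Toy unnormalized log posterior: zero-mean Gaussian with
    precision matrix [[2.0, 0.6], [0.6, 1.0]] (an assumption)."""
    x, y = theta
    return -0.5 * (2.0 * x * x + 2.0 * 0.6 * x * y + 1.0 * y * y)

def hessian(f, theta, h=1e-4):
    """Central finite-difference Hessian of f at theta."""
    n = len(theta)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                t = list(theta)
                t[i] += di
                t[j] += dj
                return f(t)
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)
    return H

def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

mode = [0.0, 0.0]  # known for this toy density; normally found by optimization
H = hessian(log_density, mode)
neg_H = [[-v for v in row] for row in H]

# Laplace approximation: covariance = inverse of the negative Hessian at the mode.
laplace_cov = inverse_2x2(neg_H)

# A diagonal (mean-field style) approximation carries no off-diagonal terms.
# For a Gaussian target, the KL(q || p) optimum has variances 1 / precision_ii,
# which are smaller than the true marginal variances whenever there is correlation.
mean_field_var = [1.0 / neg_H[0][0], 1.0 / neg_H[1][1]]

print(laplace_cov)
print(mean_field_var)
```

In this toy case the Laplace covariance recovers the correlation between the two parameters, while the diagonal version drops it and reports narrower marginals.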