That's what we've always meant by "variational Bayes" (as opposed to other variational methods, which use the KL divergence differently).

I wasn't even thinking about multimodality. Once that enters the picture, we can no longer even perform reasonable approximate Bayesian inference. The VI solutions are terrible, concentrating around a single mode precisely because of the orientation of the KL divergence used. EP, another variational method, would go the other way—there's a great picture of this in Bishop's book for a correlated bivariate normal, but the idea's the same. Basically, even approximate Bayes is intractable with combinatorial multimodality. Whatever people do in practice (marginalizing continuous parameters for collapsed Gibbs, marginalizing discrete parameters for HMC, variational inference, etc.) may be useful for data exploration or feature extraction using LDA, or it might be useful for prediction in some settings, but it's not going to be anything like the answer you'd get from full Bayesian inference.
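To make the mode-seeking vs. mass-covering point concrete, here's a toy numerical sketch (my own illustration, not from Bishop): fit a single Gaussian to a two-mode mixture by grid search, minimizing each orientation of the KL divergence. The reverse direction, KL(q || p), is the one used in variational Bayes; the forward direction, KL(p || q), is the one EP works with.

```python
import numpy as np

# Toy illustration: fit one Gaussian q to a bimodal target p by
# minimizing each direction of the KL divergence over a grid.
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Target: equal mixture of N(-3, 1) and N(3, 1)
p = 0.5 * normal_pdf(x, -3, 1) + 0.5 * normal_pdf(x, 3, 1)

def kl(a, b):
    """KL(a || b) by quadrature, skipping points where a has underflowed."""
    mask = a > 1e-12
    return np.sum(a[mask] * np.log(a[mask] / b[mask])) * dx

best_rev, best_fwd = None, None
for mu in np.linspace(-5, 5, 101):
    for sigma in np.linspace(0.5, 5, 46):
        q = normal_pdf(x, mu, sigma)
        rev = kl(q, p)  # KL(q || p): the variational-Bayes direction
        fwd = kl(p, q)  # KL(p || q): the direction EP works with
        if best_rev is None or rev < best_rev[0]:
            best_rev = (rev, mu, sigma)
        if best_fwd is None or fwd < best_fwd[0]:
            best_fwd = (fwd, mu, sigma)

print("reverse KL (VB):", best_rev[1:])      # locks onto one mode (mu near +/-3)
print("forward KL (EP-like):", best_fwd[1:])  # spreads over both (mu near 0, big sigma)
```

The reverse-KL fit sits on one mode and ignores the other; the forward-KL fit is the moment-matching answer, a wide Gaussian straddling both modes. Neither looks anything like the true posterior, which is the point.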

I was thinking more about unimodal, but asymmetric posteriors. Take, say, a beta(10, 15): there the VI solution's going to be closer to the posterior mean than to the mode. There's another good diagram in Bishop's book for this.
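Here's the same kind of toy sketch for that case (again my own illustration): grid-minimize KL(q || p) for a Gaussian q against a beta(10, 15) density, then compare the fitted location with the exact mean (10/25 = 0.4) and the exact mode (9/23 ≈ 0.391, where a Laplace approximation would sit).

```python
import numpy as np

# Toy illustration: reverse-KL Gaussian approximation to beta(10, 15),
# found by brute-force grid search rather than a VI optimizer.
x = np.linspace(1e-4, 1 - 1e-4, 2000)
dx = x[1] - x[0]

# Unnormalized beta(10, 15) log density, normalized numerically on the grid.
log_p = 9 * np.log(x) + 14 * np.log(1 - x)
p = np.exp(log_p - log_p.max())
p /= p.sum() * dx

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def reverse_kl(q):
    """KL(q || p) by quadrature, skipping points where q has underflowed."""
    mask = q > 1e-12
    return np.sum(q[mask] * np.log(q[mask] / p[mask])) * dx

best = None
for mu in np.linspace(0.30, 0.50, 101):
    for sigma in np.linspace(0.05, 0.15, 51):
        score = reverse_kl(normal_pdf(x, mu, sigma))
        if best is None or score < best[0]:
            best = (score, mu, sigma)

mu_fit, sigma_fit = best[1], best[2]
post_mean = 10 / 25  # exact beta(10, 15) mean = 0.4
post_mode = 9 / 23   # exact mode ~ 0.391
print("VI-style fit:", mu_fit, sigma_fit, "mean:", post_mean, "mode:", post_mode)
```

Because beta(10, 15) is only mildly skewed, the mean and mode are close; the fitted location lands near the mean, in line with Bishop's diagram contrasting the variational fit with the mode-centered Laplace approximation.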