That’s what we’ve always meant by “variational Bayes” (as opposed to other variational methods, which use KL differently).
I wasn’t even thinking about multimodality. Once that enters the picture, we can no longer perform even reasonable approximate Bayesian inference. The VI solutions are terrible, concentrating around a single mode precisely because of the direction of the KL divergence being minimized. EP, another variational method, would go the other way; there’s a great picture of this in Bishop’s book for a correlated bivariate normal, but the idea’s the same. Basically, even approximate Bayes is intractable under combinatorial multimodality. Whatever people do in practice (marginalizing continuous parameters for discrete Gibbs, marginalizing discrete parameters for HMC, variational inference, etc.) may be useful for data exploration or feature extraction with LDA, or for prediction in some settings, but it’s not going to be anything like the answer you’d get from full Bayesian inference.
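Here’s a quick numerical sketch of the two KL directions (my own toy example, not the one from Bishop): fit a single Gaussian to an equal mixture of N(-3, 1) and N(3, 1), once minimizing KL(q‖p) as VI does, and once minimizing KL(p‖q) in the EP-style direction, with both integrals done by simple quadrature.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

# Bimodal target: an equal mixture of N(-3, 1) and N(3, 1).
def log_p(x):
    return (logsumexp([-0.5 * (x + 3) ** 2, -0.5 * (x - 3) ** 2], axis=0)
            + np.log(0.5) - 0.5 * np.log(2 * np.pi))

# Gaussian approximating family, parameterized by (mu, log_sigma).
def log_q(x, mu, log_sigma):
    s = np.exp(log_sigma)
    return -0.5 * ((x - mu) / s) ** 2 - log_sigma - 0.5 * np.log(2 * np.pi)

# Quadrature grid for the KL integrals.
xs = np.linspace(-15.0, 15.0, 4001)
dx = xs[1] - xs[0]

def reverse_kl(params):  # KL(q || p): the direction VI minimizes
    mu, log_sigma = params
    q = np.exp(log_q(xs, mu, log_sigma))
    return np.sum(q * (log_q(xs, mu, log_sigma) - log_p(xs))) * dx

def forward_kl(params):  # KL(p || q): the EP-style direction
    mu, log_sigma = params
    p = np.exp(log_p(xs))
    return np.sum(p * (log_p(xs) - log_q(xs, mu, log_sigma))) * dx

vi = minimize(reverse_kl, x0=[2.0, 0.0], method="Nelder-Mead").x
ep = minimize(forward_kl, x0=[2.0, 0.0], method="Nelder-Mead").x

print(f"reverse KL (VI): mu = {vi[0]:.2f}, sigma = {np.exp(vi[1]):.2f}")
print(f"forward KL (EP): mu = {ep[0]:.2f}, sigma = {np.exp(ep[1]):.2f}")
```

The reverse-KL fit locks onto whichever mode it starts near (here mu near 3 with sigma near 1, ignoring the other half of the posterior mass entirely), while the forward-KL fit moment-matches the whole mixture (mu near 0, sigma near sqrt(10)), straddling both modes. That’s exactly the pair of pictures in Bishop.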
I was thinking more about unimodal, but asymmetric posteriors. Like, say, a beta(10, 15): there the VI solution’s going to be closer to the posterior mean than to the mode. There’s another good diagram of this in Bishop’s book.
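To make the beta(10, 15) case concrete, here’s a rough check under one common framing (my assumption, not necessarily what was meant above): an ADVI-style Gaussian approximation fit on the logit scale, with the Jacobian folded into the target. For beta(10, 15) the mean is 10/25 = 0.400 and the mode is 9/23 ≈ 0.391, and the fitted Gaussian’s center back-transforms to roughly 0.4.

```python
import numpy as np
from scipy.optimize import minimize

# Unnormalized log density of beta(10, 15) after a logit change of
# variables (the unconstrained scale ADVI-style VI works on). With the
# Jacobian theta*(1 - theta) folded in:
#   log p(z) = 10*log(theta) + 15*log(1 - theta) + const,  theta = sigmoid(z)
def log_p(z):
    return -10 * np.log1p(np.exp(-z)) - 15 * np.log1p(np.exp(z))

def log_q(z, mu, log_sigma):
    s = np.exp(log_sigma)
    return -0.5 * ((z - mu) / s) ** 2 - log_sigma - 0.5 * np.log(2 * np.pi)

zs = np.linspace(-6.0, 6.0, 4001)
dz = zs[1] - zs[0]

def reverse_kl(params):
    # p is unnormalized; that only shifts KL(q || p) by a constant,
    # so the minimizer is unchanged.
    mu, log_sigma = params
    q = np.exp(log_q(zs, mu, log_sigma))
    return np.sum(q * (log_q(zs, mu, log_sigma) - log_p(zs))) * dz

mu, log_sigma = minimize(reverse_kl, x0=[0.0, 0.0], method="Nelder-Mead").x
theta_vi = 1.0 / (1.0 + np.exp(-mu))  # back-transform the Gaussian center

print(f"VI center (back-transformed): {theta_vi:.3f}")
print(f"beta(10,15) mean: {10 / 25:.3f}   mode: {9 / 23:.3f}")
```

On this setup the back-transformed center lands between the constrained-scale mode (0.391) and the mean (0.400), pulled toward the mean; the constrained-scale mode 0.4 of the transformed density here happens to equal the posterior mean, which is part of why the transform helps with skewed posteriors.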