In this case, the posterior is normal.
The closer the posterior is to a multivariate normal with zero covariance (i.e., a diagonal covariance matrix), the better the ADVI mean-field approximation fits reality. Larger N also tends to work better in practice, in our experience, presumably because posteriors become more nearly normal as the data size grows.
What we don’t know is where the boundary is between “normal enough” and “not normal enough”.
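To make the mean-field limitation concrete, here's a minimal NumPy sketch (not Stan's ADVI itself, just the known closed form for a Gaussian target, with a made-up correlation): when the true posterior is a correlated Gaussian, the diagonal-covariance approximation minimizing KL(q || p) matches the mean but sets each marginal precision to the corresponding diagonal entry of the target's precision matrix, which shrinks the variances.

```python
import numpy as np

# Hypothetical "true posterior": a bivariate normal with correlation rho.
rho = 0.9
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])

# For a Gaussian target, the KL(q || p)-optimal mean-field (diagonal)
# normal matches the mean and sets each marginal precision to the
# corresponding diagonal entry of the target's precision matrix.
Lambda = np.linalg.inv(Sigma)     # precision matrix of the target
mf_var = 1.0 / np.diag(Lambda)    # mean-field marginal variances

print("true marginal variances:", np.diag(Sigma))  # [1. 1.]
print("mean-field variances:   ", mf_var)          # ~[0.19 0.19], badly shrunk
```

So even in the best case where the posterior really is normal, mean-field ADVI understates uncertainty in exactly the directions where parameters are correlated.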
Regarding the natural barrier issue: why can’t we use the Laplace approximation instead?
It gives you the MAP estimate, so performance-wise on the training data it’s most likely to be great.
I haven’t seen any criticism of the Laplace approximation. Do you know of any?
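For what it's worth, here's a minimal SciPy sketch of the Laplace approximation itself, on a made-up one-parameter Poisson model with a normal prior on the log rate (purely illustrative, not a model from this thread): optimize to find the mode, then use the curvature of the negative log posterior there as the approximate precision.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up data; one-parameter Poisson model with a standard normal
# prior on theta = log(rate). Posterior is unimodal but not normal.
y = np.array([3, 5, 4, 6, 2])

def neg_log_post(theta):
    log_rate = theta[0]
    log_lik = np.sum(y * log_rate - np.exp(log_rate))  # Poisson, up to a constant
    log_prior = -0.5 * log_rate**2                     # normal(0, 1) prior
    return -(log_lik + log_prior)

# Step 1: the MAP estimate is the optimum of the log posterior.
fit = minimize(neg_log_post, x0=np.array([0.0]))
map_est = fit.x

# Step 2: the Laplace approximation is a normal centered at the mode,
# with variance = inverse curvature (Hessian) of -log posterior there.
eps = 1e-4
hess = (neg_log_post(map_est + eps) - 2.0 * neg_log_post(map_est)
        + neg_log_post(map_est - eps)) / eps**2
print("MAP:", map_est[0], "  Laplace sd:", np.sqrt(1.0 / hess))
```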
Not sure what a natural barrier issue is. Dan Simpson’s behind a bunch of the INLA models, so I doubt you have to convince him of the utility of Laplace approximations.
A lot of the models we care about don’t have MAP estimates because the log posterior is unbounded; in hierarchical models, for example, the density grows without bound as the hierarchical scale shrinks to zero. But we can use variational techniques because the posterior means exist. In some cases, we can also tweak priors to make MAP better behaved if we’re willing to change the model.
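Here's a tiny sketch of that failure mode, with made-up data and a hierarchical normal model: pin the group effects at zero and shrink the group scale, and the log posterior grows without bound, so there is no mode to find.

```python
import numpy as np

# Made-up data; hierarchical normal model:
#   theta_j ~ normal(0, tau),   y_j ~ normal(theta_j, 1)
y = np.array([1.0, -0.5, 0.3, 0.8])
J = len(y)

def log_post(theta, tau):
    log_lik = -0.5 * np.sum((y - theta) ** 2)                    # up to a constant
    log_group = -J * np.log(tau) - 0.5 * np.sum(theta**2) / tau**2
    return log_lik + log_group  # flat hyperprior on tau, for simplicity

# Pin the group effects at zero and shrink tau: the -J*log(tau) term
# sends the log posterior to +infinity, so there is no mode (no MAP).
for tau in [1.0, 0.1, 0.01, 0.001]:
    print(f"tau={tau}: log posterior = {log_post(np.zeros(J), tau):.2f}")
```

This is also what the prior tweak fixes: a prior on tau whose density vanishes fast enough at zero restores a finite mode, at the cost of changing the model. Meanwhile, the posterior mean can exist just fine under a proper prior, which is why variational and sampling approaches still work here.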