Stan’s ADVI implementation culminates in drawing some number of samples from its approximation to the posterior over the latent variables.

My understanding of variational inference is that it approximates the posterior over the latent variables with a simpler distribution, e.g. a multivariate Gaussian with a diagonal covariance matrix (the mean-field approximation).

My question is: why draw samples from a simple approximation to the posterior distribution if you already know the parameters of the distribution?

My attempt at an answer: we need draws in order to understand the approximate posterior over the latent variables themselves, because the "nice Gaussian" parameters describe only the **transformed** (unconstrained) latent variables. The implied distribution over the original, constrained latent variables is generally non-Gaussian, and we only see it by pushing draws back through the inverse transform.
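To make that concrete, here is a minimal sketch of the idea for a single positive-constrained latent variable. The values `mu` and `sigma` are hypothetical mean-field parameters, not output from any real Stan fit; ADVI's Gaussian describes `y = log(x)`, not `x` itself, so naively pushing the Gaussian mean through `exp` does not recover the mean of the constrained variable:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical mean-field parameters for ONE latent variable that is
# constrained positive; the Gaussian lives on y = log(x).
mu, sigma = 1.0, 0.5

y = rng.normal(mu, sigma, size=500_000)  # draws in unconstrained space
x = np.exp(y)                            # mapped back to constrained space

# Pushing the Gaussian mean through the transform is NOT the posterior mean:
naive_mean = np.exp(mu)                # ignores the Jacobian / Jensen gap
true_mean = np.exp(mu + sigma**2 / 2)  # lognormal mean, exact for this case
print(naive_mean, true_mean, x.mean())
```

Here the draws agree with the lognormal formula rather than with `exp(mu)`, which is the Jensen's-inequality gap the samples are implicitly accounting for.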

If that logic is correct, then it must be that the transformations are, in general, complicated enough that there is no way to directly compute the parameters of some nice distribution in constrained space from the corresponding parameters of the diagonal multivariate Gaussian in unconstrained space.