I have a naive question about algorithms & centering: I understand why non-centering is useful for HMC, but I am curious whether the same rationale necessarily holds for mean-field ADVI, which, as far as I understand, simply ignores correlations among parameters. It seems obvious that the ELBO would be lower than for an equivalent non-centered model, but does that imply that the maximization of the ELBO suffers?
The same rationale applies. With non-centering, the posterior of the parameters is closer to an independent normal,
and if the posterior is close to an independent normal, mean-field ADVI works just fine.
See Figure 5 and discussion in “Yes, but Did It Work?: Evaluating Variational Inference” https://arxiv.org/abs/1802.02538
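To make the ELBO question concrete, here is a toy sketch. It assumes a bivariate normal posterior as a stand-in: strong correlation (rho = 0.9) plays the role of a centered model and rho = 0 plays an idealized non-centered one. Real centered hierarchical posteriors (the funnel) have nonlinear dependence that this does not capture. For a Gaussian target, the mean-field q that minimizes KL(q || p) keeps the target means and uses variances 1 / Lambda_ii, the inverse diagonal of the target precision matrix. Since ELBO = log p(y) - KL(q || p), the KL at the optimum is exactly the ELBO deficit. The helper names (`inv_2x2`, `kl_gauss_2d`, `mean_field_fit`) are hypothetical.

```python
import math


def inv_2x2(m):
    """Return (inverse, determinant) of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det


def kl_gauss_2d(mq, sq, mp, sp):
    """KL(N(mq, sq) || N(mp, sp)) for 2-D Gaussians; means are lists, covariances 2x2 lists."""
    sp_inv, det_p = inv_2x2(sp)
    _, det_q = inv_2x2(sq)
    trace = sum(sp_inv[i][j] * sq[j][i] for i in range(2) for j in range(2))
    diff = [mp[i] - mq[i] for i in range(2)]
    quad = sum(diff[i] * sp_inv[i][j] * diff[j] for i in range(2) for j in range(2))
    return 0.5 * (trace + quad - 2 + math.log(det_p / det_q))


def mean_field_fit(mu, cov):
    """Mean-field Gaussian minimising KL(q || p) for a Gaussian target p = N(mu, cov).

    The optimal factors keep the target means and use variances 1 / Lambda_ii,
    where Lambda is the target precision matrix (correlations are dropped).
    """
    prec, _ = inv_2x2(cov)
    q_cov = [[1.0 / prec[0][0], 0.0], [0.0, 1.0 / prec[1][1]]]
    return list(mu), q_cov


for rho in (0.0, 0.9):
    cov = [[1.0, rho], [rho, 1.0]]
    q_mu, q_cov = mean_field_fit([0.0, 0.0], cov)
    gap = kl_gauss_2d(q_mu, q_cov, [0.0, 0.0], cov)  # = log p(y) - ELBO at the optimum
    print(rho, round(math.sqrt(q_cov[0][0]), 3), round(gap, 3))
# prints:
# 0.0 1.0 0.0
# 0.9 0.436 0.83
```

With rho = 0 the mean-field fit is exact. With rho = 0.9 the ELBO sits about 0.83 nats below log p(y), and, more importantly for the question, the optimum itself is distorted: each marginal standard deviation shrinks from 1 to about 0.436. So a lower ELBO here goes together with a worse approximation, not just a worse bound.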
Thanks for the reference & comments. I will search arXiv next time before posting here 😅 (an idea for a Discourse plugin, perhaps)
In Figure 5, lower-left panel, it seems that the centered & non-centered fits together cover more of the parameter space explored by NUTS than either does alone. If I understand correctly, this is irrelevant because the non-centered variant is closer to the true posterior?
That figure is a bit overcrowded, but yes: what matters is which one is closer, and with PSIS we can further correct the approximation when estimating various expectations.
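A minimal sketch of the reweighting idea behind that correction. It assumes a 1-D standard normal target and a hypothetical under-dispersed approximation q = N(0, 0.8²), mimicking the variance shrinkage that KL(q || p) fits tend to show. It uses plain self-normalized importance sampling with weights p(x) / q(x). PSIS additionally replaces the largest raw weights with values smoothed by a fitted generalized Pareto distribution and reports the shape estimate k-hat as a reliability diagnostic; that step is omitted here. The function names are hypothetical.

```python
import math
import random


def log_normal_pdf(x, mu, sd):
    """Log density of N(mu, sd^2) at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mu) / sd) ** 2


def snis_expectation(f, log_p, q_mu, q_sd, n_draws, seed=0):
    """Estimate E_p[f(x)] by self-normalised importance sampling with q = N(q_mu, q_sd^2).

    log_p may be unnormalised. Returns (estimate, effective sample size).
    """
    rng = random.Random(seed)
    xs = [rng.gauss(q_mu, q_sd) for _ in range(n_draws)]
    # Log importance ratios log p(x) - log q(x)
    log_w = [log_p(x) - log_normal_pdf(x, q_mu, q_sd) for x in xs]
    # Subtract the maximum before exponentiating to avoid overflow
    max_lw = max(log_w)
    w = [math.exp(lw - max_lw) for lw in log_w]
    total = sum(w)
    estimate = sum(wi * f(xi) for wi, xi in zip(w, xs)) / total
    ess = total ** 2 / sum(wi * wi for wi in w)
    return estimate, ess


def log_p(x):
    """Standard normal target, up to an additive constant."""
    return -0.5 * x * x


def square(x):
    return x * x


# Quantity of interest: E_p[x^2] = 1 under the target.
n = 20000
rng = random.Random(1)
naive = sum(rng.gauss(0.0, 0.8) ** 2 for _ in range(n)) / n  # about 0.64: q alone is biased low
corrected, ess = snis_expectation(square, log_p, 0.0, 0.8, n)
print(round(naive, 2), round(corrected, 2), int(ess))
```

The raw q estimate inherits the approximation's shrinkage, while the reweighted estimate lands close to 1. With a much narrower q the weights become heavy-tailed and plain importance sampling turns unstable, which is the situation PSIS smoothing and its k-hat diagnostic are meant to handle.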