Variational Bayes results seem sensible, but vary - what to change?

I am running a model that works nicely with NUTS (but is a bit slow) using variational Bayes instead, mostly because it is a pretty simple model and I hoped VB would be a lot faster, which would be really useful for my particular use case. I am getting sensible answers in the posterior samples, but I noticed that with different random number seeds I get more variation in, e.g., median estimates than I would expect from Monte Carlo error alone (NUTS gives me much more stable results). I therefore suspect the algorithm did not truly converge, or did not converge to the same (local?) optimum each time.

So, some of my ideas were:

  1. I could make some of the VB settings more stringent (much like one often sets adapt_delta to a higher value than the default for NUTS and it just improves things). However, I am not sure which parameters I would logically try changing first: the number of ELBO samples? The tolerance? Something else? (See the sketch after this list.)

  2. Just run VB several times (given that full NUTS takes several minutes and VB only a few seconds, that could be an option) and average the outputs, or pick one run based on some criterion of better fit?
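For concreteness with idea 1, here is a minimal sketch using cmdstanpy of the ADVI settings that could be tightened. The file names are placeholders, CmdStan's defaults are noted in the comments, and exact argument names may differ slightly between cmdstanpy versions:

```python
# Sketch: tightening Stan ADVI's stochastic-optimization settings via cmdstanpy.
# "model.stan" and "data.json" are placeholders; CmdStan defaults in comments.
from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="model.stan")

fit = model.variational(
    data="data.json",
    seed=1,
    grad_samples=10,    # default 1: more draws per gradient -> less noisy gradient
    elbo_samples=1000,  # default 100: steadier ELBO estimates for convergence checks
    eval_elbo=50,       # default 100: evaluate the ELBO more often
    tol_rel_obj=0.001,  # default 0.01: stricter relative-ELBO convergence tolerance
    iter=20000,         # default 10000: allow more iterations under the tighter tolerance
)
```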

Does anyone have experience with this and recommendations on what they would try?


For background on the problems with the current implementation and potential improvements, see https://arxiv.org/abs/2009.00666. These improvements are not yet implemented in Stan, but the paper shows which parameters to vary and how that can affect the results. It also discusses why averaging the variational parameters is sensible.
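As a rough illustration of running the approximation several times (continuing the hypothetical cmdstanpy setup from the question): run ADVI under several seeds, check how much the medians move, and optionally pool draws across runs. Note that pooling draws forms a mixture of the approximations, which is not the same thing as the averaging of variational parameters analyzed in the paper:

```python
import numpy as np
from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="model.stan")  # placeholder, as above

# Run ADVI with several seeds; large spread in the per-run medians
# (relative to Monte Carlo error) indicates unstable optimization.
fits = [model.variational(data="data.json", seed=s) for s in range(1, 6)]

# variational_sample: rows are draws; columns are diagnostics followed by
# the model parameters (see fit.column_names).
medians = np.stack([np.median(f.variational_sample, axis=0) for f in fits])
print("spread of medians across runs:", medians.std(axis=0))

# Crude pooling: concatenating draws mixes the approximations rather than
# averaging their parameters, but it does smooth over run-to-run variation.
pooled = np.concatenate([f.variational_sample for f in fits], axis=0)
```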


Hi @avehtari,
Are the draws from the variational inference taken after the optimization or during the optimization?

Is it possible to get the learned parameters of the approximating distribution?

During the stochastic optimization, draws from the approximation are used to compute the gradient and the ELBO, but are not stored. After the stochastic optimization, draws from the approximation are stored, but the parameters of the approximation are not currently stored.
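For example, with cmdstanpy (attribute names may vary between versions) the stored draws are available on the returned fit object, along with the reported parameter means, but not the scale parameters of the approximation:

```python
from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="model.stan")  # placeholder
fit = model.variational(data="data.json", seed=1)

draws = fit.variational_sample      # numpy array: rows are draws from the
                                    # fitted approximation
print(fit.column_names)             # diagnostic columns, then model parameters
print(fit.variational_params_dict)  # reported parameter means; the scale
                                    # parameters of the approximation are not
                                    # stored, as noted above
```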


Thank you so much!

One more question. I know that for hierarchical models we should use a (non-centered) reparameterization for HMC. But for VI, since we sample directly from the approximating distribution, my understanding is that reparameterization is not needed, or helps only a little. Am I right?

Reparameterization is even more important for VI, as the approximating distribution is (often) a normal distribution, so we want the true posterior to be as close to normal as possible.
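For intuition, here is a sketch of the centered versus non-centered parameterization of a hierarchical normal model (the classic eight-schools setup; the model and variable names are illustrative, and the Stan code is embedded in Python strings so it slots into the cmdstanpy sketches above):

```python
# Centered parameterization: the posterior of (theta, tau) has a funnel
# shape that a mean-field or full-rank normal approximation fits poorly.
CENTERED = """
data {
  int<lower=0> J;
  vector[J] y;
  vector<lower=0>[J] sigma;
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[J] theta;
}
model {
  theta ~ normal(mu, tau);
  y ~ normal(theta, sigma);
}
"""

# Non-centered parameterization: theta = mu + tau * z with z ~ std_normal()
# makes the unconstrained posterior much closer to normal, which is exactly
# the family ADVI can represent.
NON_CENTERED = """
data {
  int<lower=0> J;
  vector[J] y;
  vector<lower=0>[J] sigma;
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[J] z;
}
transformed parameters {
  vector[J] theta = mu + tau * z;
}
model {
  z ~ std_normal();
  y ~ normal(theta, sigma);
}
"""
```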


Thank you very much!