How does ADVI select mini-batches for stochastic gradient ascent?

laifuthegreat · April 1, 2021, 3:26pm

and
http://www.stat.columbia.edu/~gelman/research/unpublished/advi_journal

indicate that ADVI selects data subsamples to optimize the ELBO using minibatch stochastic gradient ascent.

How exactly does Stan do that? It’s not clear to me how Stan can know how to select a minibatch from the data provided in an arbitrary Stan program’s data block, since data can be operational variables or be spread across multiple vectors whose indexing doesn’t match exactly.
Furthermore, in models with hierarchical structure, data are only exchangeable/conditionally i.i.d., with dependencies between points in the same group. Does that not pose a problem for minibatch selection? I haven’t really thought this through, so I don’t have a concrete reason why it wouldn’t work.

Stephen_Martin · April 2, 2021, 1:44am

I assumed that the “stochastic” part is due to the Monte Carlo estimate of the expectation in the gradient (which is noisy/stochastic).
No?

betanalpha · April 2, 2021, 6:14pm

This is correct – the variational optimization is “stochastic” only in the sense that the expectation needed is estimated stochastically with Monte Carlo.
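To illustrate the point, here is a minimal sketch, not Stan's actual implementation, of a reparameterized Monte Carlo ELBO gradient for a toy model. The model (`y_i ~ Normal(mu, 1)` with a `Normal(0, 10)` prior on `mu`), the function names, and the single-parameter Gaussian approximation are all assumptions made for illustration. Every gradient evaluation uses the full data set; the only randomness comes from the draws `eps`.

```python
import numpy as np

def grad_log_joint(mu, y, prior_sd=10.0):
    """Gradient in mu of the log joint, summed over ALL observations in y."""
    return np.sum(y) - y.size * mu - mu / prior_sd**2

def elbo_grad_estimate(m, omega, y, rng, n_draws=1):
    """Monte Carlo estimate of the ELBO gradient for q(mu) = Normal(m, exp(omega))."""
    eps = rng.standard_normal(n_draws)      # the only source of randomness
    sigma = np.exp(omega)
    mu = m + sigma * eps                    # reparameterized draws from q
    g = grad_log_joint(mu, y)               # full-data gradient at each draw
    grad_m = g.mean()
    grad_omega = (g * eps * sigma).mean() + 1.0   # +1 from the Gaussian entropy term
    return grad_m, grad_omega

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, size=50)           # toy data set, never subsampled
first = elbo_grad_estimate(0.0, 0.0, y, rng)
second = elbo_grad_estimate(0.0, 0.0, y, rng)
print(first, second)                        # two different noisy estimates
```

The two calls use identical data and identical variational parameters, yet return different gradients, which is the sense in which the optimization is "stochastic".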

avehtari · April 7, 2021, 7:24pm

An additional clarification: the ADVI algorithm could also be used with minibatching, but Stan’s implementation uses all the data (and there is still stochasticity due to the Monte Carlo estimate of the variational target and gradient).
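For contrast, here is a rough sketch of what minibatching would add if one implemented it by hand. This is not something Stan does, and the toy likelihood (`y_i ~ Normal(mu, 1)`) and function names are assumptions for illustration. The log-likelihood gradient is computed on a random subset and rescaled by `N / B` so that it remains an unbiased estimate of the full-data gradient. It only works when the log likelihood is an explicit sum over observations, which is the structure the original question worries about.

```python
import numpy as np

def full_grad_log_lik(mu, y):
    """Gradient in mu of sum_i log Normal(y_i | mu, 1) over all observations."""
    return np.sum(y - mu)

def minibatch_grad_log_lik(mu, y, batch_size, rng):
    """Unbiased minibatch estimate: subsample, then rescale by N / B."""
    idx = rng.choice(y.size, size=batch_size, replace=False)
    return (y.size / batch_size) * np.sum(y[idx] - mu)

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, size=100)          # toy data set
exact = full_grad_log_lik(0.5, y)
noisy = minibatch_grad_log_lik(0.5, y, batch_size=10, rng=rng)
print(exact, noisy)                         # noisy fluctuates around exact
```

With minibatching there would be two sources of noise (the subsample and the Monte Carlo draws from the approximation); with the full data, only the second remains.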
