How does ADVI select mini-batches for stochastic gradient ascent?


indicate that ADVI selects data subsamples to optimize the ELBO using minibatch stochastic gradient ascent.

  1. How exactly does Stan do that? It’s not clear to me how Stan can know how to select a minibatch from the data provided in an arbitrary Stan program’s data block, since data can be operational variables or be spread across multiple vectors whose indexing doesn’t match exactly.
  2. Furthermore, in models with hierarchical structure, data are only exchangeable/conditionally i.i.d., with dependencies between points in the same group. Does that not pose a problem for minibatch selection? I haven’t really thought this through, so I don’t have a reason it wouldn’t work, though.

I assumed that the “stochastic” part is due to the Monte Carlo expectation of the gradient (which is noisy/stochastic).

This is correct – the variational optimization is “stochastic” only in the sense that the expectation needed is estimated stochastically with Monte Carlo.
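
To spell out where the noise comes from, here is a sketch of the mean-field estimator in my own notation (the symbols $\mu$, $\omega$, $\eta$, $\zeta$ and $M$ are not taken from this thread). Here $\log p(y, \zeta)$ stands for the log joint density on the unconstrained scale, including the Jacobian term, evaluated on the full data $y$:

$$
\zeta^{(m)} = \mu + \exp(\omega) \odot \eta^{(m)}, \qquad \eta^{(m)} \sim \mathcal{N}(0, I), \qquad m = 1, \dots, M
$$

$$
\nabla_{\mu} \mathrm{ELBO} \approx \frac{1}{M} \sum_{m=1}^{M} \nabla_{\zeta} \log p\big(y, \zeta^{(m)}\big)
$$

$$
\nabla_{\omega} \mathrm{ELBO} \approx \frac{1}{M} \sum_{m=1}^{M} \nabla_{\zeta} \log p\big(y, \zeta^{(m)}\big) \odot \eta^{(m)} \odot \exp(\omega) + 1
$$

The only random quantities are the draws $\eta^{(m)}$; every term uses all of $y$, so no rule for selecting a minibatch is needed.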

An additional clarification: the ADVI algorithm could also be used with minibatching, but Stan’s implementation uses all of the data (and there is still stochasticity due to the Monte Carlo estimate of the variational target and gradient).
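
To make the "all data, but still stochastic" point concrete, here is a minimal NumPy sketch. It is not Stan's code: the toy model (y_i ~ Normal(mu, 1) with a Normal(0, 10) prior on mu), the function names, and the fixed step size are all assumptions made for illustration (Stan uses an adaptive step-size sequence instead). The gradient of the log joint always sums over every observation; the only randomness is in the draws `eta`.

```python
import numpy as np

PRIOR_SD = 10.0  # hypothetical prior scale for this toy model


def grad_log_joint(mu, y):
    """d/dmu of sum_i log N(y_i | mu, 1) + log N(mu | 0, PRIOR_SD).

    Sums over every observation in y; nothing is subsampled.
    Works elementwise when mu is an array of draws.
    """
    return np.sum(y) - y.size * mu - mu / PRIOR_SD**2


def elbo_grad_estimate(m, omega, y, rng, n_draws=10):
    """Monte Carlo estimate of the ELBO gradient for q(mu) = N(m, exp(omega))."""
    eta = rng.standard_normal(n_draws)   # the only source of randomness
    sigma = np.exp(omega)
    mu = m + sigma * eta                 # reparameterized draws from q
    g = grad_log_joint(mu, y)            # full-data gradient at each draw
    grad_m = np.mean(g)
    grad_omega = np.mean(g * eta * sigma) + 1.0  # +1 from the Gaussian entropy
    return grad_m, grad_omega


def fit_meanfield(y, n_iter=4000, step=0.005, seed=1):
    """Plain stochastic gradient ascent with a fixed step (a simplification)."""
    rng = np.random.default_rng(seed)
    m, omega = 0.0, 0.0
    for _ in range(n_iter):
        gm, go = elbo_grad_estimate(m, omega, y, rng)
        m += step * gm
        omega += step * go
    return m, np.exp(omega)


def exact_posterior(y):
    """Closed-form posterior mean and sd for this conjugate toy model."""
    prec = y.size + 1.0 / PRIOR_SD**2
    return np.sum(y) / prec, prec ** -0.5


y = np.random.default_rng(0).normal(1.0, 1.0, size=20)
m_hat, sd_hat = fit_meanfield(y)
post_mean, post_sd = exact_posterior(y)
print("variational:", m_hat, sd_hat)
print("exact:      ", post_mean, post_sd)
```

Running `fit_meanfield` twice with different seeds on the same `y` gives slightly different answers, even though every gradient evaluation saw the whole data set. That difference is the Monte Carlo noise described above.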