and
http://www.stat.columbia.edu/~gelman/research/unpublished/advi_journal
indicate that ADVI selects data subsamples to optimize the ELBO using minibatch stochastic gradient ascent.
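For reference, here is a toy Python sketch of what "minibatch stochastic gradient ascent on the ELBO" means in the simplest case. This is not Stan's implementation. It assumes a model whose log-likelihood is a sum of n independent terms (y_i ~ Normal(mu, 1) with a Normal(0, 10) prior on mu), a mean-field Gaussian q(mu) = Normal(m, exp(omega)), one Monte Carlo draw per step, and an RMSprop-style adaptive step size. The function name `minibatch_advi_normal_mean` and all hyperparameters are made up for illustration. The key line is the rescaling of the batch sum by `n / batch_size`. That rescaling gives an unbiased gradient only because the likelihood is a sum over exchangeable points, which is the kind of structure the questions below are about.

```python
import math
import random


def minibatch_advi_normal_mean(y, batch_size=50, iters=5000, eta=0.1, seed=0):
    """Toy minibatch ADVI-style fit (hypothetical helper, not Stan code).

    Model (assumed): mu ~ Normal(0, 10), y_i ~ Normal(mu, 1).
    Variational family: q(mu) = Normal(m, exp(omega)).
    Returns the fitted (m, sd).
    """
    rng = random.Random(seed)
    n = len(y)
    m, omega = 0.0, 0.0        # variational mean and log standard deviation
    s_m = s_omega = None       # running averages of squared gradients
    alpha, tau = 0.1, 1.0      # step-size hyperparameters (assumed values)

    for k in range(1, iters + 1):
        # 1. Draw a minibatch of indices uniformly, without replacement.
        batch = rng.sample(range(n), batch_size)

        # 2. Reparameterize: mu = m + sigma * eps with eps ~ Normal(0, 1).
        sigma = math.exp(omega)
        eps = rng.gauss(0.0, 1.0)
        mu = m + sigma * eps

        # 3. Unbiased estimate of d/dmu log p(y, mu): the likelihood sum over
        #    all n points is replaced by (n / batch_size) times the batch sum.
        grad_lik = (n / batch_size) * sum(y[i] - mu for i in batch)
        grad_prior = -mu / 10.0 ** 2
        g_mu = grad_lik + grad_prior

        # 4. Chain rule back to (m, omega); the entropy of a Normal is
        #    omega + const, so it contributes +1 to the omega gradient.
        g_m = g_mu
        g_omega = g_mu * eps * sigma + 1.0

        # 5. Adaptive step: normalize by a running RMS of the gradient and
        #    decay the base rate as 1 / sqrt(k).
        s_m = g_m ** 2 if s_m is None else alpha * g_m ** 2 + (1 - alpha) * s_m
        s_omega = (g_omega ** 2 if s_omega is None
                   else alpha * g_omega ** 2 + (1 - alpha) * s_omega)
        rho = eta / math.sqrt(k)
        m += rho * g_m / (tau + math.sqrt(s_m))
        omega += rho * g_omega / (tau + math.sqrt(s_omega))

    return m, math.exp(omega)


if __name__ == "__main__":
    data_rng = random.Random(1)
    y = [data_rng.gauss(3.0, 1.0) for _ in range(1000)]
    m, sd = minibatch_advi_normal_mean(y)
    # Conjugate model, so the exact posterior is available for comparison.
    exact_mean = sum(y) / (len(y) + 1 / 100)
    exact_sd = 1 / math.sqrt(len(y) + 1 / 100)
    print(f"ADVI: m={m:.3f}, sd={sd:.4f}; "
          f"exact: mean={exact_mean:.3f}, sd={exact_sd:.4f}")
```

Because this toy model is conjugate, the fitted `m` can be checked against the exact posterior mean. The check at the bottom is only there to show that the rescaled minibatch gradient points at the right answer.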
- How exactly does Stan do that? It's not clear to me how Stan could know how to select a minibatch from the data in an arbitrary Stan program's data block, since the data can include operational variables, or can be spread across multiple vectors whose indexing doesn't match exactly.
- Furthermore, in models with hierarchical structure, the data are only exchangeable (conditionally i.i.d. given the group-level parameters), with dependencies between points in the same group. Doesn't that pose a problem for minibatch selection? I haven't really thought this through, so I don't have a specific reason it wouldn't work, though.