I was reading the "ADVI in Stan" paper and have a question about the Gaussian mixture example (Section 3.3).

The paper says that stochastic subsampling of the data is possible when running ADVI, and it gives example code for this in Figure 11 (page 20). However, I can't see anywhere in that example where the data is randomly sampled; it looks as though a subset of the data is simply declared in the data block, with no reference to the full dataset.

Is this actually the case, or am I missing something? Would it be possible to define the full dataset in the data block and subsample it using an rng function? Would that be the correct thing to do?

For the paper, the authors hacked Stan's internals, in a way we haven't released, to implement a stochastic version. It's not in our released algorithm, nor are we convinced it would be stable enough to release for the kinds of problems we deal with. Given the paper's title ("ADVI in Stan") and its discussion of stochastic gradients, this is very misleading: Stan's ADVI implementation does not subsample the data to form stochastic gradients. It does use Monte Carlo draws from the variational approximation to estimate an expectation (the gradient of the ELBO), and that is also stochastic in nature.

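Yes, you could subsample in the transformed data block with an _rng function. Transformed data is evaluated only once, though, so the subset is drawn once and then stays fixed for the whole run.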
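Because transformed data is evaluated once per run, an _rng draw there selects a single subset, and every later log-density evaluation uses that same subset; the fit then targets the posterior given the subset rather than given the full data. The Python sketch below is only an analogue of that behavior, not Stan code: the helpers `subsample_once` and `log_lik`, the Gaussian likelihood, and the simulated data are hypothetical choices made for illustration.

```python
import math
import random

def normal_lpdf(y, mu, sigma):
    # Log density of Normal(mu, sigma) evaluated at y.
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)

def log_lik(mu, ys, sigma=1.0):
    # Log-likelihood of all observations in ys under Normal(mu, sigma).
    return sum(normal_lpdf(y, mu, sigma) for y in ys)

def subsample_once(data, m, seed):
    # Hypothetical analogue of an _rng call in transformed data: it runs once,
    # and the returned subset is reused unchanged for the rest of the run.
    return random.Random(seed).sample(data, m)

gen = random.Random(0)
full = [gen.gauss(2.0, 1.0) for _ in range(1000)]  # hypothetical full dataset
subset = subsample_once(full, m=100, seed=42)

# Every log-density evaluation during the fit sees the same 100 points,
# so the likelihood being fit is that of the subset, not of the full data.
for mu in (1.5, 2.0, 2.5):
    print(mu, log_lik(mu, subset), log_lik(mu, full))
```

This is a legitimate way to fit a smaller problem, but it is not stochastic subsampling: nothing is redrawn between iterations.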

No, not if the goal is to reproduce the paper. They actually subsampled afresh at every iteration, in a custom implementation they wrote just for the paper.
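Per-iteration subsampling means drawing a new minibatch at every step. A common way to do this (not necessarily exactly what the paper's custom code does) is to scale the minibatch log-likelihood gradient by N/M, which makes it an unbiased estimate of the full-data gradient. The Python sketch below illustrates that estimator on a hypothetical Gaussian-mean model with stochastic-gradient ascent on the log-likelihood; it is not ADVI and not the paper's implementation, and `minibatch_grad`, the simulated data, and the 1/(N t) step-size schedule are all assumptions.

```python
import random

def grad_loglik(mu, ys, sigma=1.0):
    # Gradient in mu of sum_i log Normal(y_i | mu, sigma).
    return sum((y - mu) / sigma ** 2 for y in ys)

def minibatch_grad(mu, data, m, rng, sigma=1.0):
    # Draw a fresh minibatch of m points on every call and rescale by N/m,
    # which makes the estimate unbiased for the full-data gradient.
    batch = rng.sample(data, m)
    return (len(data) / m) * grad_loglik(mu, batch, sigma)

gen = random.Random(0)
data = [gen.gauss(2.0, 1.0) for _ in range(1000)]  # hypothetical dataset
rng = random.Random(1)

# Averaged over many fresh minibatches, the estimate matches the exact gradient.
exact = grad_loglik(0.5, data)
estimates = [minibatch_grad(0.5, data, 100, rng) for _ in range(2000)]
avg = sum(estimates) / len(estimates)
print(exact, avg)

# Stochastic-gradient ascent on mu with a Robbins-Monro step size 1/(N t);
# a new minibatch is drawn at every iteration.
mu = 0.0
for t in range(1, 501):
    mu += minibatch_grad(mu, data, 100, rng) / (len(data) * t)
print(mu, sum(data) / len(data))
```

The resampling has to happen inside the iteration loop, which is why it cannot be expressed by drawing once in transformed data.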