Variational Bayes runtime and memory usage

I am using vb() to fit a large model. The model fits just fine; however, rstan takes a very long time between when the console reports that it has drawn a sample from the approximate posterior and printed “COMPLETED” and when it actually finishes. During that time, which has been as long as an hour, there is intense CPU activity and there are very large spikes in memory usage (I noticed because my computer seized up for a bit when the memory pressure became severe). What is Stan doing at this point? Why does it suddenly require far more memory than at any previous point in the vb() function? I am drawing only a tiny posterior sample of a small number of parameters, so the draws themselves cannot be what is using more than the 32GB of memory that I have in my computer.

I’m not sure if this is related, but I’ve noticed the same thing when fitting a large model using NUTS in pystan. Is it perhaps due to Stan computing some summary statistics on the posterior samples?

It reads off the disk, but I don’t think the I/O should be that noticeable.

I have encountered this before also.

After a bit of experimentation I concluded (perhaps incorrectly) that this occurs when you have large numbers of parameters.

The ADVI algorithm has computational complexity O(N*p) (I think), where N is the number of data points and p is the number of parameters, whilst generating the samples has complexity O(p^3) due to the Cholesky decomposition required.

If you have a large number of parameters the runtime for the second operation (what you describe) can be substantial.
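To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a dense p-by-p covariance matrix in double precision actually gets built and factorized at some point, which nobody in this thread has confirmed. The function name is made up for illustration, and p^3/3 is the usual textbook operation count for a Cholesky factorization.

```python
def dense_covariance_cost(p, bytes_per_value=8):
    """Rough cost of holding and factorizing a dense p-by-p covariance.

    Assumption: the matrix is stored in full as doubles (8 bytes each).
    Returns (memory in bytes, approximate Cholesky operation count).
    """
    memory_bytes = bytes_per_value * p * p
    cholesky_flops = p ** 3 / 3  # textbook leading-order estimate
    return memory_bytes, cholesky_flops


for p in (1_000, 10_000, 100_000):
    mem, flops = dense_covariance_cost(p)
    print(f"p={p:>7}: {mem / 1e9:8.3f} GB, about {flops:.1e} operations")
```

Under these assumptions a model with 100,000 parameters would need about 80 GB for the matrix alone, so a dense matrix of that size would be enough to exhaust 32GB of memory.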

Julian

That makes sense. Isn’t it drawing from a multivariate normal approximate posterior? That should be cheap if it’s a diagonal matrix (mean field), but will involve more expensive matrix multiplies if it’s dense.
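A minimal sketch of the difference, in plain Python rather than anything taken from Stan's source. The function names and toy numbers are made up for illustration, and the dense case assumes a lower-triangular factor L of the covariance is already available.

```python
import random


def draw_meanfield(mu, sigma, rng):
    # Diagonal covariance: each coordinate is scaled on its own, O(p) per draw.
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def draw_fullrank(mu, chol, rng):
    # Dense covariance L L^T with L = chol (lower triangular):
    # one triangular matrix-vector product, O(p^2) per draw.
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(chol[i][j] * z[j] for j in range(i + 1))
            for i, m in enumerate(mu)]


rng = random.Random(1)
mu = [0.0, 1.0, -1.0]
sigma = [1.0, 0.5, 2.0]
chol = [[1.0, 0.0, 0.0],
        [0.3, 0.5, 0.0],
        [-0.2, 0.1, 2.0]]
print(draw_meanfield(mu, sigma, rng))
print(draw_fullrank(mu, chol, rng))
```

With a diagonal chol the two functions give the same draw for the same random numbers, so the extra cost in the dense case comes only from the off-diagonal terms.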

That makes sense, but if that is true, the message to the console is misleading. The lengthy wait comes after the console prints “COMPLETED”. So either that message is being printed at the wrong time, or it is not the draws from the approximate posterior that are taking a long time.

Yeah, “COMPLETED” in this context refers to the optimization, not the draws.

Is it worth checking that the draws aren’t doing a Cholesky decomposition or something else crazy for the mean field approximation?

I don’t know the code but maybe there’s a bug here.

I checked the code. In meanfield there is no accidental Cholesky. All the draws are generated and written to output between the messages “Drawing a sample of size … from the approximate posterior” and “COMPLETED”, and this was reported to be fast. I couldn’t yet figure out what happens after “COMPLETED” has been printed.

In advi.hpp, the call logger.info("COMPLETED."); comes after the draws.

This sounds like a bug, but it’s not going to get anywhere without a reproducible example. Do you have one you can share?

@lauderdale Do you have a generated quantities block? Can you show your code?

Or if you don’t want to show the code to the whole internet, have Doug ping me.

There is no generated quantities block. I will try to get a reproducible example up here soon, but at the moment the model is having convergence problems with vb().