Intermediate between mean-field and full-rank ADVI

wjn0 · April 17, 2019, 8:33am

Hello,

I’m new to using the VB algorithm in Stan. I’m wondering if it’s sensible (and possible) to use an intermediate between the mean-field approximation and full-rank VB. I’m working with a latent variable model with a large number of parameters (10k+) and I’m getting promising results with the mean-field approximation. I’d like to use something closer to the full-rank approximation; however, as expected, it isn’t feasible with this many parameters due to memory constraints. I have prior knowledge about which subsets of parameters I expect to exhibit correlated posteriors. Is it possible to express this somehow in my Stan model? In case it matters, I usually use PyStan.

Walter

avehtari · April 17, 2019, 1:20pm

Low-rank covariance matrix approximation would be sensible and there are a few papers demonstrating useful results. Currently this is not implemented in Stan. It would not be a huge amount of work to add it by modifying the current fullrank algorithm, but that much work anyway that I don’t know who would have time to do it in near future. This could be potentially a good project for someone new interested in vb.

wjn0 · April 17, 2019, 2:20pm

Thanks for the answer. Where would one start if they were interested in contributing a low-rank algorithm? I just read over some of this code here, so based on this I guess it basically amounts to mostly implementing this interface, and properly managing the various lower Cholesky factors in block-diagonal form? I guess maybe deciding on the API (for describing the desired blocks of variables) would be tricky too. My C++ skills are rusty but I’m considering working on it in my spare time. Where does hypothetical feature discussion normally take place - is opening a GitHub issue appropriate?

avehtari · April 17, 2019, 3:56pm

I was thinking a simpler form of diagonal + reduced rank covariance. See, e.g. Improving Variational Approximations | Andrew Miller and [1701.03208] Gaussian variational approximation with a factor covariance structure

It’s better to continue discussion here until there is a clear plan what to propose in an issue.

wjn0 · April 18, 2019, 12:18pm

Got it, thanks for the refs and advice. Ong et al. seems like a reasonable approach. Equations (9) and (10) are easily implemented within the current framework (using the pre-computed gradient wrt mu by stan::model::gradient).

My proposal would be this (directly mirror Ong et al.):

Implement a new variational family normal_lowrank, with a mean parameter of dimension n, a low rank factor matrix of dimension n x r, and a std vector of dimension n.
Implement the gradients with respect to mu, B, and d. The gradient of mu is just the model gradient (as in both the fullrank and meanfield implementations). B and d are given in the paper, and rely on the precomputed mu gradient. There is a computationally efficient form for the inverse in the second terms of the gradients wrt B and d (Woodbury formula turns it into the inverse of an r x r matrix).
Implement the entropy computation (for computing the ELBO). The matrix determinant lemma turns it into an r x r determinant.
Implement the transformation. The transformation relies on an r + n dimensional sample (z, eps) from N(0, I) and has closed form given in para 1 of section 3 (mu + B.z + d*eps).

This covers most of the math, I think. It seems that all of this can be done by mirroring the current fullrank and meanfield implementations in `src/stan/variational/families’, plus some new boilerplate (e.g. confirming the rank is strictly less than the dimensionality).

The part I’m not sure about is outlining a plan for how to link the algorithm to the interfaces/compiler. Hopefully this is less intensive than implementing the math. Any guidance here?

Thanks again.
W

avehtari · April 18, 2019, 2:21pm

Looks good.

Yes.

Which interface you are using? It might be easiest to work with CmdStan when adding this kind of new feature. See cmdstan/src/cmdstan/command.hpp how variational algorithm is called and cmdstan/src/cmdstan/arguments.hpp for argument definitions. Modifying these you can add new argument for rank. You need to add lowrank also stan/services/experimental/advi, and see tests in test/unit/variational and test/unit/services/experimental/advi. The output is same for meanfield and fullrank so there is no changes needed when adding lowrank.

wjn0 · April 20, 2019, 4:04pm

Cool, I’ve implemented a first pass of the math and some of the boilerplate. CmdStan seems like the most straightforward approach for a first pass at the interface. I’ll open an issue now and start the pull request. I’ll tag you in the issue and PR. Thanks for all the guidance here, too.

avehtari · April 21, 2019, 1:29am

Wanted to add to the guidance

https://github.com/stan-dev/stan/wiki/Testing-Stan-using-Gnu-Make-and-Python
and stan makefile has

Tests:

  Unit tests are built through make by specifying the executable as the target
  to make. For a test in src/test/*_test.cpp, the executable is test/*.

  Header tests
  - test-headers  : tests all source headers to ensure they are compilable and
                     include enough header files.

  To run a single header test, add "-test" to the end of the file name.
  Example: make src/stan/math/constants.hpp-test

  Cpplint
  - cpplint       : runs cpplint.py on source files. requires python 2.7.
                    cpplint is called using the CPPLINT variable:
                      CPPLINT = lib/stan_math/lib/cpplint_4.45
                    To set the version of python 2, set the PYTHON2 variable:
                      PYTHON2 = python

Knowin these has saved a lot of my time.

Bob_Carpenter · April 29, 2019, 8:31pm

This would be great.

Whatever you do, talk to @bbbales2, who is working on similar things for our metric (mass matrix) adaptation for HMC.

There’s also a pile of work by Tamara Broderick et al. on adjusting mean-field posteriors to account for covariance.

Both that and structured (block diagonal) would be helpful. The blog diagonal thing’s a little harder to imagine how it would be specified.

The place to add it as a service function in the Stan repo. Things get added there, then wrapped into CmdStan, RStan, and PyStan. We can help with the latter if the service function’s implemented.

wjn0 · April 29, 2019, 9:12pm

I’ve got a draft of the service implementation and CmdStan addition and all seems to be working on that side. But partial template overrides not being possible for classes mean there’s a lot of code duplication. I also need to debug a lot of the Eigen code in the math which I’m not familiar with so it will probably be a week or two before I have a working version. And another couple weeks to sort out the code duplication issue.

I agree a block-diagonal form would be useful. It might also make way for a hybrid ADVI/point estimates procedure. I’ve implemented such models in TensorFlow and they are slow to compute the gradients so I think they might make a good addition to Stan.

Bob_Carpenter · April 29, 2019, 9:15pm

C++ lets you partially specialize template classes.

Is that not what you mean?

wjn0 · April 29, 2019, 9:18pm

Ah, yes, that is what I mean. I did that but because of the additional rank arguments the internal methods require some modification to avoid duplication and I haven’t worked out how to best do that yet. Still warming up to C++.

hyunji.moon · November 15, 2020, 9:29am

Could you please be more specific on low rank covariance and adjusting mean-field posteriors to account for covariance? I was wondering in what ways the latter could be applied to the former.

Jana_K · March 25, 2021, 2:07pm

Hi! Is your Stan code for the factor covariance by Ong et al. available? I am currently using this covariance structure for my VB in my master thesis but so far only implemented in R which takes much time for large parameter spaces. Kind regards, Jana

bbbales2 · March 25, 2021, 2:26pm

@Jana_K @wjn0 recently made a pull request over here: Low-rank ADVI by wjn0 · Pull Request #3022 · stan-dev/stan · GitHub so maybe @wjn0 can give you directions on how to compile and use that from source.

Jana_K · March 26, 2021, 7:53am

Thank you for the quick response!!

wjn0 · March 26, 2021, 11:33pm

@Jana_K indeed, that PR should be ready to use, although further validation is pending.

If compiling from source and using the cmdstan interface is an option for you, let me know and I’ll happily provide some instructions! It basically amounts to using that branch of stan from my repo and the corresponding branch of cmdstan (also from my fork of that repo) and re-compiling.

Topic		Replies	Views
Question of fullrank RStan fitting-issues , specification	1	839	July 19, 2020
Variational inference with a mixture of dense and diagonal normals General specification , variational-bayes	3	802	November 20, 2021
Constructing a low-rank approximation to a covariance matrix General	2	2056	September 11, 2017
Correlated 2D Gaussian breaks ADVI Modeling fitting-issues	23	3368	July 12, 2018
Covadapt: Low Rank covariance adaptation for high dimensional problems Algorithms mcmc	20	2080	September 14, 2019

Intermediate between mean-field and full-rank ADVI

Related topics