Got it, thanks for the refs and advice. Ong et al. seems like a reasonable approach. Equations (9) and (10) are easily implemented within the current framework (using the pre-computed gradient wrt mu by stan::model::gradient).
My proposal would be this (directly mirror Ong et al.):
- Implement a new variational family normal_lowrank, with a mean parameter of dimension n, a low rank factor matrix of dimension n x r, and a std vector of dimension n.
- Implement the gradients with respect to mu, B, and d. The gradient of mu is just the model gradient (as in both the fullrank and meanfield implementations). B and d are given in the paper, and rely on the precomputed mu gradient. There is a computationally efficient form for the inverse in the second terms of the gradients wrt B and d (Woodbury formula turns it into the inverse of an r x r matrix).
- Implement the entropy computation (for computing the ELBO). The matrix determinant lemma turns it into an r x r determinant.
- Implement the transformation. The transformation relies on an r + n dimensional sample (z, eps) from N(0, I) and has closed form given in para 1 of section 3 (mu + B.z + d*eps).
This covers most of the math, I think. It seems that all of this can be done by mirroring the current fullrank and meanfield implementations in `src/stan/variational/families’, plus some new boilerplate (e.g. confirming the rank is strictly less than the dimensionality).
The part I’m not sure about is outlining a plan for how to link the algorithm to the interfaces/compiler. Hopefully this is less intensive than implementing the math. Any guidance here?