Hi All,
I’ve been working on a way to get orthonormal matrix parameters in Stan for models like probabilistic PCA and @bgoodri and @Bob_Carpenter told me to post it here to see if there’s any interest and spark discussion. Here is the paper and the Stan code.
Overview
As some you may know there are custom HMC methods by Brubaker as well as Byrne and Girolami that allows you to have models with orthonormal matrix parameters. I was using a custom implementation of Byrne’s method, but I wanted a way to do it in Stan so I wouldn’t have to write my own likelihoods and I can use Stan’s NUTS implementation. Not to mention we wanted to also use Stan’s VI.
To accomplish this I came up with a transform that describes any orthonormal matrix in terms of a set of rotations, and hence parameterizes them in an unconstrained space. I then figured out how to do the associated change of volume measure for this re-parameterization and we coded it up in Stan.
The first test was comparing Stan draws from a uniform posterior over the Stiefel Manifold to uniform draws via the normal QR trick. After that we build PPCA and some more complicated models including hierarchical PPCA.
Issue with reflections and charts
I want to point out there is a reflection non-identifiability when you’re inferring a basis for PCA. We handled this by limiting the angle of rotation to 180 degrees instead of the full 360 for some the angles. This also is part of the reason it’s not such a big deal in practice that the circle-like topology is being transferred to an interval. This is described in the paper. I also want to point out that if you posterior is on the edge of one of rotation angle boundary you will have a multi-modal posterior, but you can just re-orient the chart in that case to go from [-pi/2,pi/2) instead of [0,pi), but yes this has to be done manually as of now.
Scaling with matrix size
To compute the change of measure for an n x p orthonormal matrix you have to calculate the determinant of a matrix that has np-p(p+1) rows and columns, so be wary of this. We started using VI instead of HMC on our bigger problems.