Yes, you’re totally right about the slice that can’t be described with a single chart because of well… topology. A lot of the times I think you don’t want to necessarily represent the whole Stiefel Manifold though, lest you have the reflection issues, which in my experiments show up in Byrne and Girolami’s method as well.
As an example if you’re doing, PPCA on the 2D data plotted below and you want to find the first principal component, both the red vector and blue vector are equally valid models, and I would say one of them is a superfluous reflection. This will give rise to a multi-modal posterior with little mass in-between in the modes that not even Byrne and Girolami’s method can solve.
In our parameterization it was straight-forward to just limit the angle of rotation to just go from [-pi/2,pi/2). I’m not sure how one would achieve a similar effect using Byrne and Girolami.
One could imagine though, that at this point you might have data that points straight up in which case, there’ll be posterior mass on both the vector pointing up and down, and you’ll have the multi-modal thing going on. In that case I suggest rotating the chart, like you’ve mentioned to [0,pi). That was a good point though I didn’t realize that changing the chart might actually be more subtle though because really what we care about is where the mass in the joint posterior is not the marginal mass.
Models over the space of orthogonal matrices are becoming more and more relevant, but the only way that they can principly be implemented in Stan (and all of these identification issues avoided) is to seriously change the algorithmic backend to enable the geodesics as described by Simon and Mark’s paper.
I agree that having to fuss around with the charts is not principled or possible even practical, but I don’t think the Embedded Manifold Geodesics is necessarily the only way. For example, I know the people who solve ODEs on manifolds often do a chart switching scheme where when they get close to the edge of the current chart the code automatically switched to another chart on the backend.
If we’re talking about the software engineering of actually getting this in to Stan, the question is would that be harder or using a different embedded integrator on the constrained parameters.
I also noticed when I used the Embedded Geodesics method, the matrix exponential operation could at times have trouble numerically, and the HMC trajectory would get off the manifold or “slide off the tracks” so to speak. That was another reason we sought a re-parameterization because it keeps us on the manifold no matter what, even for larger HMC step sizes.
The last thing I’ll mention is that if you want to do hierarchical models over orthonormal matrices the first prior that comes to mind is something like the Matrix Langevin distribution. The problem with that though is that it has a PDF that’s intractable to compute. Alan Edelman (numerical analyst at MIT) had a paper on speeding up the computation involved there, but it’s still pretty difficult.
The nice thing about our transform is that we can achieve the hierarchical prior effect by simply placing something like a multivariate Gaussian over the transformed parameters.
