Geodesic Monte Carlo implementation in Stan?

Hello,

I am wondering whether geodesic Monte Carlo on Stiefel manifold using the algorithm in https://arxiv.org/pdf/1301.6064.pdf (Byrne and Girolami 2013) can be easily implemented in Stan? Thank you!

This code which implemented this paper (new metric in Riemannian Manifold Hamiltonian Monte Carlo) under the current stan’s framework might be a reference, maybe?

1 Like

Several reasons why RMHMC implementation is not exposed is explained in this post here with some summary as follows.

  • has fragile numerical integration
  • hard to detect the source (geometry or implementation) of problems
  • can achieve same performance from model reparameterization instead of chaing metric

@betanalpha if you still hold this view, I wish to ask in what ways studies on Riemann manifold could contribute to Stan? Both for MCMC and optimization.

Optimization is included because I thought the second problem above could be relatively overlooked for optimizing function where performance has priority over diagnostics (of course not that they are unnecessary). Also its interface already provides several algorithms to choose from.

3 Likes

I haven’t tried it myself, but for the specific problem of inference on the Stiefel manifold Jaunch, Hoff, and Dunson report some promising Stan-based results with a data augmentation + polar decomposition approach described at
https://arxiv.org/abs/1906.07684

3 Likes

Thank you for your response and sorry for my late response! @weylandt I did try the method by Jaunch, Hoff and Dunson, but it did not work so well in my case and I was hoping to use the geodesic approach (Byrne and Girolami 2013) for more efficient sampling on the Stiefel manifold.

Unfortunately not.

One of the complications of working with non-Euclidean manifolds like spheres, torii, Stiefal manifolds, and the like is that global coordinate systems/parameterizations don’t exist. Without a global coordinate system we can’t uniquely define quantities like component means and variances, such as those that are shown in Stan output by default, and we can’t specify each Markov chain Monte Carlo sample with just a vector of real values.

Instead we would have to work in local coordinate systems (see for example Figure 5 of [1410.5110] The Geometric Foundations of Hamiltonian Monte Carlo) which don’t cover the entire space. In Stan this would mean keeping track of which coordinate system each sample is in then at the end registering all of the samples consistently when trying to evaluate functions and Markov chain Monte Carlo estimators of the expectation values of those functions.

To summarize implementing Hamiltonian Monte Carlo on non-Euclidean spaces is theoretically straightforward but in practice would require completely changing how Stan processes samples and reports summary results while also redoing the entire algorithm architecture.

Can I ask how you’re using a Stiefel manifold? In a few cases there are some modeling workarounds.

One potential source of confusion – “Riemannian Manifold Hamiltonian Monte Carlo” algorithms are actually all defined on Euclidean spaces and not non-Euclidean spaces! In fact the non-constant metric that differentiates these methods is actually still a Euclidean metric!

This is why I was integrated that “Riemannian” HMC implementation in Stan (which works only on Euclidean spaces) without the architecture issues I mentioned above.

I do still hold this view, especially given how much easier it is to work with reparameterizations instead of implicit symplectic integrators! In particular I think that the possible benefits of an implementation in Stan are rather small, but they may not be zero. Working with lifted explicit integrators, for example of a recent one see ehttps://arxiv.org/abs/1910.06243, would get around some of the integration problems and might be useful if the overhead they introduce isn’t too burdensome. That said lots of experimentation would be needed before considering incorporation into Stan.

This method is not correct as the parameterization they suggest is not global (nor can it be! – see the discussion above). The practical consequence of this are line “defects” in the transformation that introduce awkward behavior that pretty easily messes up sampling.

4 Likes

Thank you very much for your detailed reply! I need to sample a p by k factor loading matrix (p > k) with no constraint on the entries. But because the mixing of the chains is poor, I decide to sample its left singular vectors and singular values instead (the right singular vectors are not identifiable in my model).

2 Likes

Yes! This is a well-known challenging problem and using a singular value or eigen decomposition to build a sufficiently regularizing prior model is a natural approach up until one has to face those Steifel manifolds.

That said I’m pretty sure the there’s a way to build useful prior models for these problems that doesn’t involve parameterizing rotations (and hence can be written in Stan right now). The bad news is that I’m pretty sure only because a colleague is working on the problem and I’m not sure when any of the results will be sharable!

Apologies for the poor timing.

1 Like

Thank you for your reply! Yes I am aware of good priors such as Dirichlet-Laplace prior (Bhattacharya et al., 2015) for factor loading matrix in factor models, but the mixing of the chains is still not ideal. Of course it would be great to let me know the prior your colleague developed once it is sharable!

2 Likes

The problem isn’t the shape of the prior density but rather on which parameters you place it! I can confirm that there are useful prior models that regularize just the degenerate directions of unsupervised factor models without having to appeal to eigendecompositions but the manuscript is in development and will likely take at least a few months before being made public.

Please keep us informed when a preprint is available!