Parameterizing an orthonormal matrix (the Stiefel manifold)

I’ve posted at GitHub - sethaxen/stan_semiorthogonal_transforms a Stan implementation of a transform using reflectors that avoids directly constructing reflector matrices, along with some example models that use different transforms:

  • full.stan: QR decomposing a full matrix as done in the OP (Stan uses Eigen’s HouseholderQR approach)
  • reflectors.stan: constructing an orthogonal matrix from reflectors
  • reflectors_factors.stan: constructing an orthogonal matrix from reflectors but representing it in terms of the reflector factors
  • householder.stan: constructing Householder reflectors as dense matrices that are multiplied. (adapted from https://github.com/rsnirwan/HouseholderBPCA/blob/master/py_stan_code/ppca_house.stan. Not included due to lack of license)

And some benchmark results drawing 1000 draws each from 4 chains:
N=K=3 (100 runs)

Model Time (s) ESS ESS/s ESS/nleapfrog
full 0.08162 6292.1 36552.7 0.247655
householder 0.10652 4206.91 18469.3 0.184814
reflectors 0.13401 4192.07 14696.3 0.184204
reflectors_factors 0.07353 4192.07 26617.6 0.184204

N=K=10 (100 runs)

Model Time (s) ESS ESS/s ESS/nleapfrog
full 0.99254 5045.11 2450.11 0.141223
householder 1.72863 4048.65 1171.0 0.14366
reflectors 0.79541 4411.44 2462.89 0.156555
reflectors_factors 0.36824 4411.44 4926.62 0.156555

N=K=20 (10 runs)

Model Time (s) ESS ESS/s ESS/nleapfrog
full 8.1581 8160.53 464.858 0.136009
householder 21.6175 4089.65 88.6498 0.0712475
reflectors 5.132 4789.26 403.603 0.0831286
reflectors_factors 1.4957 4789.26 1145.05 0.0831286

N=K=30 (10 runs)

Model Time (s) ESS ESS/s ESS/nleapfrog
full 24.6639 6456.95 133.978 0.107616
householder 109.209 4109.39 19.5075 0.0684898
reflectors 11.889 5203.4 204.009 0.0867233
reflectors_factors 2.4767 5203.4 780.709 0.0867233

N=K=40 (10 runs)

Model Time (s) ESS ESS/s ESS/nleapfrog
full 79.1754 5322.27 38.1727 0.0868071
householder 232.552 3993.55 9.03856 0.0665591
reflectors 29.5462 4686.4 79.7552 0.0781067
reflectors_factors 4.4033 4686.4 396.463 0.0781067

N=K=50 (10 runs)

Model Time (s) ESS ESS/s ESS/nleapfrog
full 590.552 5035.86 5.14641 0.0646216
householder 500.663 3919.16 4.34798 0.0653193
reflectors 78.2099 4107.16 28.9358 0.0684527
reflectors_factors 6.6903 4107.16 235.338 0.0684527

Note that we’re not doing anything with these orthogonal matrices. The latent geometry in every case is an IID normal, so the speed differences are entirely from cost to compute the linear algebra on whatever number type is passed to it. This is reflected in the ESS/nleapfrog, which don’t vary much across all models. ESS is computed as the mean ESS for all entries in the orthogonal matrix.

The main observations when K=N are that using a product of dense reflector matrices (householder) is for all but the smallest orthogonal matrices the worst approach. Next comes QR-decomposing a full matrix. As expected, this has the best geometry, so it generally has the best ESS/nleapfrog post-warmup. (EDIT: I don’t think this argument makes sense since the geometry is IID normal for them all.) Using the implicit action of reflectors to build a dense matrix (reflectors) is much faster, since we both reduce the number of parameters and avoid the initial QR decomposition while still avoiding dense matrix products.

Finally, reflectors_factors imagines what if Stan added a native semiorthogonal_matrix[K,N] data type that stored the reflector factors and for which certain operations like multiplication, determinant, and linear solve were implemented in terms of those factors. Efficient algorithms exist for each of these and are implemented in Eigen and LAPACK. This is really the only option with these Householder-inspired transforms if one needs to efficiently infer a large orthogonal matrix.

For K >> N, the results will of course be different, as for a unit vector, all approaches converge to the same.