Parameterizing an orthonormal matrix (the Stiefel manifold)

Seth_Axen · November 10, 2023, 12:04pm

I’ve posted at GitHub - sethaxen/stan_semiorthogonal_transforms a Stan implementation of a transform using reflectors that avoids directly constructing reflector matrices, along with some example models that use different transforms:

full.stan: QR decomposing a full matrix as done in the OP (Stan uses Eigen’s HouseholderQR approach)
reflectors.stan: constructing an orthogonal matrix from reflectors
reflectors_factors.stan: constructing an orthogonal matrix from reflectors but representing it in terms of the reflector factors
householder.stan: constructing Householder reflectors as dense matrices that are multiplied. (adapted from https://github.com/rsnirwan/HouseholderBPCA/blob/master/py_stan_code/ppca_house.stan. Not included due to lack of license)

And some benchmark results drawing 1000 draws each from 4 chains:
N=K=3 (100 runs)

Model	Time (s)	ESS	ESS/s	ESS/nleapfrog
full	0.08162	6292.1	36552.7	0.247655
householder	0.10652	4206.91	18469.3	0.184814
reflectors	0.13401	4192.07	14696.3	0.184204
reflectors_factors	0.07353	4192.07	26617.6	0.184204

N=K=10 (100 runs)

Model	Time (s)	ESS	ESS/s	ESS/nleapfrog
full	0.99254	5045.11	2450.11	0.141223
householder	1.72863	4048.65	1171.0	0.14366
reflectors	0.79541	4411.44	2462.89	0.156555
reflectors_factors	0.36824	4411.44	4926.62	0.156555

N=K=20 (10 runs)

Model	Time (s)	ESS	ESS/s	ESS/nleapfrog
full	8.1581	8160.53	464.858	0.136009
householder	21.6175	4089.65	88.6498	0.0712475
reflectors	5.132	4789.26	403.603	0.0831286
reflectors_factors	1.4957	4789.26	1145.05	0.0831286

N=K=30 (10 runs)

Model	Time (s)	ESS	ESS/s	ESS/nleapfrog
full	24.6639	6456.95	133.978	0.107616
householder	109.209	4109.39	19.5075	0.0684898
reflectors	11.889	5203.4	204.009	0.0867233
reflectors_factors	2.4767	5203.4	780.709	0.0867233

N=K=40 (10 runs)

Model	Time (s)	ESS	ESS/s	ESS/nleapfrog
full	79.1754	5322.27	38.1727	0.0868071
householder	232.552	3993.55	9.03856	0.0665591
reflectors	29.5462	4686.4	79.7552	0.0781067
reflectors_factors	4.4033	4686.4	396.463	0.0781067

N=K=50 (10 runs)

Model	Time (s)	ESS	ESS/s	ESS/nleapfrog
full	590.552	5035.86	5.14641	0.0646216
householder	500.663	3919.16	4.34798	0.0653193
reflectors	78.2099	4107.16	28.9358	0.0684527
reflectors_factors	6.6903	4107.16	235.338	0.0684527

Note that we’re not doing anything with these orthogonal matrices. The latent geometry in every case is an IID normal, so the speed differences are entirely from cost to compute the linear algebra on whatever number type is passed to it. This is reflected in the ESS/nleapfrog, which don’t vary much across all models. ESS is computed as the mean ESS for all entries in the orthogonal matrix.

The main observations when K=N are that using a product of dense reflector matrices (householder) is for all but the smallest orthogonal matrices the worst approach. Next comes QR-decomposing a full matrix. As expected, this has the best geometry, so it generally has the best ESS/nleapfrog post-warmup. (EDIT: I don’t think this argument makes sense since the geometry is IID normal for them all.) Using the implicit action of reflectors to build a dense matrix (reflectors) is much faster, since we both reduce the number of parameters and avoid the initial QR decomposition while still avoiding dense matrix products.

Finally, reflectors_factors imagines what if Stan added a native semiorthogonal_matrix[K,N] data type that stored the reflector factors and for which certain operations like multiplication, determinant, and linear solve were implemented in terms of those factors. Efficient algorithms exist for each of these and are implemented in Eigen and LAPACK. This is really the only option with these Householder-inspired transforms if one needs to efficiently infer a large orthogonal matrix.

For K >> N, the results will of course be different, as for a unit vector, all approaches converge to the same.

Topic		Replies	Views
New Transform for Orthonormal Matrices in Stan Publicity	26	3728	May 8, 2018
Efficient orthogonal matrix parameterization Modeling	16	3834	November 6, 2023
Generating orthogonal latents Modeling specification	17	792	April 19, 2023
How to constrain a matrix comprising several parameters to be an orthogonal matrix Modeling specification	7	2669	September 25, 2019
Orthogonal (not orthonormal) matrix parameters Modeling jacobian-adjustment	18	2093	May 4, 2021

Parameterizing an orthonormal matrix (the Stiefel manifold)

Related topics