I’ve posted at GitHub - sethaxen/stan_semiorthogonal_transforms a Stan implementation of a transform using reflectors that avoids directly constructing reflector matrices, along with some example models that use different transforms:
full.stan: QR decomposing a full matrix as done in the OP (Stan uses Eigen’s HouseholderQR approach)reflectors.stan: constructing an orthogonal matrix from reflectorsreflectors_factors.stan: constructing an orthogonal matrix from reflectors but representing it in terms of the reflector factorshouseholder.stan: constructing Householder reflectors as dense matrices that are multiplied. (adapted from https://github.com/rsnirwan/HouseholderBPCA/blob/master/py_stan_code/ppca_house.stan. Not included due to lack of license)
And some benchmark results drawing 1000 draws each from 4 chains:
N=K=3 (100 runs)
| Model | Time (s) | ESS | ESS/s | ESS/nleapfrog |
|---|---|---|---|---|
| full | 0.08162 | 6292.1 | 36552.7 | 0.247655 |
| householder | 0.10652 | 4206.91 | 18469.3 | 0.184814 |
| reflectors | 0.13401 | 4192.07 | 14696.3 | 0.184204 |
| reflectors_factors | 0.07353 | 4192.07 | 26617.6 | 0.184204 |
N=K=10 (100 runs)
| Model | Time (s) | ESS | ESS/s | ESS/nleapfrog |
|---|---|---|---|---|
| full | 0.99254 | 5045.11 | 2450.11 | 0.141223 |
| householder | 1.72863 | 4048.65 | 1171.0 | 0.14366 |
| reflectors | 0.79541 | 4411.44 | 2462.89 | 0.156555 |
| reflectors_factors | 0.36824 | 4411.44 | 4926.62 | 0.156555 |
N=K=20 (10 runs)
| Model | Time (s) | ESS | ESS/s | ESS/nleapfrog |
|---|---|---|---|---|
| full | 8.1581 | 8160.53 | 464.858 | 0.136009 |
| householder | 21.6175 | 4089.65 | 88.6498 | 0.0712475 |
| reflectors | 5.132 | 4789.26 | 403.603 | 0.0831286 |
| reflectors_factors | 1.4957 | 4789.26 | 1145.05 | 0.0831286 |
N=K=30 (10 runs)
| Model | Time (s) | ESS | ESS/s | ESS/nleapfrog |
|---|---|---|---|---|
| full | 24.6639 | 6456.95 | 133.978 | 0.107616 |
| householder | 109.209 | 4109.39 | 19.5075 | 0.0684898 |
| reflectors | 11.889 | 5203.4 | 204.009 | 0.0867233 |
| reflectors_factors | 2.4767 | 5203.4 | 780.709 | 0.0867233 |
N=K=40 (10 runs)
| Model | Time (s) | ESS | ESS/s | ESS/nleapfrog |
|---|---|---|---|---|
| full | 79.1754 | 5322.27 | 38.1727 | 0.0868071 |
| householder | 232.552 | 3993.55 | 9.03856 | 0.0665591 |
| reflectors | 29.5462 | 4686.4 | 79.7552 | 0.0781067 |
| reflectors_factors | 4.4033 | 4686.4 | 396.463 | 0.0781067 |
N=K=50 (10 runs)
| Model | Time (s) | ESS | ESS/s | ESS/nleapfrog |
|---|---|---|---|---|
| full | 590.552 | 5035.86 | 5.14641 | 0.0646216 |
| householder | 500.663 | 3919.16 | 4.34798 | 0.0653193 |
| reflectors | 78.2099 | 4107.16 | 28.9358 | 0.0684527 |
| reflectors_factors | 6.6903 | 4107.16 | 235.338 | 0.0684527 |
Note that we’re not doing anything with these orthogonal matrices. The latent geometry in every case is an IID normal, so the speed differences are entirely from cost to compute the linear algebra on whatever number type is passed to it. This is reflected in the ESS/nleapfrog, which don’t vary much across all models. ESS is computed as the mean ESS for all entries in the orthogonal matrix.
The main observations when K=N are that using a product of dense reflector matrices (householder) is for all but the smallest orthogonal matrices the worst approach. Next comes QR-decomposing a full matrix. As expected, this has the best geometry, so it generally has the best ESS/nleapfrog post-warmup. (EDIT: I don’t think this argument makes sense since the geometry is IID normal for them all.) Using the implicit action of reflectors to build a dense matrix (reflectors) is much faster, since we both reduce the number of parameters and avoid the initial QR decomposition while still avoiding dense matrix products.
Finally, reflectors_factors imagines what if Stan added a native semiorthogonal_matrix[K,N] data type that stored the reflector factors and for which certain operations like multiplication, determinant, and linear solve were implemented in terms of those factors. Efficient algorithms exist for each of these and are implemented in Eigen and LAPACK. This is really the only option with these Householder-inspired transforms if one needs to efficiently infer a large orthogonal matrix.
For K >> N, the results will of course be different, as for a unit vector, all approaches converge to the same.