How to constrain a matrix comprising several parameters to be an orthogonal matrix

Hi,

I am currently dealing with a matrix comprising several parameters as follows:

mat=\begin{bmatrix}cos(a_1)sin(b_1) & cos(a_2)sin(b_2) & cos(a_3)sin(b_3)\\ cos(a_1)cos(b_1) & cos(a_2)cos(b_2) & cos(a_2)cos(b_2)\\ -sin(a_1) & -sin(a_2) & -sin(a_3) \end{bmatrix}

This matrix mat should be constrained as an orthogonal matrix, i.e. the three column vectors being orthogonal to each other.

Now I need to put priors on these six parameters (a_1, a_2, a_3, b_1, b_2 and b_3), but what stumps me is that I don’t know how to ensure that sampled parameter values from the prior distributions can make up an orthogonal matrix.

I hope that there is a data type of “orthogonal matrix” that I can declare for “mat”. Unfortunately, Stan would allow me to do this and does not provide an easy workaround to orthogonal constraint, at least to my limited knowledge of Stan.

I wonder if anyone could kindly provide any insights into this problem.

Many thanks.

2 Likes

Although I’m far from a linear algebra buff, I was interested by your question and did some quick research online to see how one might generate an orthogonal matrix from simple elements. Wikipedia to the rescue:

“To generate an (n + 1) × (n + 1) orthogonal matrix, take an n × n one and a uniformly distributed unit vector of dimension n + 1. Construct a Householder reflection from the vector, then apply it to the smaller matrix (embedded in the larger size with a 1 at the bottom right corner)”

As you want to generate a 3x3 matrix, you could start to generate with generating a 2x2 based on the properties described here in the section “elementary construction”: https://en.m.wikipedia.org/wiki/Orthogonal_matrix

If you want to define priors on the elements of the orthogonal matrix, you’ll have to correct the posterior for the Jacobian of the inverse transform (which or magics words to me, so sorry can’t help there). The Stan reference manual or other people here may provide more advice on that.

2 Likes

Nice thing about the elementary construction methods above is that they seem to be exclusively based on uniformly distributed unit vectors, which are easy to specify in Stan.

Hi Luc:

Thank you so much for your detailed reply. I will try to implement your awswer in Stan, and get back to you on my update.

Edit: It was early and I got mixed up about time series models and factor models.

Orthogonal matrices can be useful for factor models and implementing PCA-like models.

It will be great if Stan can define an orthogonal matrice type.

I deal with a lot of coordinate system transformation and the rotation matrix is always an orthogonal matrix. However, I don’t know how to implement an orthogonal matrix in Stan. I feel like many other Stan users may have the same problem of using orthogonal matrix with Stan.

The space of orthogonal matrices is non-Euclidean, which means that it can’t be globally parameterized with a set of real numbers. While Hamiltonian Monte Carlo is technically applicable to these non-Euclidean spaces, the implementation of Hamiltonian Monte Carlo in Stan presumes such a parameterization and hence we cannot directly implement a space of orthogonal matrices (just as we can’t directly implement spheres and torii).

The only work around for this is to embed the non-Euclidean space in a larger Euclidean space. This is what is done in the “unit_vector” type (which is equivalent to a sphere), although that embedding can lead to computational problems, especially in higher dimensions. There have been some papers on trying a similar embedding for spaces of orthogonal matrices but the computational fragility of these approaches is still unresolved.

2 Likes

Orthogonal matrices have been discussed previously. There’s even code for building one with Householder reflections. However I don’t think that’s going to work well, mainly because there’s a discontinuity where the matrix flips from positive-determinant to negative-determinant.

For dealing with coordinate systems you probably want a special orthogonal matrix, i.e. one that has positive determinant. This additional constraint makes the matrix group at least connected so there’s some hope that Stan could sample it.

It’s not too difficult to create three orthonormal vectors in three dimensions:

  1. take two arbitrary vectors
  2. project one of them to the plane orthogonal to the other
  3. normalize both
  4. take their cross product to create the third vector

This tranlates to Stan program

parameters {
  row_vector[3] a;
  row_vector[3] b;
}
transformed parameters {
  matrix[3,3] r;
  {
    row_vector[3] ax = a/sqrt(dot_self(a));
    row_vector[3] bx = b - dot_product(b, ax) * ax;
    bx = bx/sqrt(dot_self(bx));
    r[1] = ax;
    r[2] = bx;
    r[3] = [ax[2]*bx[3] - ax[3]*bx[2],
            ax[3]*bx[1] - ax[1]*bx[3],
            ax[1]*bx[2] - ax[2]*bx[1]];
  }
}
model {
  a ~ std_normal();
  b ~ std_normal();
  target += -5*square(dot_product(a,b));
}

The target+= statement is rather arbitrary. It doesn’t affect the distribution of the matrix but keeps the underlying vectors from being too collinear. The parametrization is similar to the unit vector and this other thread discussing problems with unit vectors may be relevant.

3 Likes