Yes that looks right to me.
I am not super familiar with Stan’s notation for indexing matrices - I would have probably though something like matrix[n, n-1] w = v[:, 2:n];
instead of matrix[n, n-1] w = v[, 2:n];
based on my prior from using Python mostly. I think svd_V
also only accepts matrix
arguments so you would possibly need something like
matrix[1, n] A;
A[1, :] = a;
vector[n] v = svd_V(A);
In terms of the ‘forward’ computation involved in simulating from the model or evaluating the (log) joint density and its gradient, then actually the former I think should be the faster as it only involves \mathcal{O}(N) vector operations compared to the latter in which the SVD should be \mathcal{O}(N^2) for a N \times 1 matrix. The main benefit of the latter is that it avoids using an overparametrized representation, with the former approach projecting a N dimensional vector on to a N - 1 dimensional subspace, which by introducing a non-identifiability may lead to a more challenging posterior geometry.
For the question of equivalence - yes I believe the distributions of c in both the cases I described should be the same. In both cases they are linear transformations of Gaussian distributed vectors and so will be Gaussian (if we allow the definition of Gaussian distributions to be extended to those supported on linear subspaces / with singular covariance matrices).
For the case c = b^\mathsf{T} - (b^\mathsf{T} a) / (a^\mathsf{T} a) a, using index notation with repeated indices indicating summation and that \mathbb{E}[b_i] = 0, \mathbb{E}[b_i b_j] = \delta_{ij}
\mathbb{E}[c_i] = \mathbb{E}[b_i - (b_j a_j)/(a_k a_k) a_i] = 0
\mathbb{E}[c_i c_j] = \mathbb{E}[(b_i - (b_k a_k) / (a_l a_l) a_i)(b_j - (b_m a_m) / (a_n a_n) a_j)] = \delta_{ij} - a_i a_j / (a_k a_k)
so \mathbb{E}[c] = 0 and \mathbb{V}[c] = \mathbf{I}_N - (a a^\mathsf{T}) / (a^\mathsf{T} a).
For the case c = W b, as W is only a function of a we immediately have \mathbb{E}[c] = 0 and \mathbb{V}[c] = \mathbb{E}[W b b^\mathsf{T} W^\mathsf{T}] = W W^\mathsf{T}.
We know that V = [\hat{a} ~W] where \hat{a} = a / \sqrt(a^\mathsf{T} a) and V V^\mathsf{T} = \mathbf{I}_N. We also have that V V^\mathsf{T} = \hat{a}\hat{a}^\mathsf{T} + W W^\mathsf{T} and so that W W^\mathsf{T} = \mathbf{I}_N - \hat{a}\hat{a}^\mathsf{T} = \mathbf{I}_N - aa^\mathsf{T} / (a^\mathsf{T} a).
Therefore in both cases c is Gaussian distribution with zero-mean and covariance matrix \mathbf{I}_N - (a a^\mathsf{T}) / (a^\mathsf{T} a).