This is a follow-up to A better unit vector.
I’ve decided to put these things up on a blog because writing up a paper feels like too much. The post is at The Hyper-Tanh Peel: A Novel Parameterization for Bayesian Spheres – Pinkney’s Sufficient Stats. What follows is mostly a copy-paste from the blog.
So far I’m finding the parameterization very stable, but I may have missed some issues with it.
1. Background
- Hyperspherical coordinates (angles) involve boundaries (0, \pi) that hinder Hamiltonian Monte Carlo (HMC) sampling.
- Muller’s method (normalizing standard Gaussians, how Stan parameterizes unit_vector) is isotropic but over-parameterized (K parameters for K-1 dimensions) and suffers from a gradient singularity at the origin.
I’m calling this the hyper-tanh peel bijective parameterization. It is a mapping \mathbb{R}^{K-1} \to S^{K-1}. It uses “Logistic Geometry” to create a smooth, unconstrained manifold that is numerically stable and avoids the singularities of traditional methods.
2. Mathematical Formulation
The Transformation
The method “peels” dimensions one by one using the hyperbolic tangent function (\tanh), which maps (-\infty, \infty) \to (-1, 1), and closes the remaining 2D subspace with a stereographic projection.
Algorithm:
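Let r_1 = 1. For i = 1, \dots, K-2:
y_i = r_i \tanh(x_i), \qquad r_{i+1} = r_i \, \text{sech}(x_i)
The final two dimensions are parameterized as a stereographic coordinate:
y_{K-1} = r_{K-1} \frac{1 - x_{K-1}^2}{1 + x_{K-1}^2}, \qquad y_K = r_{K-1} \frac{2 x_{K-1}}{1 + x_{K-1}^2}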
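As a rough illustration, the peeling recursion can be mirrored in plain Python. This is a sketch that follows the Stan function in section 5 step by step; the name hyper_tanh_to_unit is made up for this example, and the Jacobian term is left out.

```python
import math

def hyper_tanh_to_unit(x):
    """Map an unconstrained list x of length K-1 to a unit vector of length K."""
    K = len(x) + 1
    y = [0.0] * K
    r = 1.0  # radius left after the coordinates peeled so far
    for i in range(K - 2):
        y[i] = r * math.tanh(x[i])
        r /= math.cosh(x[i])  # r_{i+1} = r_i * sech(x_i)
    t = x[K - 2]  # stereographic coordinate for the last two dimensions
    denom = 1.0 + t * t
    y[K - 2] = r * (1.0 - t * t) / denom
    y[K - 1] = r * 2.0 * t / denom
    return y

y = hyper_tanh_to_unit([0.3, -1.2, 2.5, 0.7])
print(y, sum(v * v for v in y))  # the squared norm should be 1 up to rounding
```

Each peeled coordinate uses tanh^2 + sech^2 = 1 to hand the leftover radius to the next step, and the stereographic pair spends whatever radius remains, so the output has unit norm by construction.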
The Jacobian Adjustment
The log-determinant of the Jacobian is:
\log |J| = -\sum_{i=1}^{K-2} (K-i) \log\cosh(x_i) + \log 2 - \log(1 + x_{K-1}^2)
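For a uniform distribution on the sphere, this implies the following densities on the unconstrained parameters:
1. The peeling parameters follow a sech-power distribution: p(x_i) \propto \text{sech}^{K-i}(x_i) for i = 1, \dots, K-2.
2. The core parameter follows a Cauchy distribution: p(x_{K-1}) \propto (1+x_{K-1}^2)^{-1}.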
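The log-determinant can be sanity-checked numerically in the smallest non-trivial case, K = 3, where the Stan code accumulates -2 \log\cosh(x_1) + \log 2 - \log(1 + x_2^2). The Python sketch below compares that closed form with the area element of the map, computed as the norm of the cross product of the two partial derivatives. The function names and the finite-difference step are choices made for this illustration only.

```python
import math

def to_sphere(x1, x2):
    """Hyper-tanh peel for K = 3: (x1, x2) -> point on S^2."""
    r = 1.0 / math.cosh(x1)
    d = 1.0 + x2 * x2
    return (math.tanh(x1), r * (1.0 - x2 * x2) / d, r * 2.0 * x2 / d)

def log_area_numeric(x1, x2, h=1e-5):
    """log |dy/dx1 x dy/dx2|, with partial derivatives from central differences."""
    a = [(p - m) / (2 * h)
         for p, m in zip(to_sphere(x1 + h, x2), to_sphere(x1 - h, x2))]
    b = [(p - m) / (2 * h)
         for p, m in zip(to_sphere(x1, x2 + h), to_sphere(x1, x2 - h))]
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    return math.log(math.sqrt(sum(c * c for c in cross)))

def log_jacobian(x1, x2):
    """Closed form for K = 3: -2 log cosh(x1) + log 2 - log(1 + x2^2)."""
    return -2.0 * math.log(math.cosh(x1)) + math.log(2.0) - math.log1p(x2 * x2)

for x1, x2 in [(0.0, 0.0), (0.8, -1.5), (-2.0, 3.0)]:
    print(x1, x2, log_area_numeric(x1, x2), log_jacobian(x1, x2))
```

The two printed columns should agree to several decimal places at each test point. Agreement at a few points is a spot check, not a proof.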
3. Logistic Geometry & Gradient Stability
A problem with Muller’s method when used with HMC is the singularity at the origin. In Muller’s parameterization, \mathbf{y} = \mathbf{z} / \|\mathbf{z}\|. As \mathbf{z} \to 0, the gradient \nabla_\mathbf{z} \mathbf{y} explodes to infinity. This creates a “funnel” that traps HMC samplers.
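The Hyper-Tanh Peel maps the origin of the parameter space \mathbf{x}=\mathbf{0} to a single regular point of the sphere, a “North Pole”. With the coordinate ordering of the Stan code in section 5 this point is \mathbf{y}=(0, \dots, 0, 1, 0), which reduces to (1, 0) for K=2.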
The derivative of the mapping is governed by \frac{d}{dx} \tanh(x) = \text{sech}^2(x).
- At x=0, \text{sech}^2(0) = 1, so the map is locally linear there.
- \text{sech}^2(x) \le 1 for every x, so the gradient is bounded. There is no singularity.
4. Visual Verification
This section demonstrates two key properties using R:
1. Uniformity: With the Jacobian correction, Hyper-Tanh matches Muller’s isotropy.
2. Stability: Hyper-Tanh has bounded gradients where Muller’s explode.
Experiment A: Uniformity Check
I generate 2,000 points on a 3D sphere (S^2) using both methods and verify that they cover the projected disk uniformly.
Result: The Hyper-Tanh parameterization, when sampled with the correct Jacobian prior, is visually indistinguishable from Muller’s method, as expected if both are uniform on the sphere.
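A comparison along these lines can be sketched in Python without any plotting. This is not the R code behind the experiment. It relies on one fact not spelled out above: if p(x) \propto \text{sech}^m(x), then (1 + \tanh x)/2 follows a Beta(m/2, m/2) distribution, so the peeling parameters can be drawn exactly. The function names and seed are arbitrary choices for this example.

```python
import math
import random

def peel(x):
    """Hyper-tanh peel: list x of length K-1 -> unit vector of length K."""
    K = len(x) + 1
    y, r = [0.0] * K, 1.0
    for i in range(K - 2):
        y[i] = r * math.tanh(x[i])
        r /= math.cosh(x[i])
    t = x[K - 2]
    d = 1.0 + t * t
    y[K - 2] = r * (1.0 - t * t) / d
    y[K - 1] = r * 2.0 * t / d
    return y

def sample_hyper_tanh(K, rng):
    """Draw x from the priors implied by the Jacobian, then map to the sphere."""
    x = []
    for i in range(1, K - 1):
        m = K - i  # p(x_i) is proportional to sech^m(x_i)
        u = rng.betavariate(m / 2.0, m / 2.0)
        t = min(max(2.0 * u - 1.0, -1.0 + 1e-12), 1.0 - 1e-12)  # keep atanh finite
        x.append(math.atanh(t))
    x.append(math.tan(math.pi * (rng.random() - 0.5)))  # standard Cauchy draw
    return peel(x)

def sample_muller(K, rng):
    """Muller's method: normalize a standard Gaussian vector."""
    z = [rng.gauss(0.0, 1.0) for _ in range(K)]
    n = math.sqrt(sum(v * v for v in z))
    return [v / n for v in z]

def moments(points):
    """Per-coordinate mean and mean square of a list of points."""
    n, K = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(K)]
    mean_sq = [sum(p[k] ** 2 for p in points) / n for k in range(K)]
    return mean, mean_sq

rng = random.Random(1)
for name, sampler in (("hyper-tanh", sample_hyper_tanh), ("Muller", sample_muller)):
    pts = [sampler(3, rng) for _ in range(2000)]
    print(name, moments(pts))
```

For a uniform distribution on S^2 every coordinate has mean 0 and mean square 1/3, so both samplers should print values near those targets, up to Monte Carlo noise.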
Experiment B: Gradient Stability (The “y=0” Singularity)
The magnitude of the gradient \left\| \frac{d\mathbf{y}}{dp} \right\| is calculated as the parameter p passes through the origin.
- Muller: p=z. \mathbf{y} = z/|z|. Gradient \propto 1/|z|.
- Hyper-Tanh: p=x. y = \tanh(x). Gradient \propto \text{sech}^2(x).
Result: The Hyper-Tanh Peel eliminates the topological singularity. The sampler can pass through the origin without experiencing infinite forces.
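The contrast can also be checked with finite differences. The Python sketch below is an illustration for K = 3, not the R code used for the figure. It measures the size of a directional derivative of each map as the parameter approaches the origin. The helper names and step sizes are choices made for this example.

```python
import math

def muller(z):
    """Muller's map: normalize z to unit length."""
    n = math.sqrt(sum(v * v for v in z))
    return [v / n for v in z]

def peel3(x):
    """Hyper-tanh peel for K = 3."""
    r = 1.0 / math.cosh(x[0])
    d = 1.0 + x[1] * x[1]
    return [math.tanh(x[0]), r * (1.0 - x[1] * x[1]) / d, r * 2.0 * x[1] / d]

def directional_gradient_norm(f, p, direction, h):
    """Norm of the central-difference derivative of f at p along direction."""
    plus = f([a + h * d for a, d in zip(p, direction)])
    minus = f([a - h * d for a, d in zip(p, direction)])
    return math.sqrt(sum(((u - v) / (2.0 * h)) ** 2 for u, v in zip(plus, minus)))

for eps in (1.0, 1e-2, 1e-4):
    # Muller: move tangentially at distance eps from the origin of z-space
    g_muller = directional_gradient_norm(
        muller, [eps, 0.0, 0.0], [0.0, 1.0, 0.0], eps * 1e-4)
    # Hyper-tanh: move along x_1 at distance eps from the origin of x-space
    g_peel = directional_gradient_norm(peel3, [eps, 0.0], [1.0, 0.0], 1e-6)
    print(eps, g_muller, g_peel)
```

The Muller column should grow like 1/eps as eps shrinks, while the hyper-tanh column stays at or below 1.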
5. Stan Implementation
Copy this block directly into your Stan program.
functions {
  /**
   * Maps unconstrained R^(K-1) vector x to Unit Vector S^(K-1).
   * Adds the log-Jacobian of the transform via `jacobian +=`.
   *
   * @param x Unconstrained vector of length K-1
   * @param K Dimension of the embedding space (output vector size)
   * @return Unit vector of length K
   */
  vector hyper_tanh_to_unit_jacobian(vector x, int K) {
    vector[K] y;
    real r = 1.0; // radius left after the coordinates peeled so far

    // Peel coordinates 1..K-2 with tanh
    for (i in 1:(K - 2)) {
      real val = x[i];
      real cosh_val = cosh(val);
      y[i] = r * tanh(val);
      r = r * inv(cosh_val); // r_{i+1} = r_i * sech(x_i)
      real power = K - i;
      jacobian += -power * log(cosh_val);
    }

    // Close the last two coordinates with a stereographic projection
    real last_x = x[K - 1];
    real denom = 1.0 + square(last_x);
    y[K - 1] = r * (1.0 - square(last_x)) / denom;
    y[K] = r * (2.0 * last_x) / denom;
    jacobian += log(2.0) - log1p(square(last_x));

    return y;
  }
}
data {
  int<lower=2> K;
}
parameters {
  vector[K - 1] x_raw; // Unconstrained
}
transformed parameters {
  vector[K] mu = hyper_tanh_to_unit_jacobian(x_raw, K);
}
model {
  // No extra prior: the Jacobian term alone makes mu uniform on the sphere
}

