A better unit vector

I think people have mostly converged to a similar perspective, but let me add a few words.

As I’ve discussed various times in other places (see for example Understanding non-bijective variable transformations - #19 by betanalpha), a sphere cannot be completely parameterized by a single set of real numbers. One can parameterize most of a sphere using real numbers, but there will always be some artifact due to the incomplete parameterization. Whether or not that artifact has practical consequences depends on the details of a particular application.

For example, consider the one-dimensional circle, which can be almost completely parameterized by a real interval such as \theta \in (0, 2 \pi). Notice that the point \theta = 0 = 2 \pi is not included in this interval; the exclusion of this point is the artifact. Whether or not that artifact will be problematic depends on the target distribution over this space.
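A small Python sketch of this artifact (the helper name circle_point is our own, chosen for illustration): embed angles on the unit circle and compare two parameter values near either end of (0, 2\pi). They are almost 2\pi apart as real numbers but nearly coincide on the circle, straddling the excluded point (1, 0) at \theta = 0 = 2\pi.

```python
import math

def circle_point(theta: float) -> tuple:
    """Map an angle theta in (0, 2*pi) to a point on the unit circle."""
    return (math.cos(theta), math.sin(theta))

eps = 1e-3
a, b = eps, 2.0 * math.pi - eps  # parameter values near the two ends of (0, 2*pi)
param_gap = b - a                 # almost 2*pi apart as real numbers
circle_gap = math.dist(circle_point(a), circle_point(b))  # about 2*eps apart on the circle
print(f"parameter gap {param_gap:.4f}, gap on the circle {circle_gap:.4f}")
```

Any sampler moving in \theta has to travel the full length of the interval to get between these two nearby points of the circle.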

If our target distribution is specified by a uniform density function,

\pi(\theta) \propto \text{constant}

then the artifact will be negligible. In particular, we can implement this model in Stan with

\theta = 2\, \pi \, \text{logistic}(y)

where y is unconstrained. Taking the Jacobian into account gives a well-behaved unconstrained density function \pi(y) with which Stan will have no trouble.
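As a minimal Python sketch (not Stan code; the helper names are our own), the unconstrained density for the uniform case follows from adding the log-Jacobian \log(2\pi) + \log s + \log(1 - s), with s = \text{logistic}(y), to the log of the normalized uniform density 1/(2\pi). The constants cancel and what remains is the standard logistic density, which is unimodal and, as checked numerically here, integrates to one:

```python
import math

def log_logistic(y: float) -> float:
    """Numerically stable log(logistic(y))."""
    if y >= 0:
        return -math.log1p(math.exp(-y))
    return y - math.log1p(math.exp(y))

def log_density_uniform(y: float) -> float:
    """Unconstrained log density for a uniform target on (0, 2*pi).

    With theta = 2*pi*logistic(y) and s = logistic(y):
      log pi(y) = -log(2*pi) + [log(2*pi) + log s + log(1 - s)]
                = log s + log(1 - s),   where 1 - s = logistic(-y).
    """
    return log_logistic(y) + log_logistic(-y)

def trapezoid(f, lo: float, hi: float, n: int) -> float:
    """Trapezoid rule for the integral of f over [lo, hi] with n intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

mass = trapezoid(lambda y: math.exp(log_density_uniform(y)), -40.0, 40.0, 8000)
print(mass)  # should be very close to 1.0
```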

But what happens if our target distribution is specified by a von Mises density function,

\pi(\theta) \propto \exp( 100 \, \cos(\theta) )?

The problem here is that the target density function concentrates right on that artifact, and we won’t be able to ignore it so readily. Indeed, once we transform to the unconstrained space as before, we get a much less pleasant, multimodal density function.

Notice that the von Mises density function itself is unimodal; this multimodality is a consequence of not completely parameterizing the circle.
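A rough numerical check of that claim, written as a Python sketch with hypothetical helper names: evaluate the unnormalized unconstrained log density 100 \cos(2\pi\,\text{logistic}(y)) + \log s + \log(1 - s) on a grid and count its strict local maxima. The grid range and resolution are arbitrary choices for illustration.

```python
import math

KAPPA = 100.0  # von Mises concentration from the example above

def log_logistic(y: float) -> float:
    """Numerically stable log(logistic(y))."""
    if y >= 0:
        return -math.log1p(math.exp(-y))
    return y - math.log1p(math.exp(y))

def log_density(y: float, mu: float = 0.0) -> float:
    """Unnormalized log density of a von Mises target on the unconstrained scale.

    theta = 2*pi*logistic(y); log s + log(1 - s) is the log-Jacobian up to the
    constant log(2*pi), where s = logistic(y) and 1 - s = logistic(-y).
    """
    theta = 2.0 * math.pi * math.exp(log_logistic(y))
    return KAPPA * math.cos(theta - mu) + log_logistic(y) + log_logistic(-y)

def strict_local_maxima(f, lo: float, hi: float, n: int) -> list:
    """Interior grid points where f is strictly larger than both neighbors."""
    ys = [lo + (hi - lo) * i / n for i in range(n + 1)]
    vals = [f(y) for y in ys]
    return [ys[i] for i in range(1, n)
            if vals[i] > vals[i - 1] and vals[i] > vals[i + 1]]

modes = strict_local_maxima(log_density, -10.0, 10.0, 2000)
print(len(modes), [round(m, 2) for m in modes])
```

For this target the two peaks should sit roughly symmetrically near y \approx \pm 4, one on each side of the excluded point \theta = 0 = 2\pi, separated by a deep valley around y = 0 where \theta = \pi.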

If we instead shift the von Mises density function to the other side of the circle,

\pi(\theta) \propto \exp( 100 \, \cos(\theta - \pi) ),

then the transformed density function will be manageable again.
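A Python sketch can test this directly (helper names and grid are illustrative assumptions): with the location shifted to \pi, find the highest grid point of the unconstrained log density and check that it rises strictly up to that point and falls strictly after it.

```python
import math

KAPPA = 100.0
MU = math.pi  # von Mises location moved to the opposite side of the circle

def log_logistic(y: float) -> float:
    """Numerically stable log(logistic(y))."""
    if y >= 0:
        return -math.log1p(math.exp(-y))
    return y - math.log1p(math.exp(y))

def log_density(y: float) -> float:
    """Unnormalized log density on the unconstrained scale, theta = 2*pi*logistic(y)."""
    theta = 2.0 * math.pi * math.exp(log_logistic(y))
    # von Mises term plus log-Jacobian log s + log(1 - s), dropping log(2*pi)
    return KAPPA * math.cos(theta - MU) + log_logistic(y) + log_logistic(-y)

n = 2000
ys = [-10.0 + 20.0 * i / n for i in range(n + 1)]
vals = [log_density(y) for y in ys]
peak = max(range(n + 1), key=vals.__getitem__)

# Unimodal on the grid: strictly increasing up to the peak, strictly decreasing after it.
rising = all(vals[i] < vals[i + 1] for i in range(peak))
falling = all(vals[i] > vals[i + 1] for i in range(peak, n))
print(ys[peak], rising and falling)
```

Here both the von Mises term and the Jacobian term are maximized at s = 1/2, so the single peak should land at y = 0, that is \theta = \pi, well away from the artifact.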

In higher dimensions the artifact of using hyperspherical coordinates goes from being a point to a line, but the qualitative consequences are the same. This incomplete parameterization can be useful, but only if the target distribution allocates negligible probability to the neighborhood around the artifact, a condition that has to be validated by the user for each particular application.
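To see why the artifact becomes a line, consider a Python sketch of the usual spherical coordinates on the two-sphere, with polar angle \theta \in (0, \pi) and azimuth \phi \in (0, 2\pi) (the helper name and ranges are our own illustrative choices). For every polar angle, points with \phi just above 0 and just below 2\pi nearly coincide, so the excluded seam is a whole curve running between the poles rather than a single point:

```python
import math

def sphere_point(theta: float, phi: float) -> tuple:
    """Unit two-sphere point from polar angle theta in (0, pi), azimuth phi in (0, 2*pi)."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

eps = 1e-6
gaps = []
# Walk along the polar angle: each point of the phi = 0 = 2*pi seam is approached
# from both ends of the azimuth interval.
for k in range(1, 10):
    theta = math.pi * k / 10
    near_start = sphere_point(theta, eps)
    near_end = sphere_point(theta, 2.0 * math.pi - eps)
    gaps.append(math.dist(near_start, near_end))
print(max(gaps))  # tiny, even though the azimuths differ by almost 2*pi
```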

The current unit_vector method is based on embedding a sphere in a higher-dimensional real space and then lifting any target distribution over that initial sphere to the higher-dimensional space. In particular an implicit distribution is imposed over the radial direction transverse to the embedded sphere.

As with the previous approach, this method works reasonably well for a uniform distribution over the sphere but can run into problems when the distribution concentrates into smaller neighborhoods, especially in higher dimensions. Conceptually, the lifted distribution in the higher-dimensional embedding space can become “wedge”-like and frustrate the sampler.
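A rough Python sketch can make the wedge picture a bit more concrete. It assumes, for illustration only, that the radial direction receives a standard-normal term -\tfrac{1}{2} |x|^2 on the embedding vector x \in \mathbb{R}^3 and that the target on the sphere is a von Mises-Fisher-style density \exp(\kappa\, \mu^\top x / |x|) with \kappa = 100; neither detail is taken from the post. Measuring how far one can step perpendicular to \mu before the log density drops by 1/2 shows that the high-density region is a cone whose width shrinks in proportion to the radius:

```python
import math

KAPPA = 100.0         # concentration of the illustrative target on the sphere
MU = (0.0, 0.0, 1.0)  # illustrative mean direction on the two-sphere

def lifted_log_density(x):
    """Log density on R^3 for a sphere target lifted through u = x / |x|.

    Illustrative assumptions: the sphere target is exp(KAPPA * MU . u) and the
    radial direction receives a standard-normal term -0.5 * |x|^2.
    """
    r = math.sqrt(sum(xi * xi for xi in x))
    cos_angle = sum(xi * mi for xi, mi in zip(x, MU)) / r
    return KAPPA * cos_angle - 0.5 * r * r

def transverse_half_width(r, drop=0.5, step=1e-4):
    """Step perpendicular to MU at height r along MU until the log density
    falls `drop` below its value on the MU axis; return the distance moved."""
    peak = lifted_log_density((0.0, 0.0, r))
    t = 0.0
    while lifted_log_density((t, 0.0, r)) > peak - drop:
        t += step
    return t

widths = {r: transverse_half_width(r) for r in (0.25, 1.0, 4.0)}
for r, w in widths.items():
    print(f"radius {r}: half-width {w:.3f}")
```

Under these assumptions the half-width is roughly r / \sqrt{\kappa + r^2}, so near the origin the sampler has to resolve a much narrower region than farther out, and a larger \kappa narrows it everywhere.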

These methods aren’t quite complementary, but they can both be useful in different circumstances. I wouldn’t be against presenting both provided that their limitations are clearly stated. In particular, the hyperspherical coordinate method has to be presented as an approximation that is valid only when the spherical coordinates are properly aligned with the target distribution.
