Before commenting I just wanted to say that most of @jsocolar’s comments have been reasonably on target here, especially given the vagueness/generality of the question and the lack of anyone else commenting; consequently this response comes across as far more hostile than it needs to be.

In theory there’s only one way to transform probability distributions – through *pushforward* operations. Let X be the input space equipped with a probability distribution \pi, let Y be the output space, and let \phi be a transformation between the two, \phi : X \rightarrow Y.

If \phi is a *measurable* transformation then for every well-behaved set in the output space, B \subset Y, the pullback set, \phi^{-1}(B) \subset X, will be a well-behaved set in the input space. The details of what “well-behaved” means and how the pullback sets are defined are discussed in Sections 1.4 and 2.2 of the aforementioned piece on probability theory, Probability Theory (For Scientists and Engineers).

From here on in I’m going to presume that all of the sets are unambiguously “well-behaved”.

In this case the probability distribution \pi defined on X induces a *pushforward distribution* \phi_{*} \pi on Y by the probability allocations

\mathbb{P}_{\phi_{*} \pi}[B] = \mathbb{P}_{\pi}[\phi^{-1}(B)]

for any well-behaved set B \subset Y.

Equivalently the pushforward distribution can be defined in terms of expectation values – the pushforward distribution is the unique distribution that satisfies

\mathbb{E}_{\phi_{*} \pi}[g] = \mathbb{E}_{\pi}[g \circ \phi]

for all functions g : Y \rightarrow \mathbb{R} with well-defined expectation values. This latter form will typically be more useful when trying to work out how various representations of a probability distribution transform.
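As a quick numeric sketch of this expectation condition (assuming, for illustration, a standard normal input distribution and the transformation \phi(x) = x^{2}, whose pushforward is a \chi^{2} distribution with one degree of freedom):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000

# Input distribution pi: a standard normal on X = R (an assumed example).
x = rng.standard_normal(N)

# Transformation phi : X -> Y; the pushforward of a standard normal under
# squaring is a chi^2 distribution with one degree of freedom.
phi = lambda s: s**2
g = lambda y: y  # a test function g : Y -> R

# Left-hand side: E_{phi_* pi}[g], estimated from direct chi^2(1) samples.
lhs = g(rng.chisquare(df=1, size=N)).mean()

# Right-hand side: E_pi[g o phi], estimated from the transformed inputs.
rhs = g(phi(x)).mean()

print(lhs, rhs)  # both close to 1, the mean of a chi^2(1) distribution
```

Taking g to be an indicator function recovers the set-probability form of the pushforward condition as a special case.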

The dimensions of X and Y don’t have to be the same for \phi to be measurable. For example any well-behaved *projection* function from a higher-dimensional space X onto a lower-dimensional space Y \subset X will be measurable. When X is a product space, X = Z_{1} \times Z_{2}, and \phi is the projection function X \rightarrow Z_{1} or X \rightarrow Z_{2}, the pushforward distribution is also known as a *marginal* distribution. That said, people are often sloppy about the term “marginal” and use it to refer to any pushforward distribution from a higher-dimensional space to a lower-dimensional space. If \phi is a bijection between two real spaces of equal dimension then it is typically referred to as a *reparameterization*.

This is all a bit abstract, but that abstraction allows us to unambiguously understand how representations of probability distributions like probability density functions and samples transform.

For example let X and Y be real spaces, each with a fixed parameterization/coordinate system, and hence a fixed uniform volume measure which allows us to define probability density functions. Then the pushforward condition becomes

\begin{align*}
\mathbb{E}_{\phi_{*} \pi}[g] &= \mathbb{E}_{\pi}[g \circ \phi]
\\
\int_{Y} \mathrm{d} y \, \phi_{*} \pi(y) \, g(y)
&=
\int_{X} \mathrm{d} x \, \pi(x) \, g \circ \phi(x)
\\
\int_{Y} \mathrm{d} y \, \phi_{*} \pi(y) \, g(y)
&=
\int_{Y} \mathrm{d} y \left[ \int_{X} \mathrm{d} x \, \pi(x) \, \delta(y - \phi(x)) \right] g(y),
\end{align*}

where \delta denotes the Dirac delta function. Consequently the pushforward density function can be defined by

\phi_{*} \pi(y)
=
\int_{X} \mathrm{d} x \, \pi(x) \, \delta(y - \phi(x)).

In general this integral is not tractable, although in a few special cases it reduces to more familiar forms. For example when \phi is a product space projection function, \phi: X = Z_{1} \times Z_{2} \rightarrow Z_{1}, then

\begin{align*}
\phi_{*} \pi(z_{1})
&=
\int_{Z_{1} \times Z_{2}} \mathrm{d} z_{1}' \, \mathrm{d} z_{2} \, \pi(z_{1}', z_{2}) \, \delta(z_{1} - \phi(z_{1}', z_{2}))
\\
&=
\int_{Z_{2}} \mathrm{d} z_{2} \, \pi(z_{1}, z_{2});
\end{align*}
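A quick numerical check of this marginalization formula, sketched with an assumed correlated bivariate Gaussian whose marginals are standard normal:

```python
import numpy as np

# A correlated bivariate Gaussian density on Z1 x Z2 (assumed example:
# unit marginal variances, correlation rho).
rho = 0.7

def pi(z1, z2):
    norm = 1.0 / (2.0 * np.pi * np.sqrt(1.0 - rho**2))
    quad = (z1**2 - 2.0 * rho * z1 * z2 + z2**2) / (1.0 - rho**2)
    return norm * np.exp(-0.5 * quad)

# Integrate out the nuisance variable z2 with a simple Riemann sum.
z1 = 0.5
z2_grid = np.linspace(-10.0, 10.0, 4001)
dz = z2_grid[1] - z2_grid[0]
marginal = pi(z1, z2_grid).sum() * dz

# For this density the marginal in z1 is a standard normal.
analytic = np.exp(-0.5 * z1**2) / np.sqrt(2.0 * np.pi)
print(marginal, analytic)  # the two values agree closely
```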

in other words, we “integrate out the nuisance variable”. Similarly if \phi is a reparameterization then we can use the properties of the delta function to derive

\begin{align*}
\phi_{*} \pi(y)
&=
\int_{X} \mathrm{d} x \, \pi(x) \, \delta(y - \phi(x))
\\
&=
\int_{X} \mathrm{d} x \, \pi(x) \, | J(x) | \, \delta(\phi^{-1}(y) - x)
\\
&=
\pi(\phi^{-1}(y)) \, | J(\phi^{-1}(y)) |,
\end{align*}

where J(x) = \left( \det \frac{\partial \phi}{\partial x}(x) \right)^{-1} is the reciprocal Jacobian determinant of \phi. This is the usual “change of variables” equation.
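As a concrete sketch of the change of variables equation (assuming a standard normal density for \pi and the reparameterization \phi(x) = \exp(x), so that the pushforward is a lognormal density):

```python
import numpy as np

# pi: a standard normal density on X = R (assumed example).
def pi(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

# phi(x) = exp(x) maps onto Y = (0, infinity); its inverse and the
# Jacobian determinant of the inverse are known in closed form.
phi_inv = np.log
jacobian = lambda y: 1.0 / y  # d phi^{-1} / dy

# Change of variables: pushforward density at a test point.
y = 2.5
pushforward = pi(phi_inv(y)) * abs(jacobian(y))

# Compare against the empirical density of transformed samples.
rng = np.random.default_rng(1)
samples = np.exp(rng.standard_normal(1_000_000))
count, _ = np.histogram(samples, bins=[2.45, 2.55])
empirical = count[0] / (1_000_000 * 0.1)
print(pushforward, empirical)  # both close to the lognormal density at 2.5
```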

What about samples? Here the pushforward condition implies

\begin{align*}
\mathbb{E}_{\phi_{*} \pi}[g] &= \mathbb{E}_{\pi}[g \circ \phi]
\\
\lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n = 1}^{N} g(y_{n})
&=
\lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n = 1}^{N} g \circ \phi (x_{n}).
\end{align*}

In other words if

(x_{1}, \ldots, x_{N}, \ldots)

is a sample from X then

(\phi(x_{1}), \ldots, \phi(x_{N}), \ldots)

is a sample from Y! This is extremely straightforward to implement in practice, and indeed it’s one of the reasons why sampling methods are so uniquely well-suited to exploring marginal/pushforward distributions.
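For instance, pushing samples through a product-space projection gives exact samples from the marginal, with no integration at all (a sketch assuming a correlated bivariate Gaussian input):

```python
import numpy as np

rng = np.random.default_rng(2)

# Samples from a correlated two-dimensional Gaussian on X = Z1 x Z2
# (assumed example: unit marginal variances, correlation rho).
rho = 0.7
cov = [[1.0, rho], [rho, 1.0]]
xs = rng.multivariate_normal([0.0, 0.0], cov, size=500_000)

# The projection phi(z1, z2) = z1 applied sample by sample: just drop z2.
z1_samples = xs[:, 0]

# These are exact samples from the marginal, here a standard normal.
print(z1_samples.mean(), z1_samples.std())  # close to 0 and 1
```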

Now we can reflect on some of Stan’s variable types in this context.

Once we take into account the summation constraint, a D-dimensional simplex is actually only (D - 1)-dimensional, so we can transform to a (D - 1)-dimensional space where the variables are more decoupled and work out the corresponding pushforward density function with a Jacobian determinant.
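One such transform is stick breaking. The sketch below maps an unconstrained (D - 1)-dimensional vector to a D-dimensional simplex; the logistic offsets are chosen so that y = 0 maps to the uniform simplex point, in the spirit of (but not necessarily identical to) Stan's internal simplex transform:

```python
import numpy as np

def simplex_transform(y):
    """Stick-breaking map from an unconstrained y in R^(D-1) to a point
    on the D-dimensional simplex (a sketch; offsets chosen so that
    y = 0 maps to the uniform point (1/D, ..., 1/D))."""
    D = len(y) + 1
    x = np.empty(D)
    stick = 1.0  # remaining probability mass
    for k in range(D - 1):
        # Logistic transform with an offset of -log(D - k - 1) so that
        # y[k] = 0 breaks off exactly a 1/(D - k) share of the stick.
        z = 1.0 / (1.0 + np.exp(-(y[k] - np.log(D - k - 1))))
        x[k] = stick * z
        stick -= x[k]
    x[-1] = stick  # the leftover mass becomes the last coordinate
    return x

x = simplex_transform(np.zeros(3))  # D = 4
print(x, x.sum())  # (0.25, 0.25, 0.25, 0.25), summing to 1
```

The pushforward density on the simplex then picks up the Jacobian determinant of this map, which for stick breaking factorizes over the coordinates.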

The space of D-dimensional unit vectors is not a real space and hence can’t be globally parameterized in terms of real numbers/variables. It can, however, be defined as the image of a map from a higher-dimensional real space, in particular an embedding map from \mathbb{R}^{D + 1} to a sphere with any fixed radius. Stan’s `unit_vector` type is actually implicitly defined as a pushforward from \mathbb{R}^{D + 1}. When you add a `unit_vector` declaration to the `parameters` block the compiler automatically increments a probability density function over \mathbb{R}^{D + 1} – I believe that it’s a product of zero-centered normal density functions – that pushes forward to a uniform probability density function over a sphere with radius R = 1.

When one runs Stan it generates Markov chain samples over \mathbb{R}^{D + 1} which can then be automatically mapped to samples from the D-dimensional unit sphere. That said, once likelihood functions are introduced the geometry of the posterior density function over the latent \mathbb{R}^{D + 1} space can be surprisingly weird, especially when D is large, so this isn’t always a great solution.

I’m not sure what “singularities” refers to. So long as the transformation \phi : X \rightarrow Y is measurable the pushforward operation will be well-defined, and \phi has to be pretty pathological for it to not be measurable.

Anyway, if that didn’t address all of the questions then let me know.