Suppose I have a model involving a unit_vector and I want to place a prior on some transformation of the unit_vector. In the simplest case, I will scale the unit vector by a scalar radius, r.
data {
  int<lower=1> N;
  int<lower=1> K;
  vector[N] y;
  matrix[N, K] x;
}
parameters {
  unit_vector[K] u;
  real<lower=0> r;
  real<lower=0> sigma;
}
transformed parameters {
  vector[K] beta = u * r;
}
model {
  y ~ normal(x * beta, sigma);
  beta ~ normal(0, 1); // Jacobian?
}
I have trouble thinking in hyperspheres, so I’m stuck on the Jacobian (if one is needed at all) for this change of variables (u, r) \rightarrow \beta, \beta = u\cdot r. Is the idea to scale the volume of a hypersphere? The derivative of the volume with respect to the radius is the surface area. Would it just be the formula for the surface area of a K-dimensional hypersphere of radius r, where C_K is a constant equal to the surface area of a unit hypersphere?
\begin{gathered}
|J| = C_K\cdot r^{(K-1)} \\
\ln(|J|) = \ln(C_K) + (K - 1)\ln(r)
\end{gathered}
Then in Stan I would do target += log(C_K) + (K - 1) * log(r). Since we can omit constant additive terms it could be simplified to target += (K - 1) * log(r).
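(Not part of the question, just a numerical sanity check of the r^{K-1} factor.) For K = 3 we can parametrize the sphere with angles (θ, φ), map (θ, φ, r) to y = r·u(θ, φ), and compare a finite-difference Jacobian determinant against r^{K-1}·sin θ, where sin θ is the sphere's own area element and r^{K-1} is the radial factor in question:

```python
import numpy as np

K = 3

def to_cartesian(params):
    """Map (theta, phi, r) to y = r * u(theta, phi) in R^3."""
    theta, phi, r = params
    return r * np.array([np.sin(theta) * np.cos(phi),
                         np.sin(theta) * np.sin(phi),
                         np.cos(theta)])

def abs_det_jacobian(params, eps=1e-6):
    """Central-difference Jacobian determinant of to_cartesian."""
    J = np.empty((3, 3))
    for j in range(3):
        step = np.zeros(3)
        step[j] = eps
        J[:, j] = (to_cartesian(params + step)
                   - to_cartesian(params - step)) / (2 * eps)
    return abs(np.linalg.det(J))

theta, phi, r = 0.7, 1.3, 2.5
det = abs_det_jacobian(np.array([theta, phi, r]))
# Expect det = r^(K-1) * sin(theta): sin(theta) is the sphere's own
# area element; r^(K-1) is the radial factor the question asks about.
print(det, r ** (K - 1) * np.sin(theta))
```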
This prior seems a bit hard to reason about, and I find it more natural to put a prior on r directly. However, it appears the answer is in A better unit vector - #30 by Seth_Axen, and I think it's right (though I'm a bit confused about the negative sign).
Where @Seth_Axen writes:
bijectively map (x, r) \in \mathbb{S}^n \times \mathbb{R}_{>0} to \mathbb{R}^{n+1} \setminus \{0\} with y = r x, and use the Jacobian correction -n \log r to get the log-density \log\pi_y(y) = \log\pi_x(y/\lVert y \rVert) + \log\pi_r(\lVert y \rVert) - n\log \lVert y \rVert.
Where y \in \mathbb{R}^{n + 1}.
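As a sanity check of that formula (my own sketch; the uniform π_x and Gamma(2, 1) choice of π_r are illustrative assumptions, not from the thread): for n = 1 the resulting density on R^2 should integrate to 1, which we can verify on a grid.

```python
import numpy as np

# Check the formula for n = 1: x uniform on the circle S^1,
# r ~ Gamma(2, 1) (an illustrative choice), y = r * x in R^2.
# log pi_y(y) = log pi_x(y / ||y||) + log pi_r(||y||) - n * log ||y||
def log_pi_y(y):
    r = np.linalg.norm(y, axis=-1)
    log_pi_x = -np.log(2 * np.pi)     # uniform density on the unit circle
    log_pi_r = np.log(r) - r          # Gamma(2, 1): pi_r(r) = r * exp(-r)
    return log_pi_x + log_pi_r - 1 * np.log(r)

# Midpoint-rule integral over a box holding essentially all the mass.
L, N = 25.0, 1200
edges = np.linspace(-L, L, N + 1)
mids = 0.5 * (edges[:-1] + edges[1:])
gx, gy = np.meshgrid(mids, mids)
pts = np.stack([gx, gy], axis=-1)
total = np.exp(log_pi_y(pts)).sum() * (2 * L / N) ** 2
print(total)  # a proper density integrates to 1
```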
@spinkney is way better at these transforms than me, so take this with a grain of salt.
Given that the transform from u to r * u is linear in u, you get the same result working out the Jacobian determinant and sampling
u * r ~ normal(0, 1);
target += K * log(r); // change of variables adjustment
as you do from just moving the denominator into the normal scale,
u ~ normal(0, 1 / r);
The (inverse) transform is just u to r * u. The Jacobian is a diagonal matrix with r in each entry, so the determinant is r^K and the log determinant is K * log(r). That's exactly the term you get from normal(0, 1/r), which contributes a -log(1/r) = log(r) term once for each of the K entries.
The kernel of the log density works out the same either way: sampling u * r as standard normal gives ((u * r - 0) / 1)^2 = (u * r)^2, and sampling u as normal(0, 1/r) gives ((u - 0) / (1 / r))^2 = (u * r)^2.
In other words, we get the same result changing the density to sample u as normal(0, 1/r) as we get from sampling u * r as normal(0, 1) and applying the change-of-variables correction.
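A quick numerical check of this equivalence (a sketch using plain-numpy normal log densities in place of Stan's; the particular u and r values are arbitrary):

```python
import numpy as np

def normal_lpdf(x, mu, sigma):
    """Sum of elementwise normal log densities."""
    return np.sum(-0.5 * np.log(2 * np.pi) - np.log(sigma)
                  - 0.5 * ((x - mu) / sigma) ** 2)

rng = np.random.default_rng(0)
K = 5
u = rng.normal(size=K)   # stand-in values; in the model u is a unit vector
r = 2.3

# Path 1: u * r ~ normal(0, 1) plus the change-of-variables term K * log(r).
lp1 = normal_lpdf(u * r, 0.0, 1.0) + K * np.log(r)
# Path 2: u ~ normal(0, 1 / r) directly.
lp2 = normal_lpdf(u, 0.0, 1.0 / r)
print(lp1, lp2)  # identical up to floating point
```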