Von_mises documentation suggestion

Let me make an argument from the perspective of dynamics in parameter space.
We have two particles x and y which feel forces (gradients). We are doing dynamics constrained to a circle. Let me say E is the potential energy part of the Hamiltonian.
If I pull on x, there is a non-zero dE/dy via the x^2+y^2=1 term. Stan’s unit vector code handles this.
Now I add a concept (not a particle) to the system called mu. This is a virtual particle in classical (not quantum) physics. Its coordinates are defined in terms of x and y, and my potential energy depends on this mu particle. A force acting on mu has to be referred back to x and y. So if I have a prior acting on mu, I have dE/dx = dE/dmu times dmu/dx, and the same for y.
I promise this is the case in a Newtonian dynamics simulation. This is the reasoning you use if you have a force acting on the centre of mass of a group of particles. I also concede that my analogy with physics will eventually be a bit strained.
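That referral of forces is just the chain rule. A minimal sketch (Python used purely for checking; the midpoint mu, the potential E, and the constants are my own illustration, not anything from Stan):

```python
import math

# A "virtual particle" mu defined as the midpoint (centre) of two
# coordinates x and y, with a quadratic potential acting on mu only.
# The force on x is obtained by the chain rule: dE/dx = dE/dmu * dmu/dx.

def mu(x, y):
    return 0.5 * (x + y)            # centre of the two particles

def E(m, k=2.0, m0=0.3):
    return 0.5 * k * (m - m0) ** 2  # potential depends on mu alone

def dE_dmu(m, k=2.0, m0=0.3):
    return k * (m - m0)

def dE_dx_chain(x, y):
    # dmu/dx = 0.5 for the midpoint, so the force is shared equally
    return dE_dmu(mu(x, y)) * 0.5

def dE_dx_numeric(x, y, h=1e-6):
    # central finite difference of the composed potential
    return (E(mu(x + h, y)) - E(mu(x - h, y))) / (2 * h)

x, y = 0.7, -0.2
assert abs(dE_dx_chain(x, y) - dE_dx_numeric(x, y)) < 1e-6
```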

But if I do that, sampling is no longer uniform for theta—it has dips at pi/2 and -pi/2. So…

But when it dips for x, it does not for y, or am I thinking about this incorrectly?

You convinced me that I need the extra derivatives. The physics analogy stopped me from doing things like hiding the transformation in a function.

What I cannot fit in my picture is @betanalpha Michael’s point that there is an infinite number of mu’s. I see the logic and I guess this is where my analogy breaks.

This is not a proper analogy.

The densities that we have been discussing here are probability density functions with respect to a Lebesgue measure on some real numbers (or presumably the uniform measure over a sphere). Under a 1-1 transformation that density function has to be modified with an inverse Jacobian determinant, because the Lebesgue measure transforms with a factor of the Jacobian determinant (i.e. volumes are warped, so the density has to warp in the opposite way).

Your physical picture corresponds to a simple function, the potential energy, which does not transform in the same way. You simply compose the potential energy with the 1-1 transformation to give the potential energy on the new space. Because the physical system does not have to worry about Jacobians there is no way that you can use it to reason about Jacobians for probability density functions.
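The contrast can be made concrete with a one-dimensional example (my own illustration, not anything from the thread's model): take z ~ Exponential(1) and reparameterize as u = log z. The density picks up the Jacobian factor dz/du = e^u, whereas merely composing the function, as one does with a potential energy, loses normalization:

```python
import math

# Density of z ~ Exponential(1): f(z) = exp(-z) on z > 0.
# Under u = log z, the pushed-forward density gains the Jacobian e^u:
#   f_u(u) = exp(-e^u) * e^u,
# while a naively composed function has no such factor.

def f_u_with_jacobian(u):
    return math.exp(-math.exp(u)) * math.exp(u)

def f_u_composed(u):      # Jacobian forgotten: not a density any more
    return math.exp(-math.exp(u))

def integrate(g, a=-30.0, b=10.0, n=100000):
    # simple midpoint rule over a range that captures essentially all mass
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

assert abs(integrate(f_u_with_jacobian) - 1.0) < 1e-3  # proper density
assert integrate(f_u_composed) > 25.0                  # nowhere near 1
```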

atan2 : R x R -> (-pi, pi]

atan2 is the inverse transform here. We’re going from the (x, y) coordinates to an angle in (-pi, pi] radians.
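A quick round-trip check of that inverse (Python, just for illustration):

```python
import math

# atan2 inverts the unit-vector map theta -> (cos theta, sin theta),
# returning angles in (-pi, pi].
for theta in [-3.0, -math.pi / 2, 0.0, 1.0, math.pi / 2, 3.0]:
    x, y = math.cos(theta), math.sin(theta)
    assert abs(math.atan2(y, x) - theta) < 1e-12
```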

What’s partly confusing me is that we implement the unit_vector thing under the hood to overparameterize a point on the circle in terms of (x,y) coordinates. The nice part is that neighborhoods of (x,y) map to neighborhoods on the circle; the bad bit is that there’s really only one degree of freedom.

So when I map (x,y) back to an angle, do I need to include a Jacobian if I want to put a distribution on the angle? What’s really going on is that the default independent normal priors make (x,y) equally likely to take on any angle.
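That last claim is easy to check by simulation (a sketch with Python's standard library, not Stan's actual machinery): the joint density exp(-(x^2 + y^2)/2) is rotationally symmetric, so atan2(y, x) should be uniform on (-pi, pi]:

```python
import math
import random

# Sample independent standard normals, bin the resulting angles, and
# compare each bin count to the uniform rate n / bins.
random.seed(1)
n, bins = 100000, 8
counts = [0] * bins
for _ in range(n):
    theta = math.atan2(random.gauss(0, 1), random.gauss(0, 1))
    counts[int((theta + math.pi) / (2 * math.pi) * bins) % bins] += 1

expected = n / bins
max_rel_dev = max(abs(c - expected) / expected for c in counts)
assert max_rel_dev < 0.05  # all bins within Monte Carlo noise of uniform
```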

Are you saying that

  • the extra t1 = 1 + (mu_vec[2] * mu_vec[2]); target += -log(t1); … terms are wrong, or
  • the extra terms with the derivatives are correct, but I got them via the wrong reasoning?
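For what it's worth, a term of the form target += -log(1 + t * t) matches the log-Jacobian of the map t -> atan(t), since d/dt atan(t) = 1/(1 + t^2). Here is a minimal numerical check of that identity only; it says nothing about whether the term belongs in this particular model:

```python
import math

# Verify d/dt atan(t) = 1/(1 + t^2) by central finite differences,
# and that its log equals the -log(1 + t^2) adjustment.
def num_deriv(f, t, h=1e-6):
    return (f(t + h) - f(t - h)) / (2 * h)

for t in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    analytic = 1.0 / (1.0 + t * t)
    assert abs(num_deriv(math.atan, t) - analytic) < 1e-8
    assert abs(math.log(analytic) + math.log(1 + t * t)) < 1e-12
```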