This unfortunately is completely incorrect. Having “often (mostly?)” as a best case is a very good indicator that the problem is being approached incorrectly.
Change of variables follows from probability theory, which is airtight about what you can and cannot do.
- If you have a one-to-one transformation then the transformed density is given by the original density evaluated at the inverse transform, multiplied by the absolute value of the Jacobian determinant (see the first sketch after this list).
- If you have a one-to-many transformation then you do the above but sum over all of the possible values of the transform inverse (see the second sketch below). This behaves well except in cases where there are an infinite number of values, for example when you try to map a circle onto the real line by transforming (0, 2\pi) to (-\infty, \infty), in which case the transform becomes ill-posed.
- If you have a many-to-one transformation then you have to be very careful to ensure that you can self-consistently define a probability distribution on the transformed space. There is no “implicit distribution theorem” – the only way to do this is to “buffer” the output with additional variables so that you can define a one-to-one transformation, solve for the appropriate density on the buffered space, and then marginalize out the buffer variables. This is, for example, how one goes from a 2D Gaussian to a Rayleigh distribution, or from a 3D Gaussian to a Maxwell distribution, or from the N-simplex to the (N-1)-dimensional unconstrained real space we use in Stan (as @Bob_Carpenter notes below); see the third sketch below. Typically this approach is limited to cases where the marginalization can be done analytically, so it’s hard to apply to complex problems like this.
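For concreteness, here is a minimal numerical check of the one-to-one case. The log-normal example and the NumPy/SciPy code are my own illustration, nothing canonical:

```python
import numpy as np
from scipy import stats

# One-to-one transform: Y = exp(X) with X ~ Normal(mu, sigma).
# Inverse transform: X = log(Y); Jacobian: dX/dY = 1/Y.
mu, sigma = 0.5, 1.2
y = np.linspace(0.1, 10.0, 50)

# Change of variables: p_Y(y) = p_X(log y) * |1 / y|
p_y = stats.norm(mu, sigma).pdf(np.log(y)) / y

# Reference: SciPy's log-normal with the same parameters
p_ref = stats.lognorm(s=sigma, scale=np.exp(mu)).pdf(y)
assert np.allclose(p_y, p_ref)
```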
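The branch-summing case can be checked the same way. Here I take Y = X^2 with a standard normal X, where the transform inverse has the two branches x = ±sqrt(y); again just an illustrative sketch:

```python
import numpy as np
from scipy import stats

# Y = X^2 with X ~ Normal(0, 1): the transform inverse has two
# branches, x = +sqrt(y) and x = -sqrt(y), each contributing
# |dx/dy| = 1 / (2 sqrt(y)) to the transformed density.
y = np.linspace(0.05, 8.0, 50)
root = np.sqrt(y)

# Sum the change-of-variables contributions over both branches
p_y = (stats.norm.pdf(root) + stats.norm.pdf(-root)) / (2 * root)

# Y should be chi-square with one degree of freedom
assert np.allclose(p_y, stats.chi2(df=1).pdf(y))
```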
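And the buffering construction for the many-to-one case, using the 2D-Gaussian-to-Rayleigh example mentioned above: buffer R = sqrt(X^2 + Y^2) with the angle Theta so that (X, Y) -> (R, Theta) is one-to-one, apply the Jacobian, and marginalize out Theta. The marginalization is done numerically below only to keep the check self-contained; in this case the integral is analytic:

```python
import numpy as np
from scipy import stats

# Buffer (X, Y) -> R = sqrt(X^2 + Y^2) with the angle Theta so that
# (X, Y) -> (R, Theta) is one-to-one; the polar Jacobian determinant
# is r, so p(r, theta) = p_XY(r cos theta, r sin theta) * r.
def buffered_density(r, theta):
    x = r * np.cos(theta)
    y = r * np.sin(theta)
    return stats.norm.pdf(x) * stats.norm.pdf(y) * r

# Marginalize out the buffer variable theta
r = np.linspace(0.1, 4.0, 40)
theta = np.linspace(0.0, 2 * np.pi, 1001)
joint = buffered_density(r[:, None], theta[None, :])
p_r = np.trapz(joint, theta, axis=1)

# The marginal is the Rayleigh density, p(r) = r exp(-r^2 / 2)
assert np.allclose(p_r, stats.rayleigh.pdf(r))
```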
There are many heuristics that capture some of these properties, but as with most heuristics they do not capture all of them and hence inevitably lead one to invalid answers. And unfortunately those violations tend to arise in exactly the difficult problems of applied interest!