This isn’t really an answer to the question, but rather a general set of cautions. Even if you can work out a correct Jacobian adjustment, it might not achieve what you intend here.
It’s important to keep in mind that distances in Euclidean space are constrained in weird ways. We can already see this with three points in a 2-D plane. Given the distance $ab$ and the distance $bc$, the distance $ac$ is constrained to be less than or equal to the sum $ab + bc$ and greater than or equal to the absolute value of the difference, $|ab - bc|$. As the number of points gets much larger than the dimensionality of the space, this constraint becomes ever more convoluted, because it’s hard to come up with a set of $\frac{N(N-1)}{2}$ distances that are actually a valid set of distances between $N$ points in the plane. This has a few implications:
- It means that we cannot readily
unless we can come up with some constraining transform that ensures that we end up with a set of valid distances. That seems like it’d be very very hard to do.
- It means that if we write `distances ~ std_normal();` we will not end up with a standard normal joint prior on the distances, even if we apply the correct Jacobian adjustment. I doubt we would even get standard normal margins, though perhaps we would get very close to standard normal margins (??). We certainly would not get a standard normal joint distribution, because most of this distribution will sit over disallowed distance combinations.
- Therefore, it means that we will not be able to write down a generative prior for the distances.
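The three-point constraint and the "hard to come up with valid distances" point can be checked numerically. Below is a minimal Python sketch (standard library only; all function names are illustrative, not from any library). `triangle_ok` encodes exactly the bound above, `valid_fraction` draws each distance independently from a half-normal (the positive part of `std_normal()`) and counts how often the triple is realizable, and the two counting helpers compare the $\frac{N(N-1)}{2}$ distances with the $2N - 3$ degrees of freedom that $N$ points in the plane have once translations and rotations are removed.

```python
import math
import random


def triangle_ok(ab: float, bc: float, ac: float) -> bool:
    """True if three nonnegative distances can be realized by three points in the plane."""
    # |ab - bc| <= ac is equivalent to ab <= ac + bc and bc <= ab + ac.
    return abs(ab - bc) <= ac <= ab + bc


def valid_fraction(n_draws: int, seed: int = 1) -> float:
    """Fraction of iid half-normal distance triples that satisfy the triangle inequality."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        ab, bc, ac = (abs(rng.gauss(0.0, 1.0)) for _ in range(3))
        if triangle_ok(ab, bc, ac):
            hits += 1
    return hits / n_draws


def n_distances(n: int) -> int:
    """Number of pairwise distances among n points."""
    return n * (n - 1) // 2


def n_shape_dims(n: int) -> int:
    """Degrees of freedom of n >= 2 points in the plane after removing translations and rotations."""
    return 2 * n - 3
```

Already at $N = 3$ a noticeable share of independently drawn triples is invalid. For $N \ge 4$ the number of distances exceeds $2N - 3$ (for example 6 versus 5 at $N = 4$, 1225 versus 97 at $N = 50$), so the valid distance vectors lie on a lower-dimensional surface, and a continuous joint prior over all distances puts essentially no mass on it.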
In fact, there’s yet another reason why we cannot write down a generative prior for the distances. For any set of points, we can apply any translation we wish and retain the same set of distances. Thus, in addition to a prior statement on the distances, to get a generative prior we’ll also need some kind of prior regularization of the location of the point cloud in the plane (e.g. priors on the positions of the points themselves). But this prior will interact with whatever prior we write down on the distances. The interaction might be subtle or even negligible (I think this becomes likely if the scale of the prior on `z` is large compared to the prior on the distance, but in this case the prior model becomes very weakly identified since we can translate the points around with considerable flexibility). In other settings, the interaction might be very strong. You can see this intuitively if we imagine that we have `z ~ normal(0, 0.1);` and `distance ~ std_normal();`. No way will we end up with standard normal distances!
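Both halves of that argument can be sketched in a few lines of Python (standard library only; names are illustrative). `translate` shows that shifting every point leaves all distances unchanged, and `mean_distance_under_z_prior` simulates two points whose coordinates come from `normal(0, sigma_z)` alone, ignoring the distance prior, to show the scale that the `z` prior by itself implies. With `sigma_z = 0.1` the distance is Rayleigh with scale $0.1\sqrt{2}$, whose mean is $0.1\sqrt{\pi} \approx 0.177$, far below the half-normal mean $\sqrt{2/\pi} \approx 0.80$ that `std_normal()` on a distance would suggest. Under both prior terms together you get some compromise between these two scales, not standard normal distances.

```python
import math
import random


def pairwise_distances(points):
    """All pairwise Euclidean distances for a list of (x, y) tuples, in i < j order."""
    return [
        math.dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ]


def translate(points, dx, dy):
    """Shift every point by the same offset (dx, dy)."""
    return [(x + dx, y + dy) for x, y in points]


def mean_distance_under_z_prior(sigma_z, n_draws, seed=1):
    """Monte Carlo mean distance between two points with iid normal(0, sigma_z) coordinates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        p = (rng.gauss(0.0, sigma_z), rng.gauss(0.0, sigma_z))
        q = (rng.gauss(0.0, sigma_z), rng.gauss(0.0, sigma_z))
        total += math.dist(p, q)
    return total / n_draws
```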
The reason I’m raising these points is that one of the main motivations for applying Jacobian adjustments is to ensure that the prior pushforward on the transformed variable (`distance` in this case) actually corresponds to the distribution that is syntactically implied by the prior statement (e.g. `target += normal_lpdf(distance | 0, 1);` is supposed to yield a joint normal distribution for the prior pushforward for `distance`). Since this is guaranteed not to happen for you, it’s worth thinking very carefully about what you want this prior to achieve, and whether that would be properly achieved by a standard normal increment to the density accompanied by the Jacobian adjustment.
Since the prior-plus-Jacobian will not yield a generative prior no matter what, you can think of it more like a penalty term that penalizes model configurations with large distances. Seen this way, there’s nothing magical about working out the correct Jacobian; what you really want is to increment the target with something in place of the prior-plus-Jacobian that induces a reasonable penalty. You’ll be helped here by the symmetry of the transform; I’m pretty sure you can expect that penalizing distances won’t distort the underlying points anisotropically, for example. So you could tinker with some penalties, find one that seems to yield reasonable prior pushforwards, and just roll with it.
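To make the "penalty" reading concrete, here is a hypothetical penalty written as a function of the point positions: the sum of `normal_lpdf(d | 0, scale)` over all pairwise distances, dropping the constant and with no Jacobian term. This is a minimal Python sketch of one possible choice, not a recommendation; in a Stan model the analogous term would be a `target +=` increment on distances computed from `z`. Because it depends on the points only through their distances, it is unchanged by any rotation or translation, which is the symmetry that keeps it from pushing the points around anisotropically.

```python
import math


def distance_penalty(points, scale=1.0):
    """Hypothetical log-penalty: sum of normal_lpdf(d | 0, scale) over pairs, up to a constant.

    No Jacobian term is included; this is treated purely as a penalty on large distances.
    """
    total = 0.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            total += -0.5 * (d / scale) ** 2
    return total


def rotate(points, angle):
    """Rotate every (x, y) point about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]
```

For the points `(0, 0)`, `(1, 0)`, `(0, 2)` the squared distances are 1, 4 and 5, so the penalty is -5.0, and it is the same after any call to `rotate`. Trying a few forms and scales like this, then simulating the resulting prior pushforward of the distances, is one way to do the tinkering described above.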
A final note: even if you put a penalty on the absolute magnitude of `z` in order to get a proper prior (i.e. one that doesn’t allow you to translate the point cloud out to infinity), you will still have a prior model that identifies the positions of the points only up to rotations and reflections. For the purposes of efficient sampling, it might be wise to fix the positions of two points in `z`, which removes the rotational non-identifiability. A reflection across the line through those two points still leaves every distance unchanged, though, so you would also need something like a sign constraint on one coordinate of a third point. However, in fixing two points, you are implicitly setting a distance scale for the model that might conflict with the scale that you desire via the prior.
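A minimal Python sketch of what anchoring two points does and does not pin down, assuming a hypothetical parameterization `anchored_points` that places point 0 at `(0, 0)` and point 1 at `(1, 0)` and fills the rest from a flat list of free coordinates. Reflecting across the x-axis (the line through the anchors) leaves both anchors and every distance unchanged, so that mirror image is still unidentified. Restricting, for example, the y-coordinate of the third point to be positive (a `<lower=0>` bound in a Stan declaration) would remove it. The fixed anchor distance of 1 is the implicit scale mentioned above.

```python
import math


def anchored_points(free):
    """Hypothetical parameterization: two fixed anchors followed by free (x, y) pairs."""
    if len(free) % 2:
        raise ValueError("free coordinates must come in (x, y) pairs")
    rest = [(free[k], free[k + 1]) for k in range(0, len(free), 2)]
    return [(0.0, 0.0), (1.0, 0.0)] + rest


def reflect_across_x_axis(points):
    """Mirror every point across the line through the two anchors."""
    return [(x, -y) for x, y in points]


def pairwise_distances(points):
    """All pairwise Euclidean distances for a list of (x, y) tuples, in i < j order."""
    return [
        math.dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ]
```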