Haven’t logged in here in a long time but thanks for the references!
Here’s one way to think about the issue. Suppose, for example, you have the knowledge that “x,y are somewhere near the unit circle in the x,y plane”. One way you could handle this is to parameterize as radius and angle. But another way is to provide a prior over x,y which puts high probability near this circle.
p(x,y) = normal(x,0,5) * normal(y,0,5) * normal(sqrt(x^2+y^2),1,.1)
is such a prior. You can actually just get rid of the first two factors and use:
p(x,y) = normal(sqrt(x^2+y^2),1,.1)
by itself, but often in real-world problems there’s some reason to first provide some basic “bounds” on each variable, and then tighten that with some joint information.
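Here’s a minimal sketch of this prior in Python, using a bare-bones random-walk Metropolis sampler just to show that the prior mass really does hug the unit circle (the sampler, step size, and seed are illustrative choices, not part of any particular workflow):

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    # Log density of a Normal(mu, sigma) evaluated at x.
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def log_prior(x, y):
    # Vague independent priors on x and y, tightened by the
    # "near the unit circle" factor on the radius.
    r = math.sqrt(x ** 2 + y ** 2)
    return (normal_logpdf(x, 0.0, 5.0)
            + normal_logpdf(y, 0.0, 5.0)
            + normal_logpdf(r, 1.0, 0.1))

def metropolis(n, step=0.5, seed=0):
    # Random-walk Metropolis, purely for illustration.
    rng = random.Random(seed)
    x, y = 1.0, 0.0
    lp = log_prior(x, y)
    samples = []
    for _ in range(n):
        xp, yp = x + rng.gauss(0, step), y + rng.gauss(0, step)
        lpp = log_prior(xp, yp)
        if math.log(rng.random()) < lpp - lp:
            x, y, lp = xp, yp, lpp
        samples.append((x, y))
    return samples

samples = metropolis(20000)
radii = [math.sqrt(x ** 2 + y ** 2) for x, y in samples[2000:]]
mean_r = sum(radii) / len(radii)
# mean radius ends up close to 1: the prior concentrates near the circle
```

Dropping the first two `normal_logpdf` terms from `log_prior` gives the stripped-down version of the prior mentioned above.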
What’s the “Jacobian adjustment” required when you put “a prior” on sqrt(x^2+y^2)? There isn’t one. Why? Because it’s not a transformation of two variables x,y into two other variables foo,bar… it’s a direct expression of the joint prior on x,y. Literally your prior knowledge is “x,y are distributed according to the density function normal(sqrt(x^2+y^2),1,0.1)”. Just be aware that if you calculate sqrt(x^2+y^2), you will not find that it has a normal(1,0.1) distribution. Rather, (x,y) pairs will have the distribution normal(sqrt(x^2+y^2),1,0.1), which is a different thing.
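You can see the difference with a quick change to polar coordinates: if p(x,y) ∝ normal(sqrt(x^2+y^2); 1, 0.1), the induced density of r = sqrt(x^2+y^2) picks up the area element’s factor of r, so p(r) ∝ r * normal(r; 1, 0.1), which is close to, but not equal to, Normal(1, 0.1). A small numerical check of the induced mean radius:

```python
import math

def unnorm_r_density(r, mu=1.0, sigma=0.1):
    # Induced (unnormalized) density of the radius: the factor r comes
    # from the polar-coordinate area element dx dy = r dr dtheta.
    return r * math.exp(-0.5 * ((r - mu) / sigma) ** 2)

# Grid integration over r in (0, 2); the density is negligible beyond that.
dr = 1e-4
grid = [i * dr for i in range(1, 20000)]
w = [unnorm_r_density(r) for r in grid]
Z = sum(w) * dr
mean_r = sum(r * wi for r, wi in zip(grid, w)) * dr / Z
# mean_r comes out near 1.01, not 1.0: the factor of r tilts mass outward,
# so the radius is NOT Normal(1, 0.1) even though that density defines the prior
```

The shift is small here because sigma is small, but it’s exactly the kind of discrepancy to expect if you check the marginal of sqrt(x^2+y^2) against the factor you wrote down.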
Often it’s worthwhile to express a prior on complicated parameter spaces as some vague independent priors over each parameter, multiplied by some “deformation factor” which squishes probability density in some regions of the space and expresses the dependency between the parameters.
For example, you might express a prior over spline knots as: independently, each one is “near 0”, and also each one is “not that far from the previous one”, which would express a bunch of dependencies between the dimensions. Doing this you could express priors over “smooth functions that don’t vary too quickly and stay near 0 everywhere”.
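As a sketch of that spline-knot prior in Python (the scale values 2.0 and 0.25 are made-up choices for illustration):

```python
import math

def normal_logpdf(x, mu, sigma):
    # Log density of a Normal(mu, sigma) evaluated at x.
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def knot_log_prior(knots, scale0=2.0, step_scale=0.25):
    # Vague independent factors: each knot is "near 0".
    lp = sum(normal_logpdf(k, 0.0, scale0) for k in knots)
    # Deformation factor: each knot is "not that far from the previous one",
    # which couples the dimensions and rewards smoothness.
    lp += sum(normal_logpdf(b - a, 0.0, step_scale)
              for a, b in zip(knots, knots[1:]))
    return lp

smooth = [0.0, 0.1, 0.15, 0.1, 0.0]    # slowly varying, stays near 0
wiggly = [0.0, 1.5, -1.5, 1.5, -1.5]   # large jumps between neighbors
# the smooth configuration gets much higher log prior density than the wiggly one
```

The independent factors alone would treat `smooth` and any permutation of it identically; it’s the consecutive-difference factor that squishes density away from the wiggly configurations.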
Sometimes this technique turns out to be hard to sample from, in which case you should reparameterize.