If p(data, theta) dtheta is a measure on the parameter space theta, and Stan wants to sample in an unconstrained space eta, then what Stan needs to enforce is that

p(data, theta(eta)) dtheta is still the probability of being in a micro-volume dtheta, even though we’re moving around in eta space.

The probability for the sampler to be in a volume deta needs to be

p(data, theta(eta)) |dtheta(eta)/deta| deta = p(data, theta) dtheta

where, in the general multivariate case, |dtheta(eta)/deta| means the absolute value of the determinant of the Jacobian matrix dtheta_i/deta_j, because this determinant measures how much an n-dimensional volume element is scaled under the mapping deta -> dtheta. Note that the Jacobian determinant is in general a function of eta. For linear transformations it’s a constant, so it can be dropped: MCMC only needs the density up to a constant factor (an additive constant on the log scale).
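A one-dimensional numerical check of the equation above, as a minimal sketch: it assumes theta = exp(eta) and uses an Exponential(1) density p(theta) = exp(-theta) as a stand-in for p(data, theta) (both are my choices for illustration). With the factor |dtheta/deta| = exp(eta), the density in eta space integrates to 1; without it, the "density" is not even normalizable, and its integral just grows with the width of the grid.

```python
import math

def p_theta(theta):
    # Exponential(1) density on theta > 0, a stand-in for p(data, theta)
    return math.exp(-theta)

def trapezoid(f, a, b, n):
    # Plain trapezoidal rule on [a, b] with n equal intervals
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def with_jacobian(eta):
    theta = math.exp(eta)          # constraining transform theta(eta)
    return p_theta(theta) * theta  # times |dtheta/deta| = exp(eta) = theta

def without_jacobian(eta):
    return p_theta(math.exp(eta))  # Jacobian factor (wrongly) left out

lo, hi, n = -30.0, 10.0, 40_000
print(trapezoid(with_jacobian, lo, hi, n))     # close to 1
print(trapezoid(without_jacobian, lo, hi, n))  # about 29.4; keeps growing as lo decreases
```

The left tail is where the two differ: as eta -> -inf, p(theta(eta)) tends to p(0) = 1, and only the Jacobian factor exp(eta) makes it decay.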

This all makes perfect sense in nonstandard analysis, because there the symbols “dtheta/deta” really are ratios of infinitesimal numbers, and the whole calculation is just straightforward algebra.
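To make the "ratio of infinitesimals" reading concrete, a minimal sketch with finite but tiny volumes in place of infinitesimals: push a small d-by-d square in eta space through a map and divide the area of its image by d². The two maps are my own illustrative choices: polar coordinates, whose Jacobian determinant is r and so changes from point to point, and a linear map theta = A eta, whose determinant |det A| = 5.5 is the same everywhere (the constant that can be dropped).

```python
import math

def polygon_area(pts):
    # Shoelace formula; vertices shifted to the first one for numerical stability
    x0, y0 = pts[0]
    q = [(x - x0, y - y0) for x, y in pts]
    s = 0.0
    for (x1, y1), (x2, y2) in zip(q, q[1:] + q[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def volume_ratio(f, e1, e2, d=1e-4):
    # Area of the image of the tiny square [e1, e1+d] x [e2, e2+d], divided by d*d
    corners = [f(e1, e2), f(e1 + d, e2), f(e1 + d, e2 + d), f(e1, e2 + d)]
    return polygon_area(corners) / (d * d)

def polar(r, phi):
    # theta = (r cos phi, r sin phi); |det J| = r
    return (r * math.cos(phi), r * math.sin(phi))

def linear(e1, e2):
    # theta = A eta with A = [[2, 1], [0.5, 3]]; |det A| = 5.5 everywhere
    return (2.0 * e1 + 1.0 * e2, 0.5 * e1 + 3.0 * e2)

for point in ((1.0, 0.3), (2.0, 0.3)):
    print(volume_ratio(polar, *point), volume_ratio(linear, *point))
# polar: about 1.0, then about 2.0 (tracks r); linear: about 5.5 both times
```

Shrinking d makes the polar ratio converge to r; in the nonstandard-analysis picture, d is simply taken to be infinitesimal.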

The Jacobian only comes into play when we transform the space in which we’re sampling. Suppose you sample in, say, “foo” space and then write p(bar | q(foo)) = something. The fact that you’re transforming foo through q doesn’t matter, because it doesn’t alter the probability for the sampler to be in a tiny volume dfoo. It alters the probability for bar, and it alters it in precisely the way you want, because by the definition of your chosen model the probability of bar depends on the value q(foo).

The way Stan works is that it chooses unconstraining transforms to create an “eta” space that covers the whole real line in each dimension, and then it samples in that eta space. When it needs to spit out a theta value, it uses the constraining transform to calculate theta(eta) and spits out that value.
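For concreteness, a sketch of two such constraining transforms together with their log-Jacobian terms, written to match the lower-bound (exp) and lower/upper-bound (inverse logit) transforms described in the Stan Reference Manual. The Python function names are hypothetical, and Stan's actual C++ implementation differs in detail. Each analytic log|dtheta/deta| is checked against a finite-difference derivative.

```python
import math

def inv_logit(y):
    return 1.0 / (1.0 + math.exp(-y))

def lower_bound(eta, a):
    # theta = a + exp(eta) maps the real line onto (a, inf); log|dtheta/deta| = eta
    return a + math.exp(eta), eta

def interval(eta, a, b):
    # theta = a + (b - a) * inv_logit(eta) maps the real line onto (a, b)
    u = inv_logit(eta)
    log_jac = math.log(b - a) + math.log(u) + math.log1p(-u)
    return a + (b - a) * u, log_jac

def fd_log_slope(f, eta, h=1e-6):
    # log|dtheta/deta| by central differences, to check the analytic log-Jacobian
    slope = (f(eta + h)[0] - f(eta - h)[0]) / (2 * h)
    return math.log(abs(slope))

transforms = {
    "lower_bound(a=1)": lambda e: lower_bound(e, 1.0),
    "interval(a=-1, b=3)": lambda e: interval(e, -1.0, 3.0),
}
for name, f in transforms.items():
    for eta in (-2.0, 0.0, 1.5):
        theta, log_jac = f(eta)
        print(name, eta, round(theta, 4), round(log_jac, 6), round(fd_log_slope(f, eta), 6))
```

Both transforms are nonlinear, so their log-Jacobians depend on eta and cannot be dropped as constants.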

So behind the scenes, Stan adds the log |dtheta/deta| terms to the log density (lp) so that eta is sampled from the right distribution to give theta(eta) the distribution you asked for.
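A minimal sketch of that bookkeeping, assuming theta > 0 is constrained through theta = exp(eta) and using a Gamma(3, 1) density as a stand-in target (my choices, not Stan's code). The sampler is a plain random-walk Metropolis in eta space, not Stan's HMC/NUTS. With log|dtheta/deta| = eta added to the log density, theta(eta) has the requested Gamma(3, 1) distribution; without it, theta comes out Gamma(2, 1) instead.

```python
import math
import random

def log_p_theta(theta):
    # Unnormalized Gamma(shape=3, rate=1) log density on theta > 0 (mean 3)
    return 2.0 * math.log(theta) - theta

def log_target(eta, jacobian=True):
    theta = math.exp(eta)  # constraining transform
    lp = log_p_theta(theta)
    if jacobian:
        lp += eta          # log|dtheta/deta| = log(exp(eta)) = eta
    return lp

def mean_theta(jacobian, n=100_000, step=1.0, seed=1):
    # Random-walk Metropolis in unconstrained eta space; returns the mean of theta(eta)
    rng = random.Random(seed)
    eta, lp = 0.0, log_target(0.0, jacobian)
    total = 0.0
    for _ in range(n):
        prop = eta + rng.gauss(0.0, step)
        lp_prop = log_target(prop, jacobian)
        if lp_prop >= lp or rng.random() < math.exp(lp_prop - lp):
            eta, lp = prop, lp_prop
        total += math.exp(eta)
    return total / n

print(mean_theta(jacobian=True))   # close to 3, the Gamma(3, 1) mean
print(mean_theta(jacobian=False))  # close to 2: theta ends up Gamma(2, 1)
```

The missing factor is easy to see by hand: without the Jacobian the eta density is proportional to theta² exp(-theta), and converting back to theta divides by |dtheta/deta| = theta, leaving theta exp(-theta), which is Gamma(2, 1).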

If you do any transformations of variables yourself, you’ll have to decide what they mean to you. But the principle is this: if you know the measure you want on a transformed space, then for the transformed value to have that measure, the space in which you’re sampling must have the measure p(theta(eta)) |dtheta/deta| deta. Just think of all those d values as infinitesimal numbers and the algebra becomes clear (this is how nonstandard analysis works).