Jacobian in reparameterizations

jmaronas · March 30, 2021, 8:26am

Hello all,

I have a question regarding the inclusion of the Jacobian in Stan log probability. There is a nice discussion in stan user guide about when we should include Jacobian and when we shouldn’t: Stan User’s Guide

In summary, one needs to add the Jacobian correction whenever the result of the transformation is given a distribution:


  real log_y;
  log_y = log(y);
  log_y ~ normal(mu, sigma);
  target += -log_y; // add jacobian

rather than applying the transformation to the sampled variable:

x ~ normal(0,1);
y = x*std + mu; 
target += 0 // no need to add Jacobian correction here.

However, I don’t understand why the later does not require us to add the Jacobian. As far as I know, in my latter code, the probability of y is given by:

p(y|\mu,\sigma^2) = \mathcal{N}(x|0,1)\text{abs}\left|\frac{\text{d}f(x)}{\text{d}x}\right|^{-1}

where f(x)=x\sigma+\mu. The log probability of y is computed by the log probability of x and the Jacobian resulting from transforming x into y. Why we don’t need to add the Jacobian correction in this latter case in Stan? Is it because it gets cancelled when computing the metropolis hasting accept reject ratio?

Thank you!

spinkney · March 30, 2021, 9:24am

In the later case, nothing is changing the target log density. You could just as easily have put it in generated quantities, which never gets touched by HMC.

There’s a more nuanced question though about exactly when you need the jacobian. Take a look at this, it may help http://models.street-artists.org/2017/05/30/once-more-about-stan-and-jacobian-warnings/

jmaronas · March 30, 2021, 9:26am

Thanks for the answer, will check that post. After a while I found this while googling, Demystifying Jacobian adjustment for transformed parameters in Stan, which seems to nicely explain what is the difference when applying the transformations in the ways I have outlined above (i.e what Stan is doing under the hood).

betanalpha · April 2, 2021, 7:03pm

The probability density of y is given by

p(y \mid \mu, \sigma) = \text{normal}(f^{-1}(y) \mid 0, 1) \cdot \left| \frac{ \mathrm{d} f }{ \mathrm{d} x} \right|^{-1}(f^{-1}(y)).

Now

\begin{align*} \text{normal}(f^{-1}(y) \mid 0, 1) &= \frac{1}{\sqrt{2\pi}} \exp \left( -\frac{1}{2} \left( f^{-1}(y) - 0 \right)^{2} \right) \\ &= \frac{1}{\sqrt{2\pi}} \exp \left( -\frac{1}{2} \left( \frac{y - \mu}{\sigma} - 0 \right)^{2} \right) \\ &= \frac{1}{\sqrt{2\pi}} \exp \left( -\frac{1}{2} \left( \frac{y - \mu}{\sigma} \right)^{2} \right) \end{align*}

This almost looks like a normal density function; the only problem is the normalization. Fortunately we still have that Jacobian determinant term,

\begin{align*} \left| \frac{ \mathrm{d} f }{ \mathrm{d} x} \right|^{-1}(f^{-1}(y)) &= \left| \frac{ \mathrm{d} (\mu + \sigma \cdot x) }{ \mathrm{d} x } \right|^{-1}(f^{-1}(y)) \\ &= \left| \sigma \right|^{-1}(f^{-1}(y)) \\ &= \frac{1}{\sigma}, \end{align*}

which is exactly the missing normalization!

In other words we’ve just proved that

\begin{align*} p(y \mid \mu, \sigma) &= \text{normal}(f^{-1}(y) \mid 0, 1) \cdot \left| \frac{ \mathrm{d} f }{ \mathrm{d} x} \right|^{-1}(f^{-1}(y)) \\ &= \frac{1}{\sqrt{2\pi}} \exp \left( -\frac{1}{2} \left( \frac{y - \mu}{\sigma} \right)^{2} \right) \cdot \frac{1}{\sigma} \\ &= \frac{1}{\sqrt{2\pi \sigma^{2}}} \exp \left( -\frac{1}{2} \left( \frac{y - \mu}{\sigma} \right)^{2} \right) \\ &= \text{normal}(y \mid \mu, \sigma). \end{align*}

In other words

\text{normal}(x \mid 0, 1) \\ y = \mu + \sigma \cdot x

is equivalent to

\text{normal}(y \mid \mu, \sigma)

exactly because of the Jacobian!

We don’t need a Jacobian in the model here because we’re not trying to force any behavior on y = \mu + \sigma \cdot x. Instead we’re forcing behavior on x and letting Stan figure out the induced behavior on y. If we want to work out that induced behavior analytically we have to include the Jacobin as above.

Jacobians are needed when we have a density that depends on a derived quantity, \pi(y). In general this will not induce behavior on x such that the behavior of the derived quantity y = f(x) is quantified by \pi(y). Instead the induced behavior is quantified by

\pi_{\text{induced}}(x) = \pi(f(x))

which then implies a density

\begin{align*} \pi_{\text{induced}}(y) &= \pi_{\text{induced}}(f^{-1}(y)) \left| \frac{ \mathrm{d} f}{\mathrm{d} x} \right|^{-1} (f^{-1}(y)) \\ &= \pi(f \circ f^{-1}(y)) \left| \frac{ \mathrm{d} f}{\mathrm{d} x} \right|^{-1} (f^{-1}(y)) \\ &= \pi(y)) \left| \frac{ \mathrm{d} f}{\mathrm{d} x} \right|^{-1} (f^{-1}(y)) \end{align*}

for the derived quantity. Note that here the Jacobian term skews the behavior of y from the expected \pi(y)! If we had included the inverse Jacobian determinant correct as we should have then it would have canceled with this extra term to give \pi_{\text{induced}}(y) = \pi(y) as desired.

jmaronas · April 5, 2021, 11:33am

Thanks for your explanation @betanalpha .

The only thing I dont understand is, what do you refer by “derived quantity” here: Jacobians are needed when we have a density that depends on a derived quantity, \pi(y)

From what I understand, \pi(y) is the distribution induced by transforming x with f() or equivalently, y with f()^{-1}, which would be something similar to my analytic example where f() is given by a linear transformation.

betanalpha · April 5, 2021, 6:53pm

In Stan the target distribution is defined via a probability density function specified in the model block over the parameter space specified in the parameters block. Any function of the parameters (other than the identify function) is considered a derived quantity.

Any statement like

target += example_lpdf(parameters | anything);

doesn’t require a Jacobian, but any statement like

real derived = function(parameters);
target += example_lpdf(derived | anything);

will require a Jacobian correction if you actually want the marginal density for derived to be example.

One of the courses of potential confusion here is that there are many density functions around. Given a nice transformation f : X \rightarrow Y any probability density function on X, \pi(x) defines an equivalent density function on Y, \pi_{\text{induced}}(y). But we can also define an infinite number of densities functions on Y that have no relation to \pi(x). Plugging x = f^{-1}(y) into \pi(x) defines a probability density function on Y, but not the equivalent density function. The equivalent density function requires a Jacobian correction.

mike-lawrence · April 5, 2021, 7:35pm

Am I correct that something like (silly example for clarity):

model{
  vector[n] derived_var = (data_var - parameter_1)/parameter_2 ;
  target += std_normal_lupdf(derived_var) ;
}

doesn’t need a Jacobian adjustment because the operations producing derived_var are linear transformations?

nhuurre · April 5, 2021, 7:46pm

That target written out is

target += -0.5*(derived_var)^2;

I guess this is meant to be a different way to write

target += normal_lpdf(data_var| parameter_1, parameter_2);

but that’s the same as

target +=  -log(parameter_2) - 0.5*((data_var - parameter_1)/parameter_2)^2;

That extra -log(parameter_2) is the log Jacobian correction for multiplying by parameter_2.

betanalpha · April 12, 2021, 5:57pm

Technically one always needs Jacobian corrections, but because Stan requires target densities specified only up to constants Jacobian corrections that reduce to constant values can be ignored. A Jacobian correction is constant if and only if the transformation is linear.

Division, however, is not a linear operation and instead introduces a 1 / parameter_2 Jacobian correction.

Topic		Replies	Views
Basic (newbie) question about transforming a parameter and Jacobians Modeling specification	6	1363	July 26, 2017
Jacobian adjustment in the acceptance probability Algorithms	11	3974	September 4, 2017
Lognormal with additive effects on the original scale Modeling techniques , specification , loo	4	1035	September 4, 2017
Log normal distribution in Stan General	4	1250	January 15, 2021
"jacobian not necessary" for transformed normal location parameter? Modeling specification	3	655	June 10, 2018

Jacobian in reparameterizations

Related topics