Understanding section 11.1 of the user guide (Jacobian adjustment)

Hi there,

Could you please help me better understand section 11.1 of the user guide? It says: “Without the Jacobian adjustment, optimization returns the (regularized) maximum likelihood estimate (MLE), argmax_θ p(y | θ), the value which maximizes the likelihood of the data given the parameters (including prior terms).” However, there is no prior term in the argmax(.) expression.

From what I understand, MLE assumes a uniform prior, which is why 11.1 also says “Applying the Jacobian adjustment produces the maximum a posteriori estimate (MAP), the maximum value of the posterior distribution, argmax_θ p(y | θ) p(θ).” When p(θ) is a uniform distribution, the estimate is the MLE. Then what does “(including prior terms)” mean? Does it mean only the bounds of the prior are considered, so that the MLE is regularized (i.e., constrained within the bounds specified for the prior)? In other words, even if I specify a bounded Gaussian or lognormal prior in the model, this MLE still treats the prior as a uniform distribution but uses the same constraints/bounds as the specified prior (e.g., the same bounds as the Gaussian prior, or >0 for the lognormal prior). I am not sure this understanding is correct. As explained in @avehtari’s case study here, when the Jacobian adjustment is FALSE, the optimization returns the MAP in the unconstrained space, so the bound of the prior is not considered - the non-linear transformation required for the bound is not adjusted for.
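To fix notation, my reading of the two estimates quoted above (using the manual’s symbols) is

$$
\hat{\theta}_{\mathrm{MLE}} = \mathrm{argmax}_{\theta}\; p(y \mid \theta),
\qquad
\hat{\theta}_{\mathrm{MAP}} = \mathrm{argmax}_{\theta}\; p(y \mid \theta)\,p(\theta),
$$

so with a flat prior p(θ) ∝ 1 over the constrained support the two coincide, which is why I assumed MLE corresponds to a uniform prior.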

So if the Jacobian adjustment is applied, it will return the mode of the blue PDF (in the last figure of @avehtari’s case study here), and if not, it will return the mode of the red PDF? As the red PDF is constrained to the prior bound ([0, 1]), is it a regularized MLE in [0, 1]? However, does the red PDF treat the prior as uniform on [0, 1], or as the prior specified in the model, i.e., beta(1, 1)?

Often I see “(Note: in optimization, the default value is 0, for historical reasons.)” in the Stan/cmdstanr manual. What is the historical reason? Are there any recommended materials to read on this topic?

Thanks a lot.

  • The Jacobian is not related to whether you are using maximum likelihood \mathrm{argmax}_{\theta}\,p(y \mid \theta) or penalized (regularized) maximum likelihood \mathrm{argmax}_{\theta}\,p(y \mid \theta)f(\theta), where f(\theta) is the penalty function (if you don’t agree on having prior distributions); the Jacobian is related to parameter transformations.
  • The mode is not invariant to parameter transformations, and that is why, in general, the modes with and without the Jacobian adjustment differ. To find the mode in the constrained space, we need to use jacobian=FALSE. To get the correct posterior distribution in the unconstrained space we need to use jacobian=TRUE, but if we transform the parameter values of the mode of that distribution back to the constrained space, the result does not correspond to any mode (the two targets are written out just below).
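Concretely, writing θ = g(u) for Stan’s constraining transform of the unconstrained parameter u (my notation, just to make the two cases explicit), the two objectives the optimizer maximizes over u are

$$
\text{jacobian=FALSE:}\quad \log p(y \mid g(u)) + \log p(g(u)),
$$
$$
\text{jacobian=TRUE:}\quad \log p(y \mid g(u)) + \log p(g(u)) + \log \left| \det \frac{\partial g(u)}{\partial u} \right|.
$$

The first is just the constrained-space log posterior evaluated at θ = g(u), so because g is one-to-one its maximizer maps back to the constrained-space mode; the second is the correctly transformed log density of u, so its maximizer is the mode in the unconstrained space.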

If the Jacobian adjustment is not applied and we find the mode in the unconstrained space, and transform the modal parameter values back to the constrained space, we get the mode of the black line (and blue dash-dotted line). If the Jacobian adjustment is applied and we find the mode in the unconstrained space, and transform the modal parameter values back to the constrained space, the point will be in this example somewhere to the left of the mode of the black line.

If the Jacobian adjustment is not applied, and we sample from the distribution in the unconstrained space, transform that sample back to the constrained space, and make a kernel density estimate, we get the red dash-dotted line, whose mode does not correspond to anything useful. If the Jacobian adjustment is applied, and we sample from the distribution in the unconstrained space, transform that sample back to the constrained space, and make a kernel density estimate, we get the blue dash-dotted line, whose mode corresponds to the MAP in the constrained space.
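If it helps, here is a minimal R sketch of the same effect on a grid, using a made-up Beta(3, 10) density for a parameter theta in (0, 1) and the logit transform (the numbers are only for illustration and are not the ones in the case study):

```r
# Toy constrained-space density: theta ~ Beta(3, 10), theta in (0, 1).
# Unconstrained parameter: u = logit(theta), so theta = plogis(u).
u       <- seq(-10, 5, length.out = 1e5)
theta   <- plogis(u)
log_jac <- log(theta) + log1p(-theta)            # log |d theta / d u|

lp_no_jac   <- dbeta(theta, 3, 10, log = TRUE)   # jacobian = FALSE target
lp_with_jac <- lp_no_jac + log_jac               # jacobian = TRUE target

# Maximize each target over u, then map the maximizer back to theta:
plogis(u[which.max(lp_no_jac)])    # ~0.18 = (3 - 1) / (3 + 10 - 2), the constrained-space mode
plogis(u[which.max(lp_with_jac)])  # ~0.23, the unconstrained-space mode mapped back; not a mode of Beta(3, 10)
```

The two maximizers differ, which is the same effect that separates the curves in the figure.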


There is also a new version of that Jacobian case study using cmdstanr and new Laplace method: Laplace method and Jacobian of parameter transformation


Thanks a lot, @avehtari. I understand that the Jacobian adjustment is required to get the correct distribution in the unconstrained space due to the distortion introduced by the non-linear transformation. To get correct inference in the constrained space, Stan does the following under the hood: convert to the unconstrained space → apply the Jacobian adjustment to correct the distortion → sample this corrected distribution in the unconstrained space → transform back to the constrained space. Therefore, when we use mod$sample(), users never need to touch this Jacobian adjustment option, which is why the sample function doesn’t have it.

However, I am still confused about why by default we don’t need the Jacobian adjustment in mod$optimize(), and about the difference between mod$optimize(jacobian = T) and mod$optimize(jacobian = F). According to your explanation, it seems that mod$optimize(jacobian = T) is not correct, because “If the Jacobian adjustment is applied and we find the mode in the unconstrained space, and transform the modal parameter values back to the constrained space, the point will be in this example somewhere to the left of the mode of the black line.” Is this because mod$optimize() does everything in the constrained space directly, as it only needs to find the mode without sampling the posterior?

And why does the cmdstanr manual say the following (see the snapshot below)? How do we link mod$optimize(jacobian = F) to MLE? Does this mean that whatever prior is stated in the model is ignored and a uniform prior over the parameter bounds is used, because MLE assumes a uniform prior? Also, the manual says mod$optimize(jacobian = T) gives the MAP, but in the case study and the explanation in this post, mod$optimize(jacobian = F) gives the MAP (e.g., “To find the mode in the constrained space, we need to use jacobian=FALSE”).

Here, do you mean that “that distribution” is the correct posterior distribution in the unconstrained space? If so, why does transforming the mode of the correct posterior distribution in the unconstrained space back to the constrained space not correspond to any mode?

It’s not an absolute need, but a choice made years ago to have that as the default.

jacobian = TRUE finds the mode in the unconstrained space, and jacobian = FALSE finds the mode in the constrained space. Both are valid choices, and Stan developers (before I joined) decided to make the choice they made.

All optimization (and sampling) happens in the unconstrained space. If jacobian = FALSE, then the parameter value of the mode in the unconstrained space, transformed to the constrained space, is the parameter value of the mode in the constrained space.
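If it helps to see this end to end, here is a minimal cmdstanr sketch (assuming a cmdstanr/CmdStan version recent enough that $optimize() accepts the jacobian argument; the toy model is mine, chosen so both modes are easy to compute by hand):

```r
library(cmdstanr)

# Toy model: x > 0 with a lognormal(0, 1) density and no data, so
#  - the mode in the constrained space is exp(0 - 1^2) ~ 0.37, and
#  - the unconstrained parameter u = log(x) is normal(0, 1) with mode 0,
#    which maps back to exp(0) = 1.
code <- "
parameters {
  real<lower=0> x;
}
model {
  x ~ lognormal(0, 1);
}
"
mod <- cmdstan_model(write_stan_file(code))

fit_constrained   <- mod$optimize(jacobian = FALSE)  # mode in the constrained space
fit_unconstrained <- mod$optimize(jacobian = TRUE)   # mode in the unconstrained space

fit_constrained$mle("x")    # ~0.37 = exp(-1)
fit_unconstrained$mle("x")  # ~1, the unconstrained-space mode reported on the constrained scale
```

Both fits report x on the constrained scale; they differ only in which target was maximized in the unconstrained space.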

It seems someone made a mistake there. The prior is not ignored; rather, whoever wrote that assumed that if you are doing MLE, you did not define any prior in the model.

The latest Stan reference manual 17.1 General configuration | Stan Reference Manual does clarify that with jacobian=TRUE it is MAP in the unconstrained space.

EDIT: addition: I think the Stan reference manual, CmdStan User Guide, and CmdStanR documentation should all be updated, and I can add to my to-do list to open issues about that.

Because the mode is not invariant to transformations. You can think of posterior draws as representing quantiles (4000 posterior draws represent 4000 empirical quantiles). Quantiles are invariant to (monotone) transformations, and thus, e.g., the median of the posterior in the unconstrained space transformed to the constrained space is the median of the posterior in the constrained space.

Consider a parameter x > 0, with unconstrained parameter u = log(x). If x ~ log-normal(0,1) then u ~ normal(0,1). The mode of normal(0,1) is at 0, and exp(0) = 1. But the mode of log-normal(0,1) is exp(0 - 1^2) ≈ 0.37 (see, e.g., Log-normal distribution - Wikipedia). The median of normal(0,1) is 0 and exp(0) = 1, and the median of log-normal(0,1) is exp(0) = 1.
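If you want to verify those numbers, a quick base R check:

```r
# Mode of normal(0, 1) in the unconstrained space, mapped back to x = exp(u):
exp(0)                                                 # 1
# Mode of log-normal(0, 1) in the constrained space:
optimize(function(x) dlnorm(x, 0, 1),
         interval = c(0, 5), maximum = TRUE)$maximum   # ~0.368 = exp(-1)
# Medians are invariant under the monotone transform:
qlnorm(0.5, 0, 1)   # 1
exp(qnorm(0.5))     # 1
```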


Thank you so much @avehtari. Now I understand these options clearly with your detailed explanations.
