Hi!
Thanks for another release :)
I’m a bit confused by jacobian += , and I don’t find much about it. When should I use it? How is it different from target += ?
There is a bit about this in the documentation, but the short answer is that in many situations they are more or less equivalent. The jacobian += statement is useful both for conveying intent and because it can be used in transformed parameters, which target += cannot.
When are they not equivalent, you might be asking? Well, the C++ that jacobian += ... compiles down to looks something like

if (use_jacobian) {
  target += ...
}
where the condition is usually true, but in some circumstances (like optimization, by default) it is not. This lets you implement your own transforms that behave the same way as the built-in transforms, which also do not apply the change-of-variables adjustment during optimization (by default).
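As an illustrative sketch (not from the thread, and assuming a recent Stan version where jacobian += is available), a hand-rolled positive constraint could mirror the built-in <lower=0> transform like this:

```stan
parameters {
  real log_sigma;            // unconstrained scale
}
transformed parameters {
  real sigma = exp(log_sigma);
  // change-of-variables term: log |d sigma / d log_sigma| = log_sigma
  jacobian += log_sigma;
}
model {
  sigma ~ exponential(1);    // prior stated on the constrained scale
}
```

During sampling this behaves like declaring real<lower=0> sigma; directly; under optimization with the default setting, the jacobian += line is skipped, just like the built-in transform's adjustment.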
Thanks! Very clear answer!
And by the way: Why is it that the jacobian adjustment shouldn’t be used in optimization?
It’s not necessarily that it shouldn’t be; you just get two different answers depending on whether you do or not. I think the docs state it more precisely than I could:
Without the Jacobian adjustment, optimization returns the (regularized) maximum likelihood estimate (MLE), the value that maximizes the likelihood of the data given the parameters (including prior terms).
Applying the Jacobian adjustment produces the maximum a posteriori estimate (MAP), the maximum value of the posterior distribution.
For a long time, Stan only provided optimization for the MLE (no jacobian adjustment). Nowadays, you can request that the jacobian adjustments be included to estimate the MAP, but the default is still to exclude them.
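To make the difference concrete, here is a small self-contained demo (my own toy example, not from the thread): a parameter sigma > 0 with a Gamma(3, 2) density, optimized on the unconstrained scale y = log(sigma). Without the jacobian term the optimum is the constrained-space mode (alpha - 1) / beta; with it, the optimum is the unconstrained-space mode mapped back, alpha / beta.

```python
import math

# Toy target: sigma > 0 with a Gamma(alpha, beta) density (shape-rate),
# optimized over the unconstrained variable y = log(sigma).
# log p(sigma) = (alpha - 1) * log(sigma) - beta * sigma  (+ const)
alpha, beta = 3.0, 2.0

def log_density(y, include_jacobian):
    sigma = math.exp(y)
    lp = (alpha - 1.0) * math.log(sigma) - beta * sigma
    if include_jacobian:
        lp += y  # log |d sigma / d y| for the exp transform
    return lp

def argmax(f, lo=-10.0, hi=10.0, iters=200):
    # simple ternary search; both objectives are concave in y
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0

sigma_no_jac = math.exp(argmax(lambda y: log_density(y, False)))
sigma_jac = math.exp(argmax(lambda y: log_density(y, True)))

print(sigma_no_jac)  # ~ (alpha - 1) / beta = 1.0, the mode in constrained space
print(sigma_jac)     # ~ alpha / beta = 1.5, the unconstrained-space mode mapped back
```

The two runs optimize the same model and differ only in the jacobian term, which is exactly the `jacobian=true/false` switch in Stan's optimizer.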
AFAICT the docs are wrong; the difference between using the Jacobian adjustment or not has been accurately described here: https://users.aalto.fi/~ave/casestudies/Jacobian/jacobian.html#Parameter_transformation_and_Jacobian_adjustment
I had fixed the docs elsewhere, but it seems not in this specific part of the CmdStan docs. The doc is partially correct, but fails to mention that the MAP it refers to is in the unconstrained space. The Laplace sampling part does mention:
The laplace method produces a sample from a normal approximation centered at the mode of a distribution in the unconstrained space. If the mode is a maximum a posteriori (MAP) estimate, the samples provide an estimate of the mean and standard deviation of the posterior distribution. If the mode is a maximum likelihood estimate (MLE), the sample provides an estimate of the standard error of the likelihood. In general, the posterior mode in the unconstrained space doesn’t correspond to the mean (nor mode) in the constrained space, and thus the sample is needed to infer the mean as well as the standard deviation. (See this case study for a visual illustration.)
I’ll add to my TODO list to clarify the optimization section.