Non-centered parameterization on a variance parameter

Yeah, even with multiple chains the metrics seem not to do a great job. I wonder if energy does (haven’t looked), but it is a pretty obvious pattern on the trace plots and I would always check for it if I saw heavy tails.

Isn’t it the derivative of the inverse? I always have to check to remember…

The energy diagnostic will identify heavy tails but as far as I can tell dynamic HMC handles them just fine. Stan can nail down the 1% and 99% quantiles of a 100-dimensional Cauchy in just a few minutes.
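
For concreteness, a minimal sketch of the kind of check I mean (my own toy setup, not code from this thread): a 100-dimensional standard Cauchy sampled directly, whose tail quantiles you can compare against the analytic values.

parameters {
  vector[100] x;
}
model {
  // heavy-tailed target: independent standard Cauchy in each dimension
  x ~ cauchy(0, 1);
}

The analytic 1% and 99% quantiles of a standard Cauchy are about -31.8 and +31.8, so the check is just whether the per-dimension posterior quantiles land near those values.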

Huh. What do you think is happening in this model then?


As per Michael’s comment, the Jacobian is not needed. Even though the likelihood functions appear different, it seems that you are actually calculating

and so even though p(hp_alt) is different from p(hp), the integrals end up the same, because in one case you integrate with respect to sigma and in the other with respect to c, so once you take the Jacobian into account, the math is identical.
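
In symbols (my notation, since the equation above did not come through): writing \sigma = g(c) for the map between the two parameterizations, the change of variables gives

\int p_\sigma(\sigma) \, d\sigma = \int p_\sigma(g(c)) \left| \frac{d}{dc} g(c) \right| dc = \int p_c(c) \, dc,

so the two integrals agree once the Jacobian factor is included.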

As for the difference between the Stan outputs, I am still not sure what is going on, but I think it does relate to the uninformative prior. When I put a gamma prior on hp/hp_alt, both formulations give the same results.

Michael was commenting on something different from what Bob and I were.
K

D’oh, I just had to think this through for a model. And of course you’re right! :P

Glad to see that my toy problem is still causing so much consternation (!).

Thanks for looking into it :)

Hey @Bob_Carpenter! I know this is old but since it’s preserved for posterity…
Does this example (target += theta3;) need a Jacobian adjustment at all? The user manual says:

Whenever a nonlinear transform is applied to a parameter, such as the logarithm function being applied to beta here, and then used on the left-hand side of a sampling statement or on the left of a vertical bar in a log pdf function, an adjustment must be made...

Maybe there is one more addition that should go in the manual, if the adjustment is necessary for transformed parameters that are consumed elsewhere but never appear in either of these positions?

For the lognormal, you need the Jacobian adjustment.

It’s a little more subtle than written in the manual given that Stan can be used in lots of different ways. For example, normal(a | b, c) is equivalent computationally and mathematically to normal(b | a, c), but they’re conceptually different.

The point of Jacobians is that if you have a distribution p_X(x) and a smooth, monotonic transform f:\mathbb{R} \rightarrow \mathbb{R} and you want to know the distribution of Y = f(X), then you have:

p_Y(y) = p_X(f^{-1}(y)) \cdot \left| \frac{\partial}{\partial y} \, f^{-1}(y) \right|.
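
As a concrete instance (my own example, not from the thread): take f(x) = \exp(x), so f^{-1}(y) = \log y and the derivative of the inverse is 1/y, giving

p_Y(y) = p_X(\log y) \cdot \frac{1}{y}, \quad y > 0,

which is exactly how the lognormal density arises when p_X is normal.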

In Stan, this means that if you declare

parameters {
  real y;
  ...
}
transformed parameters {
  real x = f_inv(y);               // x = f^{-1}(y)
  ...
}
model {
  target += foo(x | theta);        // log p_X(f^{-1}(y))
  target += log(abs(d_f_inv(y)));  // d_f_inv(y) = f_inv'(y), the Jacobian term
  ...
}

The Jacobian’s necessary here because the distribution’s being defined for x with x ~ foo(theta), but we want to get the right distribution on y, which requires the Jacobian adjustment. The target += foo(x | theta) gets you the \log p_X(f^{-1}(y)) term, and the target += log(abs(d_f_inv(y))) gets the \log \left| \frac{\partial}{\partial y} \, f^{-1}(y) \right| term (everything’s on the log scale in Stan). So now you’ve defined the right log density for y, namely

\log p_Y(y) = \log p_X(f^{-1}(y)) + \log \, \left| \frac{\partial}{\partial y} \, f^{-1}(y) \right|.
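
To make this concrete with the lognormal case mentioned above (a minimal sketch of my own; the lognormal(0, 1) is just for illustration): the declared parameter is log_sigma, the transformed parameter is sigma = exp(log_sigma), the density is put on sigma, and the adjustment is log |d exp(log_sigma) / d log_sigma| = log_sigma.

parameters {
  real log_sigma;
}
transformed parameters {
  real<lower=0> sigma = exp(log_sigma);
}
model {
  sigma ~ lognormal(0, 1);   // density defined on the transformed parameter sigma
  target += log_sigma;       // log |d/d(log_sigma) exp(log_sigma)| = log_sigma
}

With the adjustment in place this is equivalent to writing log_sigma ~ normal(0, 1) directly, which is a handy way to check the Jacobian is right.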

Thank you @Bob_Carpenter!

I guess I have scoured the forums and docs and wanted to have a google-able resource for posterity for what seems to be a different case. Every complete case I can find in the forums/docs is, as you explained above, a function of a single parameter used on the LHS. The post from @sakrejda above (that was responded to a year ago) is interesting because he included a Jacobian correction for a parameter used only on the RHS. If your last post applies to any parameter transform, even one used only on the RHS, then why didn’t we also need to add log(abs(theta2)) as well, to get the correct distribution on sigma2_unit in @sakrejda’s same code snippet?

Thank you again!

It’s a matter of what density you’re trying to compute, not where things show up in the formula—Stan lets lots of things be written different ways.

You need a Jacobian when you have a parameter theta and a transformed parameter alpha = f(theta), and you want to define a density on alpha and have it propagate to the proper density on theta. I tried to explain that as clearly as I could in the previous post by writing out the long-form densities p_Y(y) for the random variable Y and showing how that can be defined in terms of p_X(x) given a mapping X \mapsto Y.
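
And for the flip side (again a sketch of my own, with made-up names): if the transformed parameter only ever appears to the right of the bar, as a parameter of a density over something else, you are not defining a density on it, so there is no Jacobian to add.

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real theta;
}
transformed parameters {
  real<lower=0> mu = exp(theta);
}
model {
  theta ~ normal(0, 1);   // prior directly on the declared parameter
  y ~ normal(mu, 1);      // mu only parameterizes the density of the data y,
                          // so no Jacobian adjustment is needed
}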