Non centered parameterization on variance parameter

sakrejda · May 23, 2017, 7:41pm

Yeah, even with multiple chains the metrics don't seem to do a great job. I
wonder if energy does (haven’t looked), but it is a pretty obvious pattern
on the trace plots and I would always check for it if I saw heavy tails.

sakrejda · May 23, 2017, 7:52pm

Isn’t it the derivative of the inverse? I always have to check to remember…

betanalpha · May 23, 2017, 8:05pm

The energy diagnostic will identify heavy tails but as far as I can tell dynamic HMC handles them just fine. Stan can nail down the 1% and 99% quantiles of a 100-dimensional Cauchy in just a few minutes.

sakrejda · May 23, 2017, 8:11pm

Huh. What do you think is happening in this model then?

aaronjg · May 25, 2017, 1:43am

As per Michael’s comment, the Jacobian is not needed. Even though the likelihood functions appear different, it seems that you are actually calculating

and so even though p(hp_alt) is different from p(hp), the integrals end up the same because in one case you integrate with respect to sigma, and the other with respect to c, so when you take into account the Jacobian, the math is identical.

As for the difference between the Stan outputs, I am still not sure what is going on, but I think it relates to the uninformative prior. When I put a gamma prior on hp/hp_alt, both formulations give the same results.

sakrejda · May 25, 2017, 12:10pm

Michael was commenting on something different than me and Bob. K

sakrejda · May 25, 2017, 6:15pm

D’oh, I just had to think this through for a model. Jus of course you’re right! :P

JulianK · May 31, 2017, 11:35am

Glad to see that my toy problem is still causing so much consternation (!).

Thanks for looking into it :)

ryan-richt · October 4, 2018, 8:44pm

Hey @Bob_Carpenter! I know this is old but since it’s preserved for posterity…
Does this example (target += theta3;) need a Jacobian adjustment at all? The user manual says:

Whenever a nonlinear transform is applied to a parameter, such as the logarithm function being applied to beta here, and then used on the left-hand side of a sampling statement or on the left of a vertical bar in a log pdf function, an adjustment must be made...

Maybe there is one more addition that should go in the manual if the adjustment is necessary for consumption of transformed parameters that never appear in either of these positions?

Bob_Carpenter · October 8, 2018, 9:43pm

For the lognormal, you need the Jacobian adjustment.

It’s a little more subtle than written in the manual given that Stan can be used in lots of different ways. For example, normal(a | b, c) is equivalent computationally and mathematically to normal(b | a, c), but they’re conceptually different.
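The symmetry mentioned above is easy to check numerically. This is a minimal sketch in plain Python rather than Stan; normal_lpdf here is a hand-written helper (an assumption for illustration, not Stan's implementation), and the values of a, b, c are arbitrary.

```python
import math

def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

# Arbitrary illustrative values (not from the thread).
a, b, c = 1.3, -0.4, 2.0

lp_ab = normal_lpdf(a, b, c)  # normal(a | b, c)
lp_ba = normal_lpdf(b, a, c)  # normal(b | a, c)
print(lp_ab, lp_ba)           # the two values agree
```

The density depends on a and b only through (a - b)^2, so swapping them changes nothing computationally, even though "a is the variate" and "b is the variate" are different modeling statements.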

The point of Jacobians is that if you have a distribution p_X(x) and a smooth, monotonic transform f:\mathbb{R} \rightarrow \mathbb{R} and you want to know the distribution of Y = f(X), then you have:

p_Y(y) = p_X(f^{-1}(y)) \cdot \left| \frac{\partial}{\partial y} \, f^{-1}(y) \right|.
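As a worked instance of this formula (an illustrative choice, not the model from earlier in the thread), take X \sim \mathrm{LogNormal}(0, 1) and f(x) = \log x, so that f^{-1}(y) = e^y:

```latex
\begin{aligned}
p_X(x) &= \frac{1}{x \sqrt{2\pi}} \exp\!\left( -\tfrac{1}{2} (\log x)^2 \right), \\
p_Y(y) &= p_X(e^y) \cdot \left| \frac{\partial}{\partial y} \, e^y \right|
        = \frac{1}{e^y \sqrt{2\pi}} \exp\!\left( -\tfrac{1}{2} y^2 \right) \cdot e^y
        = \frac{1}{\sqrt{2\pi}} \exp\!\left( -\tfrac{1}{2} y^2 \right).
\end{aligned}
```

The Jacobian factor e^y cancels the 1/x in the lognormal density, leaving Y \sim \mathrm{Normal}(0, 1), as expected for the log of a lognormal variate.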

In Stan, this means that if you declare

parameters {
  real y;
  ...
}
transformed parameters {
  real x = f_inv(y);
  ...
}
model {
  target += foo(x | theta);
  target += log(abs(d_f_inv(y)));  // Jacobian term; d_f_inv(y) = f_inv'(y)
}

The Jacobian’s necessary here because the distribution’s being defined for x with x ~ foo(theta), but we want to get the right distribution on y, which requires the Jacobian adjustment. The target += foo(x | theta) gets you the \log p_X(f^{-1}(y)) term, and the second target += statement gets the \log \left| \frac{\partial}{\partial y} \, f^{-1}(y) \right| term (everything’s on the log scale in Stan). So now you’ve defined the right log density for y, namely

\log p_Y(y) = \log p_X(f^{-1}(y)) + \log \, \left| \frac{\partial}{\partial y} \, f^{-1}(y) \right|.
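A quick numerical sanity check of that log density, sketched in plain Python rather than Stan. The concrete choices are assumptions for illustration only: foo is taken to be LogNormal(0, 1), f_inv is exp, and the helper names (lognormal_lpdf, log_p_y, integrate) are hypothetical.

```python
import math

def lognormal_lpdf(x, mu, sigma):
    """Log density of LogNormal(mu, sigma) at x > 0."""
    z = (math.log(x) - mu) / sigma
    return (-0.5 * z * z - math.log(x) - math.log(sigma)
            - 0.5 * math.log(2.0 * math.pi))

def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def log_p_y(y, with_jacobian=True):
    """Log density on y when x = f_inv(y) = exp(y) and x ~ LogNormal(0, 1)."""
    x = math.exp(y)
    lp = lognormal_lpdf(x, 0.0, 1.0)  # log p_X(f_inv(y))
    if with_jacobian:
        lp += y                       # log |d/dy exp(y)| = y
    return lp

def integrate(f, lo, hi, n=20000):
    """Composite trapezoidal rule on [lo, hi] with n intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

# Total mass over y with and without the Jacobian term.
mass_with = integrate(lambda y: math.exp(log_p_y(y, True)), -12.0, 12.0)
mass_without = integrate(lambda y: math.exp(log_p_y(y, False)), -12.0, 12.0)
print(mass_with, mass_without)
```

With the adjustment, log_p_y matches normal_lpdf(y, 0, 1) pointwise and integrates to about 1. Without it, the implied density on y is proportional to Normal(-1, 1) instead, with total mass near e^{1/2}; the sampler would still run, but it would be targeting the wrong distribution for y.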

ryan-richt · October 10, 2018, 4:42pm

Thank you @Bob_Carpenter!

I have scoured the forums and docs and wanted to have a google-able resource for posterity that covers this different (?) case. Every complete case I can find in the forums/docs is, as you explained above, a function of a single parameter, used on the LHS. This post from @sakrejda above (which you responded to a year ago) is interesting because he included a Jacobian correction for a parameter used only on the RHS. If your last post applies to any parameter transform, even one used only on the RHS, then why didn’t we also need to add log(abs(theta2)), to get the correct distribution on sigma2_unit from @sakrejda’s same code snippet?

Thank you again!

Bob_Carpenter · October 22, 2018, 2:16am

It’s a matter of what density you’re trying to compute, not where things show up in the formula; Stan lets lots of things be written in different ways.

You need a Jacobian when you have a parameter theta and a transformed parameter alpha = f(theta), and you want to define a density on alpha and have it propagate to the proper density on theta. I tried to explain that as clearly as I could in the previous post by writing out the long-form densities p_Y(y) for the random variable Y and showing how that can be defined in terms of p_X(x) given a mapping X \mapsto Y.
