Understanding Jacobian adjustments (Stan user guide) & multiple transformations of one parameter

When reading the Stan user guide on Jacobian adjustments I couldn’t find answers to a couple of questions.
The Stan User’s Guide (section 21.4, “Vectors with varying bounds”) contains the following example:

data {
  int N;
  vector[N] L;  // lower bounds
  ...
parameters {
  vector[N] alpha_raw;
  ...
transformed parameters {
  vector[N] alpha = L + exp(alpha_raw);
  ...
model {
  target += sum(alpha_raw);  // log Jacobian
  ...

If the model block contained a line like alpha ~ foo(...); then the Jacobian adjustment would surely be necessary.
However, as far as I understand, the Jacobian adjustment would not be necessary if the transformed parameter alpha were not given a distribution (e.g., if it occurred only on the right of a ~). My question is: would the adjustment target += sum(alpha_raw); then make the model specification wrong (i.e., change the model in an unintended way)?


Based on this I have a follow-up question. Say we wrote the following Stan program:

data {
 ...
}
parameters {
    real a;
}
model {
    f(a) ~ foo(...);
    target +=  ...;   // log Jacobian of f
    g(a) ~ bar(...);
    target +=  ...;   // log Jacobian of g
}

I am not sure if this is nonsensical to begin with or if it might have an actual use case, but I struggle to understand what this would mean mathematically. If we transform the parameter a only once, then the transformed a should be an argument of the target density. However, if we transform it twice, then both transformed variables (of the same parameter) should be arguments of the target density, which doesn’t make sense to me.


If I understand your first question correctly, you are asking whether it is wrong to include the Jacobian adjustment in a model that otherwise should not require it. Yes, it is wrong.
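
One way to see why, as a sketch for a single component: write h(alpha) for whatever the rest of the model block contributes to the density as a function of alpha (h is a placeholder introduced here, not something from the user guide example). Since alpha = L + exp(alpha_raw), we have alpha_raw = log(alpha - L), and a change of variables gives

```latex
\begin{align*}
\text{without the line:}\quad
  & p(\alpha_{\mathrm{raw}}) \propto h(\alpha)
  \;\Longrightarrow\;
  p(\alpha) \propto h(\alpha)\left|\frac{d\alpha_{\mathrm{raw}}}{d\alpha}\right|
  = \frac{h(\alpha)}{\alpha - L}, \\
\text{with the line:}\quad
  & p(\alpha_{\mathrm{raw}}) \propto h(\alpha)\, e^{\alpha_{\mathrm{raw}}}
  \;\Longrightarrow\;
  p(\alpha) \propto \frac{h(\alpha)\, e^{\alpha_{\mathrm{raw}}}}{\alpha - L}
  = h(\alpha).
\end{align*}
```

The two programs differ by a factor of (alpha - L) in the density on alpha, so the line is never a no-op: adding it to a model that was meant to be a density over alpha_raw changes that model.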

Your second question is more subtle. In general, when we put a distribution on a nonlinear transform of a parameter, as in f(a) ~ distribution, the goal of the Jacobian adjustment is to ensure that this statement actually corresponds to what it appears to say, namely that distribution will be the sampling distribution for f(a). Your question is subtle because, even without any nonlinear transformations, if we write

a ~ distribution1;
a ~ distribution2;

then we can no longer expect the sampling distribution for a to be either distribution1 or distribution2. So if we put distributions on two nonlinear transforms of a, what are we trying to achieve with our pair of Jacobian adjustments? The ordinary goal of Jacobian adjustments is already impossible: we’re never going to have a sampling distribution for f(a) or g(a) that corresponds to either of the distributions foo or bar in your code, because the two distributions interfere with one another.
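
To make the interference concrete, here is a small worked case with two normal distributions. The specific choice of normal(0, 1) and normal(2, 1) is an assumption made purely for illustration. Writing both statements on a multiplies the two densities:

```latex
\begin{align*}
p(a) &\propto \mathrm{N}(a \mid 0, 1)\,\mathrm{N}(a \mid 2, 1) \\
     &\propto \exp\!\left(-\tfrac{1}{2}a^2 - \tfrac{1}{2}(a - 2)^2\right) \\
     &\propto \exp\!\left(-(a - 1)^2\right) \\
     &\propto \mathrm{N}\!\left(a \mid 1, \tfrac{1}{\sqrt{2}}\right).
\end{align*}
```

The result has mean 1 and standard deviation of about 0.71, so a is distributed according to neither of the two distributions that were written down.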

Let’s rewrite your second model using the target += notation, because a few things become clearer:

target += foo_lpdf(f(a) | ...);   // line 1
target += jacobian1;              // line 2
target += bar_lpdf(g(a) | ...);   // line 3
target += jacobian2;              // line 4

This program, with or without lines 2 & 4, corresponds to some log-posterior function that Stan (possibly) can sample. But unless you really know what you’re doing and have a good reason for writing it this way, it is very unlikely to be a log-posterior that you intend. Again, this is true even if there are no nonlinear transformations and no Jacobian adjustments:

target += foo_lpdf(a | ...) + bar_lpdf(a | ...); 

The log-probability density function defined above probably doesn’t correspond to a distribution that you know or have much intuition about. Best to leave this sort of stuff alone unless you have a really good reason not to.
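
Written out as a density, the four-line program looks as follows. This is a sketch under assumptions the original program leaves open: f and g are taken to be differentiable, one-to-one transforms, and jacobian1 and jacobian2 are taken to be the usual log absolute derivatives log|f'(a)| and log|g'(a)|. The symbols p_f and p_g are introduced here only to name the two factors.

```latex
\begin{align*}
p(a) &\propto
  \underbrace{\mathrm{foo}\big(f(a)\big)\,\lvert f'(a)\rvert}_{p_f(a)}
  \;\cdot\;
  \underbrace{\mathrm{bar}\big(g(a)\big)\,\lvert g'(a)\rvert}_{p_g(a)}
\end{align*}
```

Here p_f is the density on a that the statement f(a) ~ foo(...) would imply on its own, and likewise p_g for g(a) ~ bar(...). The program therefore defines a product of two densities on a, the same kind of object as the version without transforms, only with each factor expressed through a transform.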


I agree 100% with the above. Putting two distributions on a parameter almost always leads to sampling issues unless it’s a mixture of some kind.


Thanks, that makes perfect sense!


In contrast to

    f(a) ~ foo(...);
    g(a) ~ bar(...);

I could still see that someone might do something like

target += foo_lpdf(a | ...) + bar_lpdf(a | ...); 

For example, the intended (unnormalized) density on a might be one that can be written as a product of foo and bar. That’s why I consciously chose f(a) and g(a) on the left-hand sides of ~. I mention my motivation only in case it helps a future reader.