Jacobian on fundamental examples

I have been a Stan user for two or three years, but I am still not clear on how to state the model block properly using a Jacobian. So I have prepared several basic examples of coding a model block with a Jacobian.

Throughout this post, let (\Omega, \mathcal{A}) be a measurable space.

For a frequentist, the following example is perhaps sufficient.

Question.1 ~ transformation of data ~

Let X:\Omega \to [-\pi/2,\pi/2] be a random variable from an unknown distribution.
Consider the model defined by

\tan(X) \sim \text{Normal}(\mu, \sigma^2),

whose priors are

\mu \sim \text{Uniform}(-\infty ,\infty),
\sigma \sim \text{Uniform}(0,\infty).

Then the question is: “Code this model with the Jacobian.”

Answer.1

Let f(y|\mu, \sigma) be the Gaussian density (the heat kernel).
Because

f(y|\mu, \sigma)\,dy = f(\tan(x)|\mu, \sigma)\,dy = f(\tan(x)|\mu, \sigma)\,\frac{d(\tan(x))}{dx}\,dx = f(\tan(x)|\mu, \sigma)\,\frac{1}{\cos^2(x)}\,dx,

we obtain, writing p(x|\mu, \sigma) for the density of x,
\log p(x|\mu, \sigma) = \log f(\tan(x)|\mu, \sigma) - 2 \log \cos(x).

Thus, I guess, I should add the last term -2 \log \cos(x) in the model block.

The code is shown in the last part of this post. Is it coded correctly?
I guess maybe the sign is wrong …

Question.2 Transformation on parameter space \Theta

Let X:\Omega \to \mathbb{R} be a random variable from an unknown distribution.
Consider the model,

X \sim \text{Normal}(\mu, \exp(\theta)),

whose priors are

\mu \sim \text{Uniform}(-\infty ,\infty),
\theta \sim \text{Uniform}(-\infty,\infty).

Then the question is: “Is any Jacobian adjustment required?” If the answer is no, then there is a further question: “Is there a transformation of the parameter space for which a Jacobian adjustment is required?” If so, please show a simple example.

Answer.2

Let \pi(\theta|x) be the posterior, where \theta is a model parameter and x denotes a realization of X.

Because for \pi(\theta|x) … I guess some Jacobian adjustment is required, but I am not sure.
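
To make my question concrete, here is how I would currently code this model in Stan without any adjustment (a sketch; I treat exp(\theta) as the scale, and the flat priors are Stan's defaults):

data {
  int<lower=0> N;
  real x[N];
}

parameters {
  real mu;
  real theta;
}

model {
  // flat (improper) priors on mu and theta are Stan's default
  x ~ normal(mu, exp(theta));  // exp(theta) appears only on the right-hand side
}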

Question.3 Linear Transformation of data for precise log likelihoods

Let X:\Omega \to \mathbb{R}^d be a random variable from an unknown distribution.
Let A be an invertible, orientation-preserving (\det A >0) matrix of size d \times d.
Consider the model,

AX \sim \text{Normal}(\mu, v),

whose priors are

\mu \sim \text{Uniform}(-\infty ,\infty),
v\sim \text{Uniform}(0,\infty).

Then the question is: “Does Stan calculate the likelihood by taking the matrix A into account?”

Explanation of Question 3

Let f(y|\mu, \sigma) be the (multivariate) Gaussian density, as before.
Because

f(y|\mu, \sigma)\,dy = f(Ax|\mu, \sigma)\,dy = f(Ax|\mu, \sigma)\left|\det \frac{\partial (Ax)}{\partial x}\right| dx = f(Ax|\mu, \sigma)(\det A)\,dx,

we obtain, again writing p(x|\mu, \sigma) for the density of x,
\log p(x|\mu, \sigma) = \log f(Ax|\mu, \sigma) + \log \det A.
Or …
\log f(Ax|\mu, \sigma) = \log p(x|\mu, \sigma) - \log \det A?

Thus, I guess, I should add the term \log \det A in the model block (or subtract it? please tell me which, with the reason). But in the past, Stan only warned when the transformation was non-linear, so from this fact I guess that Stan takes the Jacobian into account when the transformation is linear. Is my guess correct? I also guess that, as with a ~ statement (e.g., x ~ normal(a, b)), a linear transformation contributes only a constant to the log likelihood, and hence Stan drops the constant term \log \det A. So I am not sure what Stan does for a linear transform and whether Stan includes \log \det A in the log likelihood (denoted by lp__ in stanfit objects).
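
To make the question concrete, here is a sketch of what I mean, assuming A is given as data and v is a common variance on the diagonal (the names are my own):

data {
  int<lower=0> N;
  int<lower=1> d;
  matrix[d, d] A;   // invertible with det(A) > 0
  vector[d] x[N];
}

parameters {
  vector[d] mu;
  real<lower=0> v;
}

model {
  for (i in 1:N)
    A * x[i] ~ multi_normal(mu, diag_matrix(rep_vector(v, d)));
  // My question: does Stan account for log(det(A)) here, i.e., should I add
  // target += N * log_determinant(A); ?
}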

The code of Answer 1

stanmodelcode <- "
data {
int<lower=0> N;
real x[N];
}

parameters {
real mu;
}

model {
target += normal_lpdf(mu | 0, 10);
target += normal_lpdf(tan(x) | mu, 1);
for(i in 1:N)target += -2*log(cos(x[i]));
}
"

x <- runif(20, min = -3.14/2, max = 3.14/2)
dat <- list(N = 20, x = x);
f <- stan(model_code = stanmodelcode, model_name = “example”,
data = dat, iter = 1111, chains = 1)
rstan::check_hmc_diagnostics(f)

If my guess is correct, then please also let me know. Sorry for bothering you.

I would appreciate it if you could reply.

Question 1.

If cos(x) < 0 for some x, log(cos(x)) becomes undefined. The Stan manual recommends taking the absolute value. However, I’d suggest using -log(square(cos(x))).
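
That is, your model block would become:

model {
  target += normal_lpdf(mu | 0, 10);
  target += normal_lpdf(tan(x) | mu, 1);
  // log(square(cos(x))) = 2 * log(|cos(x)|), so this is safe for any x
  for (i in 1:N) target += -log(square(cos(x[i])));
}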

Question 2.
No Jacobian adjustment is needed. Jacobian adjustments are needed when you specify a non-linear transformation on the LHS of a sampling statement.
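
For example, a classic construction that does need one (a sketch): a log transform of a parameter on the LHS,

parameters {
  real<lower=0> sigma;
}

model {
  log(sigma) ~ normal(0, 1);  // non-linear transform of sigma on the LHS ...
  target += -log(sigma);      // ... so add log |d log(sigma)/d sigma| = -log(sigma)
}

whereas exp(\theta) on the RHS, as in your Q2, needs nothing.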

Question 3.

Question: Is a Jacobian adjustment needed if A and X are matrices and both are parameters?
Maybe @Bob_Carpenter, @martinmodrak can help here. (And all the others I did not tag.)

Additionally, about the code:

real x[N] must be part of parameters


Thank you for your reply.

About Question 1.

In Q1, I defined the range of the random variable to be [-\pi/2,\pi/2], so I guess taking the absolute value is redundant.
I am not sure which of the following two versions of the code is correct.
Plus

model {
  target += normal_lpdf(mu | 0, 10);
  target += normal_lpdf(tan(x) | mu, 1);
  for (i in 1:N) target += -2 * log(cos(x[i]));
}

Minus

model {
  target += normal_lpdf(mu | 0, 10);
  target += normal_lpdf(tan(x) | mu, 1);
  for (i in 1:N) target += 2 * log(cos(x[i]));
}

About Question 2.

In Q2, if I modify the model as follows, is a Jacobian adjustment required?

Let X: \Omega \to \mathbb{R} be a random variable from an unknown distribution.
Consider the model:
X \sim \text{Normal}(\mu, \exp(\theta)),
\exp(\theta) \sim \text{Exp}(\lambda),
\lambda \sim \text{Uniform}(0,\infty),
\mu \sim \text{Uniform}(-\infty,\infty).

Calculation of coordinate transformation

Let f(x|\mu, \sigma^2) be the Gaussian density function.
Let g(y|\lambda) be the density function of the exponential distribution.

Then we consider the differential two-form with the basis dx \wedge dy:

\int_0^\infty d\lambda \int^\infty_{-\infty} d\mu\, f(x|\mu,\sigma)\,g(y|\lambda)\,dx \wedge dy
=\int_0^\infty d\lambda \int^\infty_{-\infty} d\mu\, f(x|\mu,y)\,g(y|\lambda)\,dx \wedge dy
=\int_0^\infty d\lambda \int^\infty_{-\infty} d\mu\, f(x|\mu,\exp(\theta))\,g(\exp(\theta)|\lambda)\,dx \wedge dy
=\int_0^\infty d\lambda \int^\infty_{-\infty} d\mu\, f(x|\mu,\exp(\theta))\,g(\exp(\theta)|\lambda)\,\biggl|\frac{\partial (x,y)}{\partial (x,\theta)}\biggr|\,dx \wedge d\theta
=\int_0^\infty d\lambda \int^\infty_{-\infty} d\mu\, f(x|\mu,\exp(\theta))\,g(\exp(\theta)|\lambda)\,\biggl|\frac{dy}{d\theta}\biggr|\,dx \wedge d\theta
=\int_0^\infty d\lambda \int^\infty_{-\infty} d\mu\, f(x|\mu,\exp(\theta))\,g(\exp(\theta)|\lambda)\,\biggl|\frac{d\exp(\theta)}{d\theta}\biggr|\,dx \wedge d\theta
=\int_0^\infty d\lambda \int^\infty_{-\infty} d\mu\, f(x|\mu,\exp(\theta))\,g(\exp(\theta)|\lambda)\,\exp(\theta)\,dx \wedge d\theta
=\int_0^\infty d\lambda \int^\infty_{-\infty} d\mu\, f(x|\mu,y)\,g(y|\lambda)\,y\,dx \wedge d\theta.
Hence,

f(x|\mu,\sigma)g(y|\lambda)dx \wedge dy = f(x|\mu,y)g(y|\lambda) ydx \wedge d\theta,

and thus,

\log\bigl( f(x|\mu,\sigma)\,g(y|\lambda) \bigr) \to \log\bigl( f(x|\mu,y)\,g(y|\lambda) \bigr) + \log(y).

Therefore, should I add \log(y) in the model block?

Is this Jacobian adjustment correct?
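
If my calculation is correct, then I would code the modified model like this (a sketch; please check the adjustment line):

data {
  int<lower=0> N;
  real x[N];
}

parameters {
  real mu;
  real theta;
  real<lower=0> lambda;
}

model {
  x ~ normal(mu, exp(theta));
  exp(theta) ~ exponential(lambda);
  target += theta;  // my guessed adjustment: log(y) = log(exp(theta)) = theta
}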

About Question 3.

Precise log likelihoods are necessary for the calculation of WAIC. But until now, I have not known what Stan does in the case of a linear transformation: whether Stan drops the determinant of the linear transformation or adds it to the log likelihood.
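
For example, for WAIC I would want pointwise log likelihoods, appending something like this block to my Answer 1 program (my own sketch; I am not sure whether the Jacobian term belongs in log_lik):

generated quantities {
  vector[N] log_lik;
  for (i in 1:N)
    log_lik[i] = normal_lpdf(tan(x[i]) | mu, 1) - 2 * log(cos(x[i]));
}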

If you are trying to correct for using \exp(\theta) on the left-hand side, this seems to be identical to the constraint transform for lower-bounded scalars (10.2 Lower Bounded Scalar | Stan Reference Manual), so I think you need to add \log(\exp(\theta)) = \theta … which might be the same as what you suggest, but please double-check me …

Overall, I think the Jacobian adjustments can be handled completely locally, so IMHO you shouldn’t need to bring in the normal density; just look at what happens at the \exp(\theta) \sim \text{Exp}(\lambda) statement.

A good rule of thumb is that Stan doesn’t do anything magical. The only Jacobian adjustment Stan does automatically is for constrained parameters. For everything else you need to add the Jacobian yourself.
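
For example (a sketch): with

parameters {
  real<lower=0> sigma;  // Stan samples an unconstrained variable and maps it to sigma internally
}

model {
  sigma ~ exponential(1);  // no manual adjustment needed here:
  // Stan automatically adds the log Jacobian of the constraint transform,
  // which for a lower bound of 0 equals log(sigma)
}

the constraint’s Jacobian is handled for you; only your own transformations need manual terms.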

Does that answer your questions?


This reminds me of WinBUGS coding. In Stan you’d just code

X ~ normal(mu, theta);
theta ~ exponential(lambda);

Jacobian adjustments are “rarely” used. I’d suggest posting what you have in mind; otherwise we are discussing options that may not be needed.


Until today, I had no explicit strategy for calculating the Jacobian. So I came up with an explicit formula that calculates the Jacobian.

For example, let x \in \mathbb{R}^n be data distributed according to a likelihood f(x|\theta) with parameter \theta \in \mathbb{R}^m (i.e., \int f(x|\theta)\,dx = 1 for all \theta). Furthermore, assume that \alpha \in \mathbb{R}^l denotes a hyperparameter for \theta.

If the model is given by f(x|\theta)\,\pi(\theta|\alpha)\,\pi(\alpha)\,dx \wedge d\theta \wedge d\alpha, where \int \pi(\theta|\alpha)\,d\theta = 1 for all \alpha and \int \pi(\alpha)\,d\alpha = 1, then no Jacobian adjustment is needed.

Next, I consider the case where a Jacobian adjustment is needed.
Let \phi, \Phi: \mathbb{R}^m \to \mathbb{R}^m, \psi, \Psi: \mathbb{R}^l \to \mathbb{R}^l, and \xi: \mathbb{R}^n \to \mathbb{R}^n be diffeomorphisms. If the model is given by

f\big(\xi(x)| \phi(\theta)\big)\pi \big( \Phi(\theta)|\psi(\alpha) \big)\pi(\Psi(\alpha))dx \wedge d\theta \wedge d \alpha,

then, setting x' := \xi(x), \theta' := \Phi(\theta), \alpha' := \Psi(\alpha), we get

f\big(x'\big| \phi(\Phi^{-1}(\theta'))\big)\,\pi\big(\theta'\big|\psi(\Psi^{-1}(\alpha'))\big)\,\pi(\alpha')\,\bigg|\frac{\partial(x, \theta, \alpha)}{\partial(x', \theta', \alpha')}\bigg|\,dx' \wedge d\theta' \wedge d\alpha'.

Thus, as the Jacobian adjustment, we add the logarithm of the term \bigg| \frac{\partial(x, \theta, \alpha)}{\partial(x', \theta', \alpha')}\bigg| in the model block. I arrived at this formula thanks to your comments. I really appreciate it.

But I guess that, for a linear transformation on the left-hand side of a sampling statement, a Jacobian adjustment is always needed except when the determinant is one, because of the following comments. Or am I wrong? To tell the truth, I do not understand the term “constrained parameters”.

Anyway, thank you. I now understand the Jacobian in the Bayesian context.

I meant parameters that are defined with constraints in Stan, so that they are not allowed to span the complete real line (or \mathbb{R}^n more generally), i.e., lower and upper bounds (e.g., real<lower=0> sigma;), but also correlation matrices, simplex parameters, etc.

Not necessarily: if all we care about is the posterior of the parameters (and NOT the precise log likelihood), we only need to compute the target up to a constant. If the transformation is linear with a fixed transformation matrix A, then the Jacobian (and log Jacobian) is not necessarily one, but it is constant and can be ignored for inference.
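
In code, continuing the Question 3 sketch above (A and x are data, so the term does not involve the parameters):

model {
  for (i in 1:N)
    A * x[i] ~ multi_normal(mu, diag_matrix(rep_vector(v, d)));
  // target += N * log_determinant(A);
  // This term is constant: omitting it shifts lp__ by a fixed amount
  // but leaves the posterior over mu and v unchanged.
}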

Best of luck with your model!
