# Jacobian determinant

#1

Hello!
I am having trouble calculating the log Jacobian determinant of my transformed parameter erro_y in my Stan model, which is attached below.

erro_y = (y - beta0 + beta1*x) / (1 + beta1*beta1); where x and y are data, and beta0 and beta1 are parameters.

Best wishes!

Yi Zheng

RMA.origin.stan (351 Bytes)

#2

In your case, you are putting a Gaussian prior on erro_y, which I will denote \epsilon=g(y), with g(\cdot) being the transformation. The change of variables formula says that we can find the equivalent distribution over y as p(y)=|J_{g}|p_{\epsilon}(g(y))

\epsilon=g(y)=\frac{y-\beta_0+\beta_1x}{1+\beta_1^2}\Rightarrow J_g=\frac{1}{1+\beta_1^2}

As this is a constant, you can safely ignore it, as Stan's target only needs to be accurate up to a constant.

For the record, if you put priors on your betas instead, you won't need the Jacobian term in general.

#3

Thank you very much!
I suppose erro_y is a function of beta0 and beta1; in my case, \epsilon = g(\beta_0, \beta_1), because y is just a data set, while beta0 and beta1 are parameters. If I misunderstood, please let me know.

Actually, I was trying to implement Major Axis Regression (MAR) in Stan. If you have another way to handle MAR, please share it. I would be very grateful!

#4

If you want to model this in Stan, then y is a random variable on equal footing with your other variables, and you should define a probability distribution over it (the likelihood), usually conditioned on the remaining variables. I imagine your full joint distribution looks like

p(y,\beta_0,\beta_1|x)=p(y|\beta_0,\beta_1,x)p(\beta_0)p(\beta_1)

so the transformation above was more correctly for the conditional distribution p(y|\beta_0,\beta_1,x), and so does not require a Jacobian adjustment for the betas.

I am not quite familiar with the model you are pursuing, but I think I get the gist.
In short, your original model specification is sufficient, but you might want to add priors to the betas. I think you can safely use priors designed for general regression problems.

```stan
beta0 ~ normal(0, 1);
beta1 ~ normal(0, 1);
erro_y ~ normal(0, sigy);
```
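For reference, here is a minimal sketch of how these statements might fit into a complete model. This is only a sketch: the data block, `N`, and the bounds are my assumptions, since I haven't seen the attached file.

```stan
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real beta0;
  real beta1;
  real<lower=0> sigy;
}
transformed parameters {
  // erro_y as defined in the original post
  vector[N] erro_y = (y - beta0 + beta1 * x) / (1 + beta1 * beta1);
}
model {
  beta0 ~ normal(0, 1);
  beta1 ~ normal(0, 1);
  erro_y ~ normal(0, sigy);  // Stan warns about the nonlinear transform here
}
```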


p.s. You can use $...$ for LaTeX typesetting and single backticks for inline code. You can also post blocks of code using triple backticks.

#5

Thanks a lot!

âbeta0 ~ normal(0,1);
beta1 ~ normal(0,1);â

The code above set priors, and

âerro_y~normal(0,sigy);â

defines likelihood.

If I set non-informative priors or weakly informative priors (half-Cauchy with scale > 10, or half-normal with scale > 10) for beta0 and beta1, there are many divergent transitions, or the two chains behave extremely differently (no warnings doesn't mean OK; run `traceplot`).

The R script can generate random data.

MAR.R (1.2 KB)
lm.stan (393 Bytes)
RMA.origin.stan (1.0 KB)

#6

Generally, wide priors like those you describe are discouraged, as they do not encourage posterior concentration and rarely reflect even conservative prior beliefs; e.g. your Cauchy includes \beta=127 within its 95% probability interval, which seems a tad large for most use cases. In the uninformative case you fundamentally don't care whether the weight is 2 or 10^9, which also seems unrealistic.

I have looked over your code, and there don't seem to be any particular places where it could be improved, so I think your divergences must come down to poor posterior concentration and the lack of informative priors: you will need to use stronger priors of some kind. Alternatively, if you really insist, you might be able to fiddle with the NUTS sampler parameters; if you lower the stepsize or raise adapt_delta enough, I imagine it will perform slightly better.

#7

Thanks a lot !

The reason I prefer uninformative priors is that I hope the posteriors are based only on the likelihood, which makes the Bayesian model comparable to the frequentist one.
It's acceptable to me to set slightly stronger priors, such as Normal(0, 5).

I raised the topic here because there is a warning when I run the model:

"Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
erro_y ~ normal(...)"

So I hope I can solve this kind of problem once and for all, not just this case alone.
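For what it's worth, the general pattern the warning asks for looks like the following sketch, where `g` and `g_prime` are hypothetical placeholders for a scalar transform and its derivative (not real Stan functions), not part of your model:

```stan
model {
  // eps = g(theta) is a nonlinear transform of the parameter theta
  real eps = g(theta);
  eps ~ normal(0, sigma);
  // adjust the target by the log absolute determinant of the Jacobian
  target += log(abs(g_prime(theta)));
}
```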

#8

If you want to compute frequentist stats, use frequentist stats.

What youâre asking is impossible in the Bayesain context.

The problem is that you're reducing two degrees of freedom (beta0 and beta1) to one (erro_y).

As an aside, we usually use * for multiplication on mailing lists, or you can use LaTeX escapes and write \beta^2.

#9

You should be able to consider it a transformation conditional on either \beta_0 or \beta_1, compute the Jacobian with respect to the quantity not conditioned on, and then put a regular prior on the other.

#10

I think weâre on the same page. Letting \gamma = f(\alpha, \beta), you can transform (\alpha, \beta) to (\gamma, \beta) and put a prior on both \gamma and \beta. I think the Jacobian reduces to just \log \left| \frac{\partial}{\partial \alpha} f(\alpha, \beta) \right|. Or you can do it the other way around and transform to \gamma, \alpha.