Jacobian determinent


#1

Hello!
I have a trouble calculating the log Jacobian determinant of my transformed parameter erro_y in my stan model, which is attached bellow.

erro_y=(y-beta0+beta1x)/(1+beta1beta1); where x and y are data and beta0 and beta1 are parameters.

Hope for your help!

Best wishes!

Yi Zheng

RMA.origin.stan (351 Bytes)


#2

In your case, you are putting a Gaussian prior on erro_y, which I will denote \epsilon=g(y), with g(\cdot) being the transformation. The change of variables formula says that we can find the equivalent distribution over y as p(y)=|J_{g}|p_{\epsilon}(g(y))

\epsilon=g(y)=\frac{y-\beta_0+\beta_1x}{1+\beta_1^2}\Rightarrow J_g=\frac{1}{1+\beta_1^2}

As this is a constant you can safely ignore it as Stan’s target only needs to be accurate up to a constant.

For the record, if you put priors on your betas instead, you won’t need the Jacobian term in general.


Linear model with measurement error in X and Y
#3

Thank you very much!
I suppose erro_y is a function of beta0 and beta1. in my case, ϵ=g(beta0, beta1). because y is just a data set, while beta0 and beta1 are parameters. If I misunderstood, please let me know.

Actually, I was trying to deal with Major Axis Regression (MAR) in Stan. If you have another way to solve MAR, please teach me. I am very grateful!

Thank you again for your reply!


#4

If you want to model this in stan, then y is a random variable on equal footing with your other variables and you should define a probability distribution over it (the likelihood), usually conditioned on the remaining variables. I imagine your full joint distribution looks like

p(y,\beta_0,\beta_1|x)=p(y|\beta_0,\beta_1,x)p(\beta_0)p(\beta_1)

so the above transformation was more correctly for the conditional distribution p(y|\beta_0,\beta_1,x) and so does not require a Jacobian term adjustment for the betas.

I am not quite familiar with the model you are pursuing, but I think I get the gist.
In short, your original model specification is sufficient, but you might want to add priors to the betas. I think you can safely use priors designed for general regression problems.

beta0 ~ normal(0,1);
beta1 ~ normal(0,1);
erro_y~normal(0,sigy);

p.s. you can use $…$ for latex typesetting and `…` for code. You can also post blocks of code using ```…```


#5

Thanks a lot!

‘beta0 ~ normal(0,1);
beta1 ~ normal(0,1);’

The code above set priors, and

‘erro_y~normal(0,sigy);’

defines likelihood.

if I set non-informative priors or weak informative priors ( half cauchy with scale > 10 OR half normal with scale>10) for beta0 and beta1, there are many divergent transitions. OR the two chains have extremely different performances (no warnings don’t mean OK, run “traceplot”).

The R script can generate random data.

MAR.R (1.2 KB)
lm.stan (393 Bytes)
RMA.origin.stan (1.0 KB)


#6

Generally, wide priors like those you describe are discouraged as they do not encourage posterior concentration and rarely reflect even conservative prior beliefs - i.e. your Cauchy includes \beta=127 within its 95% probability interval, which seems a tad large for most use cases. In the uninformative case you fundamentally don’t care whether the weight is 2 or 1^9, which also seems unrealistic.

I have looked over your code, and there doesn’t seem to be any particular places where the code could be improved, so I think your divergences must be down to poor posterior concentration and the lack of informative priors - you will need to use stronger priors of some kind. Alternatively, if you really insist, you might be able to fiddle around with the NUTS sampler parameters - if you set the stepsize or adapt_delta low enough I imagine it will perform slightly better.


#7

Thanks a lot !

The reason why I prefer uninformative priors is that I hope the posteriors are based only on likelihood, which makes the bayesian model comparable to that of the frequentest.
It’s acceptable to me to set slightly stronger priors. such as Normal ( 0, 5 ).

I raise the topic here is because there is a warning when I run the model:

" Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.

If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
erro_y ~ normal(…)"

So I hope I can solve this kind of problems once for all, not just this case alone.

Thank you again for your patient reply!


#8

If you want to compute frequentist stats, use frequentist stats.

What you’re asking is impossible in the Bayesain context.

The problem is that you’re reducing two degrees of freedo (beta0 and beta1) to one (erro_y). Usually we use * for multiplication on mailing lists. Or you can use LaTeX escapes and write \beta^2.


#9

you should be able to consider it a transformation conditional on either \beta_0 or \beta_1, compute the jacobian with respect to the quantity not conditioned on, and then putting a regular prior on the other.


#10

I think we’re on the same page. Letting \gamma = f(\alpha, \beta), you can transform (\alpha, \beta) to (\gamma, \beta) and put a prior on both \gamma and \beta. I think the Jacobian reduces to just \log \left| \frac{\partial}{\partial \alpha} f(\alpha, \beta) \right|. Or you can do it the other way around and transform to \gamma, \alpha.