In your case, you are putting a Gaussian prior on erro_y, which I will denote \epsilon=g(y), with g(\cdot) being the transformation. The change of variables formula says that we can find the equivalent distribution over y as p(y)=|J_{g}|p_{\epsilon}(g(y))
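As a concrete example (assuming the usual regression setup, which is my guess at your model): if g(y) = y - \beta_0 - \beta_1 x, then \partial g / \partial y = 1, so |J_g| = 1 and p(y|\beta_0,\beta_1,x) = p_{\epsilon}(y - \beta_0 - \beta_1 x) needs no extra term; a non-linear transform such as g(y) = \log y would contribute |J_g| = 1/y.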
Thank you very much!
I suppose erro_y is a function of beta0 and beta1. In my case, \epsilon = g(beta0, beta1), because y is just a data set, while beta0 and beta1 are parameters. If I misunderstood, please let me know.
Actually, I was trying to deal with Major Axis Regression (MAR) in Stan. If you have another way to solve MAR, please teach me. I would be very grateful!
If you want to model this in Stan, then y is a random variable on equal footing with your other variables, and you should define a probability distribution over it (the likelihood), usually conditioned on the remaining variables. I imagine your full joint distribution looks like p(y, \beta_0, \beta_1 | x) = p(y | \beta_0, \beta_1, x)\, p(\beta_0)\, p(\beta_1),
so the transformation above was really for the conditional distribution p(y|\beta_0,\beta_1,x) and does not require a Jacobian adjustment for the betas.
I am not quite familiar with the model you are pursuing, but I think I get the gist.
In short, your original model specification is sufficient, but you might want to add priors to the betas. I think you can safely use priors designed for general regression problems.
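For concreteness, here is a minimal sketch of the kind of model I have in mind, with weakly informative priors added; the definition of erro_y as the perpendicular residual is my assumption, since your code is not shown:

```stan
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real beta0;
  real beta1;
  real<lower=0> sigma;
}
model {
  vector[N] erro_y;

  // weakly informative priors on the regression parameters
  beta0 ~ normal(0, 5);
  beta1 ~ normal(0, 5);
  sigma ~ normal(0, 5);

  // perpendicular (major-axis) residuals
  erro_y = (y - beta0 - beta1 * x) / sqrt(1 + square(beta1));

  // this line triggers the Jacobian warning discussed below,
  // since erro_y is a non-linear function of beta1
  erro_y ~ normal(0, sigma);
}
```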
If I set non-informative priors or weakly informative priors (half-Cauchy with scale > 10, or half-normal with scale > 10) for beta0 and beta1, there are many divergent transitions, or the two chains behave very differently (no warnings doesn't mean everything is OK; run traceplot).
Generally, wide priors like those you describe are discouraged, as they do not encourage posterior concentration and rarely reflect even conservative prior beliefs - e.g. your Cauchy includes \beta = 127 within its 95% probability interval, which seems a tad large for most use cases. In the uninformative case you fundamentally don't care whether the weight is 2 or 10^9, which also seems unrealistic.
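(To see where 127 comes from: the half-Cauchy quantile function is Q(p) = \gamma \tan(p\,\pi/2), so with scale \gamma = 10 the 95th percentile is 10\tan(0.475\pi) \approx 127.)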
I have looked over your code, and there don't seem to be any particular places where it could be improved, so I think your divergences must come down to poor posterior concentration and the lack of informative priors - you will need to use stronger priors of some kind. Alternatively, if you really insist, you might be able to fiddle with the NUTS sampler parameters - if you lower the step size or raise adapt_delta, I imagine it will perform slightly better.
The reason I prefer uninformative priors is that I hope the posteriors are driven only by the likelihood, which makes the Bayesian model comparable to the frequentist one.
It's acceptable to me to set slightly stronger priors, such as Normal(0, 5).
I raised the topic here because there is a warning when I run the model:
" Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
erro_y ~ normal(âŚ)"
So I hope I can solve this kind of problem once and for all, not just this case alone.
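For reference, my understanding of the generic pattern the warning asks for is the following toy sketch (not my MAR model): it puts a prior on a non-linear transform of a parameter and adds the corresponding Jacobian term by hand:

```stan
parameters {
  real<lower=0> theta;
}
model {
  // prior placed on log(theta), a non-linear transform of theta;
  // this line alone triggers the same warning
  log(theta) ~ normal(0, 1);

  // log absolute Jacobian of the transform: |d log(theta) / d theta| = 1/theta
  target += -log(theta);
}
```

This is equivalent to theta ~ lognormal(0, 1).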
If you want to compute frequentist stats, use frequentist stats.
What you're asking is impossible in the Bayesian context.
The problem is that you're reducing two degrees of freedom (beta0 and beta1) to one (erro_y). Usually we use * for multiplication on mailing lists. Or you can use LaTeX escapes and write \beta^2.
You should be able to consider it a transformation conditional on either \beta_0 or \beta_1, compute the Jacobian with respect to the quantity not conditioned on, and then put a regular prior on the other.
I think we're on the same page. Letting \gamma = f(\alpha, \beta), you can transform (\alpha, \beta) to (\gamma, \beta) and put a prior on both \gamma and \beta. I think the Jacobian term reduces to just \log \left| \frac{\partial}{\partial \alpha} f(\alpha, \beta) \right|. Or you can do it the other way around and transform to (\gamma, \alpha).
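As a quick sketch of that recipe in Stan, with a made-up f(\alpha, \beta) = \alpha \sqrt{1 + \beta^2} (any f that is monotone in \alpha works the same way):

```stan
parameters {
  real alpha;
  real beta;
}
transformed parameters {
  real gamma = alpha * sqrt(1 + square(beta));
}
model {
  // priors on (gamma, beta) rather than on (alpha, beta)
  gamma ~ normal(0, 1);
  beta ~ normal(0, 1);

  // Jacobian of (alpha, beta) -> (gamma, beta):
  // log |d f / d alpha| = log sqrt(1 + beta^2)
  target += 0.5 * log1p(square(beta));
}
```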