Good - one needs a correction, but one question has not been answered.

By moving the transformation to the right hand side (in `von_mises_prior_lpdf()`

), does the problem go away ?

# Von_mises documentation suggestion

**Andrew**#21

**betanalpha**#22

If you are confused about Jacobians then I highly recommend that you spend some time going over probability densities and their properties. Theyâre sufficiently non-intuitive that itâs worth putting time into the theory so that you donât end up trying to guess in every application.

The important property here is that you donât need Jacobians for parameters you condition on but you do for parameters in the distribution. So let pi(f | g) be a conditional probability density function but let f and g be specified as functions of other parameters, f(x) and g(y).

What is pi(f | y)? Just pi(f | g(y)).

What is pi(x | g)? Now we need the Jacobian: pi(f(x) | g) | dx/df |.

**Andrew**#23

But in the example above (immediately above, not the start of the thread), the prior acts directly on the unit vector parameter (mu_vec). It takes the raw unit vector as a parameter, not the version transformed to an angle.

It certainly does not provoke stanâs warning about needing a Jacobian. Have I simply stumbled across a way to obscure the structure, so as to hide a mistake from stan ?

**betanalpha**#24

If you are referring to

the reason that you donât get the warning is that youâve âhiddenâ the nonlinear transformation in the user-defined function so that the parser canât see it. That construction still needs a Jacobian to be correct.

**Andrew**#25

That construction still needs a Jacobian to be correct

Thanks. That is what I wanted spelled out.

When I look at the derivative, I think I have to do something like

`target += -log(1 + x^2);`

and

`target += -log(1 + y^2);`

for the two vector components and remove the helper function, but I better check my signs and experiment a bit. I want it to be correct, but I am struggling to find a good numerical test - something that goes wrong when I have the wrong derivatives.

It is embarrassing that what I wrote in the earlier post seems to be patently wrong.

Thanks for the comments.

**betanalpha**#26

Probability is really hard. No shame in struggling provided that one respects the math and doesnât try to hack something together.

You want to look at MCMC output to verity that youâre quantifying the right distribution, so it helps to have analytic expectations (means, medians, variances, quantiles, etc) for comparison. Keeping in mind that expectations on circles are weird (ambiguity in defining the mean, hence the variance, etc).

**Bob_Carpenter**#27

Simulation is the best way to make sure youâre doing the right thing. Youâll see that things wonât get the right answer if you get the Jacobians wrongâyouâll be sampling from the wrong distribution.

**Andrew**#28

No, here I do not really agree. Obviously I work with simulated data.

I have two extremes in order to highlight problems. First, I have plenty of data - a few thousand points and second, I generate a small amount of data - 50 points.

The problem is that the Jacobian is necessary in the prior, but the prior is so unimportant.

- If you have no prior, stan finds the centre and kappa.
- If one puts in a weak prior (mu~von_mises(x, y)) and no Jacobian, stan finds the centre and kappa with a difference in the second digit after the decimal point.
- If I put in what I think is correct, (log of the derivative of arctan) the values also come out correct. Maybe they are.

I think I picked my words correctly before. I am struggling to find a good numeric test. If you use a unit vector, it is hard for the system to go wrong. The dominant derivatives will always send mu towards the middle of the data.

I would really like to get this correct in principle, not just correct in the sense of the results look OK.

**sakrejda**#29

That means making sure you are doing the right math. So either convince yourself or find somebody else to compare to. The gateway drug here is the log-normal density where `y`

is data and the model is.

```
log(y) ~ normal(mu, sigma)
```

At least one of Bobâs case studies goes through this in detail which I found helpful at the beginning. Another fun example to do is sampling from a donut defined by angle/radius (easy) and then transforming so that the sampler is actually moving in cartesian coordinates but sampling from the same density. Maybe bivariate log-normal would be a better second step but itâs less fun.

If you want to try it on your density I would skip the function business the first time around so you donât add an extra layer of confusion.

**betanalpha**#30

@Bob_Carpenter is referring to sampling from the *prior* alone to verify that you are getting the correct distribution. You should not be looking at data until you can verify that the prior is implemented correctly.

**Andrew**#31

Yes indeed, thanks. I did eventually realise I should fix the prior by itself.

May I post the code with the Jacobian that seems to work and return the correct value for mu ?

My jacobian looks like this

```
data {
// vector [NP] angles; // where one would have real data
real mu_prior; // fed in from test script
real kappa_fixed; // also given by test script
}
parameters {
unit_vector [2] mu_vec;
}
transformed parameters {
real mu = atan2(mu_vec[2], mu_vec[1]);
}
model {
real t1;
real t2;
mu ~ von_mises (mu_prior, kappa_fixed);
// angles ~ von_mises (mu, kappa); // where one would use mu
t1 = 1 + (mu_vec[2] * mu_vec[2]);
target += - log (t1);
t2 = 1 + (mu_vec[1] * mu_vec[1]);
target += - log (t2);
}
```

This started out as a suggestion for the manual for anybody who has to work with circular data. I am not sure if it is helpful or confusing. I would still offer to write a few lines, but only if wiser men give me their blessing.

**sakrejda**#33

Could you post the complete model instead of just this piece? I canât tell which `von_mises`

function youâre using.

**Andrew**#34

Of course I can post a full model, but the point here was to check the Jacobian on the prior, not on real data.

A little rstan driver to test it looks like

```
# 14 july 2017
library (rstan)
mu_prior = 4.
kappa_fixed = 30
iter=1000
chains = 4
cores = 4
seed = 1637
data = list(c("mu_prior", "kappa_fixed"))
fit_data =
stan ("von_mises_fit_j.stan", data = data, iter=iter, chains = chains, cores = cores, seed = seed)
print (fit_data)
print (stan_trace(fit_data, pars = c("mu"), inc_warmup = T))
```

and just put the previous snippet of stan into âvon_mises_fit_j.stanâ.

In the event that anyone finds this of interest, I can clean up my test with synthetic data and post a full example (generate data, fit to it).

**ecashin**#35

I find all circular models interesting, and Iâm really glad that youâre going through all this trouble well before I need to use a model like this. I am frightened by the sentiment in this discussion thread that analysis of circular data is tricky, but Iâm encouraged by the fact that the example you end up with will have gotten significant attention.

**Andrew**#36

I am not sure if the von Mises story has been over-dramatised. If you have data with one cluster, you just have to unwrap the results from multiple chains. Treating the unit vector as the main parameter is just a nice way to avoid the issue.

I saw the problems in a more complicated model and noticed that it is not good to constrain the angle - the sampler keeps banging into the wall one has built. Then I think the unit vector formulation is really useful.

I would still be grateful if someone more knowledgable looked at my `1 + (mu_vec[2] * mu_vec[2]);`

terms and gave me their blessing.

**Bob_Carpenter**#37

Thereâs a data structure `unit_vector`

that generates points on the circle.

This example wants to transform that point on a circle into an angle and put a prior on the angle. Hence the need for a Jacobian.

**Bob_Carpenter**#38

I came back and thought about this some more and now Iâm confused about what we need here (if anything) in the way of Jacobians. Our unit vector data type is unusual in that it uses an x and y to provide a single degree of freedomâdegrees around the circle.

@betanalpha Does the following program need a Jacobian in order to put a distribution `theta`

?

```
parameters {
unit_vector[2] xy;
}
transformed parameters {
real<lower=-pi(), upper=pi()> theta = atan2(xy[1], xy[2]);
}
```

From first principles, it seems like youâd need a Jacobian if you want to put a distribution on `theta`

. My inclination would be to do one for this

```
(x, y) -> (atan2(x, y), y)
```

but I always get confused when the degrees of freedom arenât easy to map. If you do this, the Jacobian is triangular, and the `y -> y`

bit is the identity, so you only need to add the following to the log density.

```
log(abs( d/d.x. atan2(x, y) ))
```

But if I do that, sampling is no longer uniform for `theta`

âit has dips at `pi/2`

and `-pi/2`

. So that leads me to believe I *donât* need a JacobianâI can just put a ditribution on the angle `theta`

directly.

And why would I map `(x, y) -> (atan2(x,y), y)`

rather than to `(atan2(x,y), x)`

when the latter causes the Jacobian to be zero? Is what Iâm missing what happens to `y`

in the first case and thatâs why things no longer look uniform?

**betanalpha**#39

Remember that you do not evaluate the Jacobian at an arbitrary inverse of the transform but rather *all* inverses of the transform. For more details, see https://en.wikipedia.org/wiki/Probability_density_function#Dependent_variables_and_change_of_variables.

The immediate issue here is that atan2 is multi-valued, so you have to sum over all of the x and y pairs that could give you the same value of theta. Which unfortunately is an infinite sum.

Itâs the same problem that arises when you naively try to transform from (x, y) \in R^{2} to (r, theta) \in R^{2} - {0} to theta \in S^{1}. You canât really have an isomorphism when the underlying topology changes, and if you try to force it then the Jacobian punishes you.

**betanalpha**#40

In the first case you get a transformation to (theta, y) case and in the latter case you get a transformation to (theta, x) space. These are two different spaces with two different joint densities (and possibly two different Jacobians defining those joint densities). Itâs only when you *marginalize* the auxiliary parameter out that you recover the same marginal density for theta.