Von_mises documentation suggestion


Good - one needs a correction, but one question has not been answered.
By moving the transformation to the right hand side (in von_mises_prior_lpdf()), does the problem go away ?


If you are confused about Jacobians then I highly recommend that you spend some time going over probability densities and their properties. They’re sufficiently non-intuitive that it’s worth putting time into the theory so that you don’t end up trying to guess in every application.

The important property here is that you don’t need Jacobians for parameters you condition on but you do for parameters in the distribution. So let pi(f | g) be a conditional probability density function but let f and g be specified as functions of other parameters, f(x) and g(y).

What is pi(f | y)? Just pi(f | g(y)).

What is pi(x | g)? Now we need the Jacobian: pi(f(x) | g) | dx/df |.


But in the example above (immediately above, not the start of the thread), the prior acts directly on the unit vector parameter (mu_vec). It takes the raw unit vector as a parameter, not the version transformed to an angle.

It certainly does not provoke stan’s warning about needing a Jacobian. Have I simply stumbled across a way to obscure the structure, so as to hide a mistake from stan ?


If you are referring to

the reason that you don’t get the warning is that you’ve “hidden” the nonlinear transformation in the user-defined function so that the parser can’t see it. That construction still needs a Jacobian to be correct.


That construction still needs a Jacobian to be correct

Thanks. That is what I wanted spelled out.
When I look at the derivative, I think I have to do something like
target += -log(1 + x^2);
target += -log(1 + y^2);

for the two vector components and remove the helper function, but I better check my signs and experiment a bit. I want it to be correct, but I am struggling to find a good numerical test - something that goes wrong when I have the wrong derivatives.

It is embarrassing that what I wrote in the earlier post seems to be patently wrong.
Thanks for the comments.


Probability is really hard. No shame in struggling provided that one respects the math and doesn’t try to hack something together.

You want to look at MCMC output to verity that you’re quantifying the right distribution, so it helps to have analytic expectations (means, medians, variances, quantiles, etc) for comparison. Keeping in mind that expectations on circles are weird (ambiguity in defining the mean, hence the variance, etc).


Simulation is the best way to make sure you’re doing the right thing. You’ll see that things won’t get the right answer if you get the Jacobians wrong—you’ll be sampling from the wrong distribution.


No, here I do not really agree. Obviously I work with simulated data.
I have two extremes in order to highlight problems. First, I have plenty of data - a few thousand points and second, I generate a small amount of data - 50 points.
The problem is that the Jacobian is necessary in the prior, but the prior is so unimportant.

  • If you have no prior, stan finds the centre and kappa.
  • If one puts in a weak prior (mu~von_mises(x, y)) and no Jacobian, stan finds the centre and kappa with a difference in the second digit after the decimal point.
  • If I put in what I think is correct, (log of the derivative of arctan) the values also come out correct. Maybe they are.

I think I picked my words correctly before. I am struggling to find a good numeric test. If you use a unit vector, it is hard for the system to go wrong. The dominant derivatives will always send mu towards the middle of the data.

I would really like to get this correct in principle, not just correct in the sense of the results look OK.


That means making sure you are doing the right math. So either convince yourself or find somebody else to compare to. The gateway drug here is the log-normal density where y is data and the model is.

log(y) ~ normal(mu, sigma)

At least one of Bob’s case studies goes through this in detail which I found helpful at the beginning. Another fun example to do is sampling from a donut defined by angle/radius (easy) and then transforming so that the sampler is actually moving in cartesian coordinates but sampling from the same density. Maybe bivariate log-normal would be a better second step but it’s less fun.

If you want to try it on your density I would skip the function business the first time around so you don’t add an extra layer of confusion.


@Bob_Carpenter is referring to sampling from the prior alone to verify that you are getting the correct distribution. You should not be looking at data until you can verify that the prior is implemented correctly.


Yes indeed, thanks. I did eventually realise I should fix the prior by itself.
May I post the code with the Jacobian that seems to work and return the correct value for mu ?
My jacobian looks like this

data {
//  vector [NP] angles; // where one would have real data
    real mu_prior;      // fed in from test script
    real kappa_fixed;   // also given by test script

parameters {
    unit_vector [2] mu_vec;

transformed parameters {
    real mu = atan2(mu_vec[2], mu_vec[1]);

model {
    real t1;
    real t2;
    mu ~ von_mises (mu_prior, kappa_fixed); 
//  angles ~ von_mises (mu, kappa);  // where one would use mu
    t1 = 1 + (mu_vec[2] * mu_vec[2]);
    target += - log (t1);
    t2 = 1 + (mu_vec[1] * mu_vec[1]);
    target += - log (t2);

This started out as a suggestion for the manual for anybody who has to work with circular data. I am not sure if it is helpful or confusing. I would still offer to write a few lines, but only if wiser men give me their blessing.


Careful with the androcentrism there : )


Could you post the complete model instead of just this piece? I can’t tell which von_mises function you’re using.


Of course I can post a full model, but the point here was to check the Jacobian on the prior, not on real data.
A little rstan driver to test it looks like

# 14 july 2017
library (rstan)
mu_prior = 4.
kappa_fixed = 30

chains = 4
cores = 4
seed = 1637

data = list(c("mu_prior", "kappa_fixed"))
fit_data =
    stan ("von_mises_fit_j.stan", data = data, iter=iter, chains = chains, cores = cores, seed = seed)

print (fit_data)
print (stan_trace(fit_data, pars = c("mu"), inc_warmup = T))

and just put the previous snippet of stan into “von_mises_fit_j.stan”.
In the event that anyone finds this of interest, I can clean up my test with synthetic data and post a full example (generate data, fit to it).


I find all circular models interesting, and I’m really glad that you’re going through all this trouble well before I need to use a model like this. I am frightened by the sentiment in this discussion thread that analysis of circular data is tricky, but I’m encouraged by the fact that the example you end up with will have gotten significant attention.


I am not sure if the von Mises story has been over-dramatised. If you have data with one cluster, you just have to unwrap the results from multiple chains. Treating the unit vector as the main parameter is just a nice way to avoid the issue.
I saw the problems in a more complicated model and noticed that it is not good to constrain the angle - the sampler keeps banging into the wall one has built. Then I think the unit vector formulation is really useful.
I would still be grateful if someone more knowledgable looked at my 1 + (mu_vec[2] * mu_vec[2]); terms and gave me their blessing.


There’s a data structure unit_vector that generates points on the circle.

This example wants to transform that point on a circle into an angle and put a prior on the angle. Hence the need for a Jacobian.


I came back and thought about this some more and now I’m confused about what we need here (if anything) in the way of Jacobians. Our unit vector data type is unusual in that it uses an x and y to provide a single degree of freedom—degrees around the circle.

@betanalpha Does the following program need a Jacobian in order to put a distribution theta?

parameters {
  unit_vector[2] xy;
transformed parameters {
  real<lower=-pi(), upper=pi()> theta = atan2(xy[1], xy[2]);

From first principles, it seems like you’d need a Jacobian if you want to put a distribution on theta. My inclination would be to do one for this

(x, y) -> (atan2(x, y), y)

but I always get confused when the degrees of freedom aren’t easy to map. If you do this, the Jacobian is triangular, and the y -> y bit is the identity, so you only need to add the following to the log density.

log(abs( d/d.x. atan2(x, y) ))

But if I do that, sampling is no longer uniform for theta—it has dips at pi/2 and -pi/2. So that leads me to believe I don’t need a Jacobian—I can just put a ditribution on the angle theta directly.

And why would I map (x, y) -> (atan2(x,y), y) rather than to (atan2(x,y), x) when the latter causes the Jacobian to be zero? Is what I’m missing what happens to y in the first case and that’s why things no longer look uniform?


Remember that you do not evaluate the Jacobian at an arbitrary inverse of the transform but rather all inverses of the transform. For more details, see https://en.wikipedia.org/wiki/Probability_density_function#Dependent_variables_and_change_of_variables.

The immediate issue here is that atan2 is multi-valued, so you have to sum over all of the x and y pairs that could give you the same value of theta. Which unfortunately is an infinite sum.

It’s the same problem that arises when you naively try to transform from (x, y) \in R^{2} to (r, theta) \in R^{2} - {0} to theta \in S^{1}. You can’t really have an isomorphism when the underlying topology changes, and if you try to force it then the Jacobian punishes you.


In the first case you get a transformation to (theta, y) case and in the latter case you get a transformation to (theta, x) space. These are two different spaces with two different joint densities (and possibly two different Jacobians defining those joint densities). It’s only when you marginalize the auxiliary parameter out that you recover the same marginal density for theta.