Thanks for bringing this back up – I think I was wrong about the Jacobian stuff. You don’t need it here. If you have a random variable that lives on a unit circle, then you can map little arclengths there directly onto 0 to 2*pi or whatever range atan2 uses.

I think it’s completely fine to just say mu ~ von_mises(3, 1) even when mu is a transformed variable here, cause you’re just mapping in line wrapped around a circle in 2D to one stretched out on one axis. The arclengths are arclengths. What you have coded up here is the same thing.

Hope I finally got it straight in my head this time… If not oh well :D. What you’re doing seems good to me.

Incorrect. You can map little arclenghts onto the interval [0, 2pi] locally, but the transformations that we’re talking about here are global. Again, the problem is that the transform from an interval to a circle is not 1-1 and it can’t be because the topologies are different. In fact if you tried to calculate the Jacobian you’d find that you get an infinite sum of terms which cannot be evaluated – again, a manifestation of the topological problems.

You can embed the circle in R^{n + 1} but then you have the problem of trying to map R^{n} -> R^{n+ 1}, which is also not 1-1 and hence doesn’t admit a clean transformation between densities.

As someone who had a rigorous education but not much formal mathematics, I’d highly recommend John Lee’s “Topological Manifolds”. It gets really complicated after a few chapters, but the first few are comprehensive and carefully explain all of the concepts.

The basic idea is you have a space of points and then you endow it with more structure by defining “what sets of points are open”? Then after you work with this structure you see what the different possible assignments of open sets, or topologies, manifest as what we would think of as different shapes. So torii, spheres, and real numbers end up being spaces defined by different topologies.

There’s a lot of rigorous math you can do with topology, but there’s also lots of concepts that you can take away an apply more informally. The relevant notion here is that probability distributions are almost always defined by a space’s topology. If you want to map a distribution from one space to another then you also need to map the corresponding topologies. If the topologies are inconsistent (sphere vs real numbers, etc) then you can’t (usually) construct such a map and hence can’t map distributions back and forth as you might naively expect that you could.

The really cool thing is that, like measure theory itself, all of these formal math concepts help to identify what would otherwise be really subtle pathologies in your analysis/algorithms that would take forever to find if ever found at all!

I’d recommend starting where @betanalpha pointed me, with John Baez an Javier P. Muniain’s Gauge Fields, Knots and Gravity, which sounds like it’s going to be even more hairy, but starts with a lovely introduction to smooth manifolds. It does presuppose you know a bit of algebra and analysis before you start.

Good - one needs a correction, but one question has not been answered.
By moving the transformation to the right hand side (in von_mises_prior_lpdf()), does the problem go away ?

If you are confused about Jacobians then I highly recommend that you spend some time going over probability densities and their properties. They’re sufficiently non-intuitive that it’s worth putting time into the theory so that you don’t end up trying to guess in every application.

The important property here is that you don’t need Jacobians for parameters you condition on but you do for parameters in the distribution. So let pi(f | g) be a conditional probability density function but let f and g be specified as functions of other parameters, f(x) and g(y).

What is pi(f | y)? Just pi(f | g(y)).

What is pi(x | g)? Now we need the Jacobian: pi(f(x) | g) | dx/df |.

But in the example above (immediately above, not the start of the thread), the prior acts directly on the unit vector parameter (mu_vec). It takes the raw unit vector as a parameter, not the version transformed to an angle.

It certainly does not provoke stan’s warning about needing a Jacobian. Have I simply stumbled across a way to obscure the structure, so as to hide a mistake from stan ?

the reason that you don’t get the warning is that you’ve “hidden” the nonlinear transformation in the user-defined function so that the parser can’t see it. That construction still needs a Jacobian to be correct.

That construction still needs a Jacobian to be correct

Thanks. That is what I wanted spelled out.
When I look at the derivative, I think I have to do something like target += -log(1 + x^2);
and target += -log(1 + y^2);

for the two vector components and remove the helper function, but I better check my signs and experiment a bit. I want it to be correct, but I am struggling to find a good numerical test - something that goes wrong when I have the wrong derivatives.

It is embarrassing that what I wrote in the earlier post seems to be patently wrong.
Thanks for the comments.

Probability is really hard. No shame in struggling provided that one respects the math and doesn’t try to hack something together.

You want to look at MCMC output to verity that you’re quantifying the right distribution, so it helps to have analytic expectations (means, medians, variances, quantiles, etc) for comparison. Keeping in mind that expectations on circles are weird (ambiguity in defining the mean, hence the variance, etc).

Simulation is the best way to make sure you’re doing the right thing. You’ll see that things won’t get the right answer if you get the Jacobians wrong—you’ll be sampling from the wrong distribution.

No, here I do not really agree. Obviously I work with simulated data.
I have two extremes in order to highlight problems. First, I have plenty of data - a few thousand points and second, I generate a small amount of data - 50 points.
The problem is that the Jacobian is necessary in the prior, but the prior is so unimportant.

If you have no prior, stan finds the centre and kappa.

If one puts in a weak prior (mu~von_mises(x, y)) and no Jacobian, stan finds the centre and kappa with a difference in the second digit after the decimal point.

If I put in what I think is correct, (log of the derivative of arctan) the values also come out correct. Maybe they are.

I think I picked my words correctly before. I am struggling to find a good numeric test. If you use a unit vector, it is hard for the system to go wrong. The dominant derivatives will always send mu towards the middle of the data.

I would really like to get this correct in principle, not just correct in the sense of the results look OK.

That means making sure you are doing the right math. So either convince yourself or find somebody else to compare to. The gateway drug here is the log-normal density where y is data and the model is.

log(y) ~ normal(mu, sigma)

At least one of Bob’s case studies goes through this in detail which I found helpful at the beginning. Another fun example to do is sampling from a donut defined by angle/radius (easy) and then transforming so that the sampler is actually moving in cartesian coordinates but sampling from the same density. Maybe bivariate log-normal would be a better second step but it’s less fun.

If you want to try it on your density I would skip the function business the first time around so you don’t add an extra layer of confusion.

@Bob_Carpenter is referring to sampling from the prior alone to verify that you are getting the correct distribution. You should not be looking at data until you can verify that the prior is implemented correctly.

Yes indeed, thanks. I did eventually realise I should fix the prior by itself.
May I post the code with the Jacobian that seems to work and return the correct value for mu ?
My jacobian looks like this

data {
// vector [NP] angles; // where one would have real data
real mu_prior; // fed in from test script
real kappa_fixed; // also given by test script
}
parameters {
unit_vector [2] mu_vec;
}
transformed parameters {
real mu = atan2(mu_vec[2], mu_vec[1]);
}
model {
real t1;
real t2;
mu ~ von_mises (mu_prior, kappa_fixed);
// angles ~ von_mises (mu, kappa); // where one would use mu
t1 = 1 + (mu_vec[2] * mu_vec[2]);
target += - log (t1);
t2 = 1 + (mu_vec[1] * mu_vec[1]);
target += - log (t2);
}

This started out as a suggestion for the manual for anybody who has to work with circular data. I am not sure if it is helpful or confusing. I would still offer to write a few lines, but only if wiser men give me their blessing.

Of course I can post a full model, but the point here was to check the Jacobian on the prior, not on real data.
A little rstan driver to test it looks like

and just put the previous snippet of stan into “von_mises_fit_j.stan”.
In the event that anyone finds this of interest, I can clean up my test with synthetic data and post a full example (generate data, fit to it).

I find all circular models interesting, and I’m really glad that you’re going through all this trouble well before I need to use a model like this. I am frightened by the sentiment in this discussion thread that analysis of circular data is tricky, but I’m encouraged by the fact that the example you end up with will have gotten significant attention.