Modelling circular data with brms: Gaussian or von Mises?

I’m modeling a dataset where the response is the angular difference between two angles. Let’s say that I have a “correct” angle and participants press a point somewhere on a circle. So I calculated the deviation from the correct angle, in degrees. The distribution is approximately normal, centered on 0, and negative/positive values represent anticlockwise and clockwise “errors”, respectively.

Then I have some linear predictors, and I’m interested in modeling both the mean and the standard deviation (or \kappa) of the data.


I’m omitting the priors and other brm arguments just for simplicity.

The first option is the standard Gaussian model:

fit_formula_lin <- bf(theta ~ 0 + x1*x2 + (1|id), 
                      sigma ~ 0 + x1*x2 + (1|id))
brm(fit_formula_lin,
    data = dat_fit,
    family = gaussian())

Where theta is the angle difference (converted to radians). This gives me an estimate of the mean and the standard deviation for each condition.

The other option is the von Mises model:

fit_formula_von <- bf(theta ~ 0 + x1*x2 + (1|id), 
                      kappa ~ 0 + x1*x2 + (1|id))
brm(fit_formula_von,
    data = dat_fit,
    family = von_mises())

I’m wondering which is the best modeling option. In particular, the second model takes a very long time to run, and its parameters are, for me, more difficult to interpret. At the same time, the Gaussian model does not take the circularity of the data into account, though maybe using angular differences rather than raw angles mitigates this somewhat.
What do you think? Maybe I could truncate the Gaussian at -\pi and +\pi?

Thank you!


When you have domain information as strong as circularity is here, use a circular model. However, there are identifiability issues with circular distributions at low concentration: in the extreme of zero concentration the mean is completely unidentified, so you probably want a zero-avoiding prior on the concentration, or a parameterization in terms of inverse concentration (spread). There are also numeric issues at very large concentrations, in which case folks typically add a conditional that switches to a Gaussian approximation.
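To see the zero-concentration extreme concretely, here is the von Mises density written out in base R (an ad-hoc helper just for illustration, not part of brms or Stan):

```r
# von Mises density, written out directly (besselI is in base R).
# Ad-hoc helper for illustration only.
dvonmises <- function(x, mu, kappa) {
  exp(kappa * cos(x - mu)) / (2 * pi * besselI(kappa, nu = 0))
}

# At kappa = 0 the density is uniform, 1 / (2 * pi), regardless of mu,
# so the data carry no information about the mean at all:
dvonmises(0.3, mu = 0, kappa = 0)  # 0.1591549
dvonmises(0.3, mu = 2, kappa = 0)  # 0.1591549
```

Any two values of mu give literally identical likelihoods at kappa = 0, which is why the posterior for the mean degenerates there.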

This work wouldn’t happen to be in the cognitive neuroscience realm, eh?


Thank you! So if I understand correctly, the circular approach (my second model) is the correct choice and the Gaussian is not appropriate. I know that with circular data the standard Gaussian approach is wrong; I’m just wondering whether working with angular differences (a 0-centered distribution) somehow mitigates the problem. And the problem with the von Mises regression is that with extreme concentration (too low or too high) the model becomes problematic?


Switching from raw angles to angle differences doesn’t “help” in any way, as the thing with circular data is that at a certain spread the density of the tails wraps around and starts to get distributed back into the bulk. With sufficiently concentrated distributions that will be negligible and a Gaussian can suffice; indeed, as I said, that’s the solution even the picky like me use when the concentration goes so high as to cause numeric issues in the math libraries Stan uses.
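A quick base-R check of that high-concentration regime (illustrative only; the kappa value and grid are arbitrary choices): for large kappa, the von Mises density is close to a normal density with sd = 1/sqrt(kappa), which is what makes the Gaussian stand-in safe there.

```r
# Ad-hoc von Mises density for illustration (besselI is base R).
dvonmises <- function(x, mu, kappa) {
  exp(kappa * cos(x - mu)) / (2 * pi * besselI(kappa, nu = 0))
}

# For large kappa, vonMises(mu, kappa) ~ Normal(mu, 1 / sqrt(kappa)):
kappa <- 50
x <- seq(-0.5, 0.5, by = 0.01)
max(abs(dvonmises(x, 0, kappa) - dnorm(x, 0, 1 / sqrt(kappa))))
# small compared to the peak density of about 2.8
```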


ok, thank you! I’ll probably try both. I have to say this is not a widely discussed topic, and brms is the only package that allows this modeling flexibility. Do you have more suggestions? For example, should the location parameter (the von Mises or Gaussian mean) be similar between the two models if we are in the “less biased” case?


So let’s assume that the true generative process is adequately captured by a normal distribution wrapped around a circle (of course this might not be (even approximately) true, in which case neither the Gaussian nor the von Mises is likely to be an adequate modeling choice here–note that the von Mises is a close approximation to the wrapped normal). The question, then, is: when can we adequately approximate a wrapped normal using a truncated normal with mean \mu \in (-\pi, \pi) and truncation bounds [\mu - \pi, \mu + \pi]? The answer is that we can, as long as the standard deviation of the normal is small enough that there is negligible probability mass beyond the truncation bounds.
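As a concrete check of that condition (a base-R sketch; the sd values are illustrative):

```r
# Probability mass a Normal(0, sd) puts outside (-pi, pi), i.e. the
# mass the truncated-normal approximation throws away:
tail_mass <- function(sd) 2 * pnorm(-pi, mean = 0, sd = sd)

tail_mass(1)  # ~0.0017: negligible, the approximation is fine
tail_mass(2)  # ~0.12: clearly not negligible, wrap-around matters
```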

Zero-centering doesn’t help with the question mentioned above. HOWEVER, there’s an additional problem with sampling both the truncated normal given above and the von Mises as implemented in Stan, which is that if \mu is near \pi, then the model doesn’t have a good way to understand that \mu is also near -\pi, and this produces a serious multimodality problem. Whether you are using the truncated normal approximation or the von Mises, zero-centering DOES help with this problem! In particular, if you can feel confident enough in the true mean to express a prior that strongly avoids the region outside of the interval [-\frac{\pi}{2}, \frac{\pi}{2}], then the multimodality will be suppressed.

Ah, good catch; I hadn’t thought of that but makes sense now that you mention it.

@jsocolar I wonder, would the unit vector representation be helpful here?

Thank you! Unfortunately this is my first time with this type of data, so I’m not very confident. If I understand correctly, the problem arises when the true mean is near \pi. For my data (and given that I’m working with 0-centered differences) that seems quite unlikely.
So your suggestion is to use a truncated normal on [-\pi, \pi] as the likelihood and a prior that gives almost no probability to values beyond ±\frac{\pi}{2}?

(Assuming the true data are a wrapped normal then)

  • If the true standard deviation is small enough that the tail probabilities past the truncation bounds are negligible AND the data are strong enough to constrain the modeled standard deviation to near its true value, then it should be ok to replace the von Mises with a truncated Gaussian approximation.
  • If you can adjust the data such that the true mean is likely to be near zero AND you can feel sufficiently confident a priori to use a prior that strongly avoids values far from zero (where far is roughly \frac{\pi}{2}), then doing so might help you to sample efficiently from your posterior using HMC. Don’t use such a prior unless it actually reflects genuine domain expertise about the likely difference between the true population mean and the “correct” angle in your experiment. If you think this difference might well be larger than \frac{\pi}{2}, then it would be a mistake to use an overconfident prior just to avoid computational problems.
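One way to check how strongly a candidate prior on the mean avoids that region (a base-R sketch; the sd values are illustrative, not recommendations):

```r
# Mass a Normal(0, sd) prior on the mean puts beyond +/- pi/2:
mass_beyond_halfpi <- function(sd) 2 * pnorm(-pi / 2, mean = 0, sd = sd)

mass_beyond_halfpi(0.5)  # ~0.002: strongly avoids the multimodal region
mass_beyond_halfpi(2)    # ~0.43: does little to suppress the multimodality
```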

yes, makes sense! thank you…
In case it’s helpful, this is the distribution of my dependent variable:

what are the units on the x axis?

Assuming the units are radians, it looks like there’s some wrap-around, since the frequencies don’t vanish near \pi. However, the eyeball test also suggests that these data are much heavier tailed than Gaussian or von Mises, and that this problem is likely to be more severe than the error due to the truncation of the tails. I would probably suggest beginning with a student t distribution truncated at -\pi and \pi, because I think capturing the fat tails is more important here than capturing the wrap-around, but I don’t have a strong intuition for this. Statistics is hard!
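A sketch of that truncated student-t model in brms, reusing the variable names from the models above (theta, x1, x2, id, dat_fit); priors and other brm() arguments are again omitted for simplicity, so treat this as a model-specification fragment rather than runnable-as-is code:

```r
library(brms)

# Student-t likelihood truncated at -pi and pi, with the same
# distributional structure as the Gaussian model above:
fit_formula_t <- bf(theta | trunc(lb = -3.141593, ub = 3.141593) ~
                      0 + x1 * x2 + (1 | id),
                    sigma ~ 0 + x1 * x2 + (1 | id))

brm(fit_formula_t,
    data = dat_fit,
    family = student())
```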

I’ve tried both the Gaussian and von Mises models:

von mises

fit_formula_von <- bf(diff_theta ~ 0 + emotion * mask + (1|id), 
                      kappa ~ 0 + emotion * mask + (1|id))

prior_fit_von <- c(
  prior(normal(0, 2), class = "b", dpar = ""),
  prior(cauchy(0, 5), class = "b", dpar = "kappa")
)

fit_brm_von <- brm(fit_formula_von,
                   data = dat_fit,
                   prior = prior_fit_von,
                   family = von_mises(),
                   chains = 10,
                   cores = 10,
                   iter = 10000,
                   seed = seed)


truncated gaussian

fit_formula_lin_trunc <- bf(diff_theta|trunc(lb = -3.141593, ub = 3.141593) ~ 0 + emotion * mask + (1|id),
                            sigma ~ 0 + emotion * mask + (1|id))

prior_fit_lin <- c(
  prior(normal(0, 2), class = "b", dpar = ""),
  prior(cauchy(0, 2), class = "b", dpar = "sigma")
)

fit_brm_lin_trunc <- brm(fit_formula_lin_trunc,
                         data = dat_fit,
                         prior = prior_fit_lin,
                         family = gaussian(),
                         chains = 10,
                         cores = 10,
                         iter = 10000,
                         seed = seed)

The fitting is slow, but there are no warnings or issues. The pp_check() is not really good, though. Apart from the heavy tails suggested by @jsocolar, the model is also not capturing the high probability of near-zero data well.
What do you think?

The pp_check() is very similar across models, so I’ve posted the von Mises one.

Failing to capture the high probability near zero is exactly the same problem as the tails. A Gaussian that puts that much probability mass near zero is more-or-less dead certain that there is no probability mass way out in the tails. Since you have data way out in the tails, the best-fitting Gaussian cannot be one that puts so much probability mass near zero. You need something much heavier-tailed than Gaussian. Maybe an appropriate t-distribution could work, or maybe (eyeballing your data here) even a mixture of a t-distribution and a uniform distribution. Would that make sense given your domain knowledge (i.e. would it surprise you if most participants were trying to approximate the true angle while a minority were responding completely haphazardly, with complete disregard for the true angle)?
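To make the suggested mixture concrete, here is a base-R sketch of its density on (-\pi, \pi). The parameter values are illustrative, and the t component is not re-normalized for truncation, so this shows the shape rather than a fitted model:

```r
# Two-component density: a scaled student-t for participants tracking
# the target angle, plus a circular-uniform component for haphazard
# responses. p_guess, nu, and scale are illustrative values only.
dmix <- function(x, p_guess, nu, scale) {
  (1 - p_guess) * dt(x / scale, df = nu) / scale +  # "trying" component
    p_guess / (2 * pi)                              # haphazard component
}

# With 20% haphazard responders the density far from zero flattens out
# near p_guess / (2 * pi) instead of vanishing:
dmix(3, p_guess = 0.2, nu = 3, scale = 0.3)
dmix(3, p_guess = 0,   nu = 3, scale = 0.3)
```

This is the qualitative pattern in the posted histogram: a sharp peak at zero plus a floor of responses all around the circle.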