Divergent transitions when modeling changing circular concentration (von Mises dist)

Dear forum members,

I am not entirely sure whether this is the right forum for this query (i.e. whether it is a technical question or a more general modeling question). I am trying to estimate circular concentration using a generalized linear model and am running into very large numbers of divergent transitions. My question is: how problematic are that many divergent transitions in this instance, and how might I avoid them?

The model investigates changing circular concentration in response to a linear predictor. Specifically, it concerns the concentration of animal headings towards a point on a circle in response to increased polarization of light (from 0 to 1). The mean of the headings is not expected to change in response to changing polarization but the concentration of headings is expected to increase as a function of the common logarithm of the degree of polarization. (In reality, the relationship between polarization and circular concentration could probably be described well by a sigmoid with an upper and lower bound but I have ignored this for now.)
To achieve this, I have implemented a brms model, estimating the circular mean and concentration according to a von Mises distribution, as follows:

m1.formula <- bf(raderr ~ 1, kappa ~ degofpollog10) + von_mises(link = "tan_half", link_kappa = "identity")

m1.prior <- c(prior(normal(0, 3), class = Intercept),
              prior(normal(0, 3), class = Intercept, dpar = "kappa"),
              prior(normal(0, 3), class = b, dpar = "kappa"))

m1.fit <- brm(m1.formula, prior = m1.prior,
              data = data201611, chains = 4, iter = 4000, warmup = 2000,
              control = list(adapt_delta = 0.99),
              # the same prespecified starting values for all four chains
              inits = rep(list(list(mu = 0, kappa = 0.26)), 4))

The circular concentration (kappa) was modeled with an identity (rather than the default log) link because this better fits the diminishing-returns relationship we observed. However, the identity link can produce negative kappa values, which are not valid. To try to avoid this (having first tried positive-bounded priors), I used prespecified initial values.

This model produces chains that are a bit hairy but have low values of Rhat.

There are also a large number (many hundreds) of divergent transitions. However, the predictions this model makes appear reasonable in relation to our prior expectations. I wonder how problematic these divergent transitions are, and whether there is a simple solution that would eliminate them.
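
For reference, the divergence count comes straight from the NUTS sampler diagnostics; something along these lines will report it (nuts_params() is re-exported by brms from bayesplot):

np <- nuts_params(m1.fit)                            # per-iteration sampler diagnostics
sum(subset(np, Parameter == "divergent__")$Value)    # total number of divergent transitions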

PS. Please let me know if there is anything I can do to make this query more useful, or any further information I can provide.

  • Operating System: Mac OS Sierra
  • brms Version: 2.17.3

Very likely, these divergent transitions are the result of the algorithm moving kappa into negative territory. You start in the positive region of kappa, but you do nothing to stop it from going negative. You have at least three options to avoid negative values, ordered from most to least reliable.

  1. Use the log link. I understand you don’t like this approach, but it will at the very least ensure reasonable kappa values.
  2. Force your regression coefficients to be positive using bounded priors. This will only work reliably if your covariate (i.e. degofpollog10) is positive as well. If that is the case, I suspect it didn’t work when you first tried it because, by default, the bounds do not affect the intercept. If you go for kappa ~ 0 + intercept + degofpollog10 and then specify prior(normal(0, 3), class = b, dpar = "kappa", lb = 0), it should work provided that degofpollog10 is positive only (see the sketch after this list).
  3. Set stronger priors on the regression coefficients that ensure positive values with high probability. As an illustrative example, prior(normal(10, 1), class = b, dpar = "kappa") would put nearly all prior mass on positive values, but it would of course be extremely informative (read: don’t use this particular prior).
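
To make option 2 concrete, here is a minimal, untested sketch reusing your formula, data, and sampler settings (the m2.* names are just illustrative; in recent brms versions the reserved intercept term is spelled Intercept):

m2.formula <- bf(raderr ~ 1, kappa ~ 0 + Intercept + degofpollog10) +
  von_mises(link = "tan_half", link_kappa = "identity")

m2.prior <- c(prior(normal(0, 3), class = Intercept),                  # intercept of the circular mean
              prior(normal(0, 3), class = b, dpar = "kappa", lb = 0))  # kappa intercept and slope, bounded below at 0

m2.fit <- brm(m2.formula, prior = m2.prior, data = data201611,
              chains = 4, iter = 4000, warmup = 2000,
              control = list(adapt_delta = 0.99))

With the lower bound enforced by the prior, the hand-picked initial values should no longer be necessary.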

Thank you Paul,
Option 2 worked very effectively (the predictor is positive only). The chains converge well and there are no divergent transitions.
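
In case it helps anyone reading this later, the fitted concentration–polarization relationship can be inspected with something along these lines (m2.fit standing in for the refitted model; the dpar argument of conditional_effects() shows the expected kappa rather than the mean heading):

# expected kappa as a function of degofpollog10
conditional_effects(m2.fit, effects = "degofpollog10", dpar = "kappa")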