Errors thrown with von Mises distribution

Hi all,

I’m trying to fit a slightly fiddly regression model and am running into a couple of error messages, described below. Briefly, the data I’m modelling are the angular distances between the true orientation of an arrow and the orientation recalled by subjects from memory. This error should be roughly von Mises distributed, centred on 0, and I’m interested in whether the precision changes across the experimental conditions; so I’ve put a linear model on kappa, containing an average effect of each condition (alpha) plus varying effects for each subject in each condition (beta). The link is a softplus function, log(1 + exp(x)).

When I try to fit this I get two error messages, each printed lots of times:

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue: Exception: Error in function boost::math::cyl_bessel_i<double>(double,double): numeric overflow (in 'C:/Users/hfleming/AppData/Local/Temp/RtmpCij6ml/model-6f804b2043c7.stan', line 81, column 4 to column 64)

and

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: kappa_regression_model_namespace::log_prob: kappa[sym1__, sym2__] is nan, but must be greater than or equal to 0 (in 'C:/Users/hfleming/AppData/Local/Temp/RtmpCij6ml/model-6f804b2043c7.stan', line 54, column 2 to column 39)

I don’t really understand why I’m getting these messages - there’s obviously something wrong with the way it’s sampling kappa, but as far as I can tell all the constraints that should be there are there, and all the priors are proper. Fiddling with the priors doesn’t seem to help, nor does specifying initial values in CmdStan. If I leave the model running for long enough it eventually starts sampling with no further errors, but I still feel very uneasy about trusting the fit if I don’t understand where the errors come from!

Would anyone be able to explain what’s causing these messages, and what I can do to fix this? Happy to add some sample data if it helps.

Thanks very much in advance,
Hugo

functions {
 
  /* Scale_r_cor
  * (from brms)
   * compute correlated group-level effects
   * Args: 
   *   z: matrix of unscaled group-level effects
   *   SD: vector of standard deviation parameters
   *   L: cholesky factor correlation matrix
   * Returns: 
   *   matrix of scaled group-level effects
   */ 
  matrix scale_r_cor(matrix z, vector SD, matrix L) {
    // returns effects in another dimension order than z
    return transpose(diag_pre_multiply(SD, L) * z);
  }

}
  
data {
  int<lower=1> N; //Number of trials
  int<lower=1> Nsubs; //Number of subjects
  int<lower=1, upper=6> Ntreats; //Number of treatments to model
  int<lower=1, upper=Nsubs> subjn[N]; //Index of subject numbers on each trial
  int<lower=1, upper=6> treatment[N]; //Index of treatment on each trial
  vector<lower=-pi(), upper=pi()>[N] y; //Outcome to model (response errors)
}


parameters {
  real<lower=-pi(),upper=pi()> I_bar; //population of mean error from which subs are drawn
  real<lower=0> I_kappa; //concentration of the population distribution of subject mean errors

  vector<lower=-pi(),upper=pi()>[Nsubs] I; //per-subject mean error
  
  vector[Ntreats] alpha; //average treatment effect (intercept)

  matrix[Ntreats,Nsubs] z_sub; //standardised group-level effects
  vector<lower=0>[Ntreats] sigma_sub; //sd of each treatment effect across subjects
  cholesky_factor_corr[Ntreats] L; //cholesky factor of correlation matrix
}

transformed parameters {
  matrix[Nsubs,Ntreats] beta; //actual group level effects (deviation from average effect)
  matrix<lower=0>[Nsubs,Ntreats] kappa;

  //recentre the matrix of correlated effects
  beta = scale_r_cor(z_sub, sigma_sub, L); 

  for (sub in 1:Nsubs){
    for (treat in 1:Ntreats){
      kappa[sub,treat] = log1p_exp(alpha[treat] + beta[sub,treat]);
    }
  }

}

model {
  //Mean error part
  I_bar ~ von_mises(0,0.5);
  I_kappa ~ exponential(1);
  I ~ von_mises(I_bar,I_kappa);
  
  // Precision part
  alpha ~ normal(7,5);
  to_vector(z_sub) ~ std_normal();
  sigma_sub ~ exponential(1);
  L ~ lkj_corr_cholesky(2);
  
  // Data
  for (n in 1:N){
    y[n] ~ von_mises(I[subjn[n]], kappa[subjn[n],treatment[n]]);
  }
}

generated quantities {
  
  vector[N] log_lik;

  for (n in 1:N){
    log_lik[n] = von_mises_lpdf(y[n] | I[subjn[n]], kappa[subjn[n],treatment[n]]);
  }
  
}


Yeah, that kappa parameter blows up boost when its value is too high; when I use it, I switch to a normal with sd = sqrt(1/kappa) whenever I encounter kappa > 10.
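
In your model block that would look something like this - just a rough sketch, using the variable names from your model:

  for (n in 1:N){
    real k = kappa[subjn[n], treatment[n]];
    if (k > 10) {
      // normal approximation to the von Mises for large kappa,
      // so boost's Bessel function is never evaluated out there
      y[n] ~ normal(I[subjn[n]], sqrt(1 / k));
    } else {
      y[n] ~ von_mises(I[subjn[n]], k);
    }
  }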

Hi Mike, thanks, yes that’s fixed the first message. I’m still getting the message about kappa being nan though - any thoughts on what that might be? Potentially over/underflow through the log1p_exp() link function?

Very possibly; what’s the motivation for that link? I’ve always used just a log-link for kappa in the past.

Oops, I mean that I’ve always done kappa = exp(...)

Yeah I had a look at that one but wasn’t particularly happy with the long right tail it implies for the prior on kappa - possibly it’ll just get swamped by the data anyway, but I saw that the softplus was an option offered in brms so thought I would give it a go.

But I think you have things backwards. You want kappa to never be a negative value, but log1p_exp() is computing a log at the end, which can yield negative values.

Sorry, I can’t math apparently. The 1+ part of log1p_exp does indeed ensure the result is positive.

(well, combined with the exp part)

Was just double checking myself! But yes, the attraction was that it’s basically linear for x > 5 or so, but it does look a bit funky when x gets closer to 0.

What I think I may do is try with the log link and see if that fixes the remaining error message - will post back if that does indeed help.
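
For reference, that would just mean swapping the link in the transformed parameters block, something like:

  for (sub in 1:Nsubs){
    for (treat in 1:Ntreats){
      kappa[sub,treat] = exp(alpha[treat] + beta[sub,treat]); // log link instead of softplus
    }
  }

(with the normal(7,5) prior on alpha rethought for the log scale, since it would otherwise put a lot of mass on huge kappas).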

Hi @mike-lawrence (or indeed anyone else that might be able to help!) - thanks very much for your advice yesterday. I’ve now tried running the model with the log link instead of the softplus, and I’ve also tried really narrowing the implied prior over kappa so that essentially all of the probability mass is below 10, but unfortunately still no luck.

Interestingly, I do seem to have narrowed down the problem though - as recommended I added the option to switch to a normal distribution when kappa > 10, and I then started getting messages about the scale parameter of the normal being 0 when it should be positive, suggesting the underlying value of kappa was very large. Similarly, I’ve tried putting an (admittedly unprincipled) upper bound on kappa, and I then get error messages saying the proposed value of kappa was around 200 or even more.

The thing is, I just don’t understand why it’s trying to sample such large values of kappa: although (without the upper bound) such large values of kappa are theoretically possible, they should be vanishingly unlikely with my priors - in prior predictions I can do 30000 draws and the largest value of kappa I get is 35.
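
For concreteness, the kind of prior simulation I mean is basically this - a standalone sketch run with fixed_param, drawing one subject-by-condition kappa from the priors in the original softplus model (the _sim names are just for illustration):

generated quantities {
  real alpha_sim = normal_rng(7, 5);
  real sigma_sim = exponential_rng(1);
  real z_sim = normal_rng(0, 1);
  real kappa_sim = log1p_exp(alpha_sim + sigma_sim * z_sim);
}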

Is there something I’m missing? Although the above model does eventually seem to sample normally once it gets past these initial errors, a slightly expanded model I’m trying seems to get stuck and never gets past the errors at the beginning.


This looks like a conflict between the prior and the data - if you have enough data, it will overwhelm even the narrowest of priors. So the thing to investigate is why the data “wants” to be there so badly. I unfortunately don’t understand the model very well, but some strategies to look at:

  • What happens with a subset of the data? What if you simplify the model further (e.g. just one subject, just one treatment)?
  • The von Mises has issues if the mean is near the boundary, where the mapping breaks down (i.e. in your case -pi() should be a neighbour of pi() - epsilon, but on the unconstrained scale they are infinitely far apart). Are you sure the model is not trying to predict across the boundary? (One possible way around this is sketched below.)
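
If the boundary does turn out to be the problem, one common workaround (just a sketch, not specific to your model) is to parameterize the circular mean as a point on the unit circle rather than as an interval-bounded scalar, so that -pi and pi are neighbours on the unconstrained scale too:

parameters {
  unit_vector[2] I_vec; // hypothetical replacement for the bounded mean parameter
}
transformed parameters {
  real I_mean = atan2(I_vec[2], I_vec[1]); // mean direction in (-pi, pi], no hard edge
}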

Best of luck with the model!


I know this is a very old thread, but after recently struggling with the same confusion about why Stan samples extremely high values of kappa, I finally understand it, so I thought I would share in case this thread comes up in search again, as it did for me.

Stan initializes each parameter in the parameters block on the unconstrained scale with a uniform random draw in the range [-2, 2]. Kappa is on the log scale, and this would still produce reasonable values on the constrained scale IF THERE WERE NO RANDOM EFFECTS. But each component of the final kappa value - the intercept, the treatment effects, the standardized random intercepts for each subject, the standardized random slopes, and the log of the sds of these - is initialized in [-2, 2]. So you can easily end up with an initial value of kappa for one subject and condition like exp(2 + 2*2 + 2*2 + …). Of course all of those values being 2 is not very likely, but they only need to be on the upper end of the range, and with enough parameters (many subjects, random intercepts + slopes) a few of them will get a huge initial kappa value that leads to overflow of the Bessel function.
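
To put rough numbers on it (assuming the log link discussed above): if alpha initializes near 2, log(sigma_sub) near 2 (so sigma_sub ≈ exp(2) ≈ 7.4) and z_sub near 2, then kappa = exp(2 + 7.4 * 2) = exp(16.8) ≈ 2e7 - and the modified Bessel function I_0(kappa) grows like exp(kappa), so it already overflows a double once kappa gets much past about 700.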

Setting custom inits sometimes helps, but the sampler initially uses a big step size and can still find problematic parameter values that lead to overflow. The step size is also boosted temporarily several times throughout the adaptation period, potentially causing the problem later during warmup even if the initialization was lucky and avoided the issue.

I have run into this issue not just with the von_mises distribution, but with other custom functions that involve the Bessel functions somewhere in the likelihood, and in those cases we couldn’t switch to a normal approximation as with the von_mises. The only thing that seemed to work most of the time is to initialize all kappa-related parameters near 0, reduce the initial step size substantially to something like 0.1, and put strong priors on those parameters. By itself each of these was not sufficient, but the combination ended up working.

Another solution was to rescale kappa - instead of taking exp(kappa), taking exp(kappa/50). This helped because the unlucky cases where the sampler selects high values for all components of kappa on the unconstrained scale were drastically attenuated. It also helped with adaptation, as it made the posterior variance of the kappa-related parameters more similar to that of the standardized random effects, leading to easier adaptation of the inverse mass matrix.
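
In the notation of the model at the top of this thread, that rescaling would look something like the following sketch (the divisor 50 is just the value mentioned above, and the priors on alpha and sigma_sub would need rescaling to match):

  for (sub in 1:Nsubs){
    for (treat in 1:Ntreats){
      kappa[sub,treat] = exp((alpha[treat] + beta[sub,treat]) / 50);
    }
  }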


With all of that said, this should soon (hopefully!) no longer be a problem for the von_mises distribution: Improve numerical stability of von_mises_lpdf by venpopov · Pull Request #3036 · stan-dev/math · GitHub :)
