`neg_binomial_2_lcdf()` "grad_2F1: k (internal counter) exceeded 100000 iterations"

I am running a regression with negative binomial errors. The outcome is occasionally censored from below, and I am getting the following error on the line with the neg_binomial_2_lcdf call:

Exception: grad_2F1: k (internal counter) exceeded 100000 iterations, hypergeometric function gradient did not converge. (in ‘model4923325fd331_nb_std_censored’ at line 30)

The fact that not all chains fail makes me think it might be an issue with the prior or the initialization. Any ideas would be greatly appreciated. Model and fake data below.

data {
  int N; //the number of observations
  int<lower=0,upper=1> detected[N]; // whether the outcome is detected (i.e. not censored)
  int K; //the number of columns in the design matrix
  int<lower=0> y[N]; //the response
  matrix[N,K] X; //the design matrix
  real reciprocal_phi_scale;
}
parameters {
  vector[K] beta; //the regression parameters
  real<lower=0> reciprocal_phi;
}
transformed parameters {
  real phi;
  phi = 1. / reciprocal_phi;
}

model {
  vector[N] Xbeta;
  Xbeta = X * beta;
  beta ~ cauchy(0,10);
  reciprocal_phi ~ cauchy(0,reciprocal_phi_scale);
  
  for (i in 1:N) {
    if (detected[i]) {
      y[i] ~ neg_binomial_2_log(Xbeta[i], phi);
    } else {
      target += neg_binomial_2_lcdf(y[i] | exp(Xbeta[i]), phi);
    }
  }
}
generated quantities {
  vector[N] y_sim;
  
  for (i in 1:N) {
    y_sim[i] = neg_binomial_2_log_rng(X[i]*beta, phi);
  }
}

The version without covariates rarely gives an error. Note that I’m taking observations ≤ 3 and recording them as censored at 3.

N <- 100
y <- rnbinom(N, mu=6, size=2)
nb_data <- list(N=N,
             y=ifelse(y > 3, y, 3),
             detected= ifelse(y > 3, 1, 0),
             K=1,
             X=data.frame(c=rep(1,N)),
             reciprocal_phi_scale=5,
             beta_scale=5)

And some data that occasionally gives the error. My real data gives the error reliably but I can’t share it.

N <- 100
x <- rnorm(N, sd=1)
beta <- c(1, .5)
y <- rnbinom(N, mu=exp(beta[1]+beta[2]*x), size=2)

nb_data <- list(N=N,
             y=ifelse(y > 3, y, 3),
             detected= ifelse(y > 3, 1, 0),
             K=2,
             X=data.frame(c=rep(1,N), x=x),
             reciprocal_phi_scale=5,
             beta_scale=5)
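
For completeness, a minimal sketch of the fitting call (the .stan file name just mirrors the model name in the error message; X is coerced to a matrix to match the matrix[N,K] declaration in the data block):

library(rstan)

nb_data$X <- as.matrix(nb_data$X)  # the data block expects a matrix, not a data.frame

fit <- stan("nb_std_censored.stan", data = nb_data, chains = 4)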

Thanks for the thorough example. For this particular example could you add a print statement that outputs the neg_binomial_2_lcdf parameters immediately prior to the usage of that function? I’m guessing the gradient is just underflowing. While that function is now pretty robust it may be that we could change how it is used and avoid this error. Do you get it consistently during sampling or just occasionally?
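
For example, a sketch of where such a print could go in the model posted above, right before the failing statement:

// inside the else branch of the loop in the model block:
print("lcdf args: y = ", y[i], ", mu = ", exp(Xbeta[i]), ", phi = ", phi);
target += neg_binomial_2_lcdf(y[i] | exp(Xbeta[i]), phi);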

Thanks for the quick reply. Here is an example that gives the error:

y=1 
mu=0.000519101
phi=1.70972
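
As a quick sanity check of these values (my own sketch; the link to the 2F1 argument is a guess): the CDF itself is essentially 1, and phi / (mu + phi) sits very close to 1, which is where the hypergeometric series converges slowly:

y   <- 1
mu  <- 0.000519101
phi <- 1.70972

pnbinom(y, mu = mu, size = phi)  # essentially 1, so the lcdf is ~0
phi / (mu + phi)                 # ~0.9997, very close to 1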

I get the error about half the time when I fit my real data (less often with the simulated data above). As soon as I get the error, that chain crashes.

Interestingly, I don’t get it with brms, which can also fit this model. I guess the priors used there avoid it.

That shouldn’t be crashing a chain; this is good to know. I’ll check what is…

Actually, brms is getting the same error and its chains are crashing too, so I’m stuck. Is there a GitHub issue where I can track this? (Or should I make one, and if so, in which repo?)

Yeah, this is in the Stan math library, so it’ll affect all the interfaces. There’s no issue yet because in the past I could not reproduce it using just the math library. I’ll post here when I make some progress…

Any progress on that? It has cropped up for another user, so I wanted to see if this is the same issue (also, I don’t see this reported in Stan math)… Thanks!


Ha! I got distracted; looking at the tests now.

I’ll make an issue if there isn’t one. I made a (failing) test:

#include <stan/math/prim.hpp>
#include <gtest/gtest.h>

TEST(MathPrimScalFun, grad2F1_11) {
  double a1 = 3.70975;
  double a2 = 1.0;
  double b1 = 2.70975;
  double z = 0.999696;

  double grad_a1;
  double grad_b1;
  stan::math::grad_2F1(grad_a1, grad_b1, a1, a2, b1, z);
}

[edit]: made the issue


It’s fixed: https://github.com/stan-dev/math/pull/2175

Thanks for pinging me; it was fun to actually do something in the project again!

If a user tries to calculate this close enough to the boundary it may still be slow or even break, but if that becomes an issue we will need to look up some alternative method for that corner of parameter space (or the model is extreme in some way).


The problem seems to persist for more complex models. If you run the code below, you will see that the "grad_2F1: k (internal counter) exceeded 100000 iterations" error pops up again.

library(brms)

x <- rnorm(10)
mu <- exp(2 + x)
trunc <- sample(1:150, 100)
obs <- c()
id <- c()
for (u in mu) {
  # 10 negative binomial observations per group, labelled by the group mean
  obs <- c(obs, rnbinom(n = 10, mu = u, size = 0.7))
  id <- c(id, rep(paste('id', round(u, 3), sep = ''), 10))
}
obs_trunc <- ifelse(obs < trunc, obs, trunc)
df <- data.frame(y = obs_trunc, group = id, truncated_at = trunc)

brm(bf(y | trunc(ub = truncated_at) ~ 1 + (1 | group), shape ~ 1 + (1 | group),
       family = negbinomial), data = df)

I can give a bit more detail about what increases the chances of getting it:

  • higher truncation bounds
  • adding group effects to the shape parameter
  • and larger sample size (and presumably larger number of groups)
  • using neg_binomial_lpmf instead of neg_binomial_2_lpmf (in rstan)

Assuming that there is no easy solution to it, do you think that just increasing the number of chains in order to eventually get a chain to succeed is a legitimate approach? (I’m rapidly approaching 60 chains or more in order to get 1 or 2 chains to work with my data.)


Thanks for reporting. If the problem is “only” that some chains are unable to initialize, then I think your best bet is to try narrower inits (e.g. init = 0.1 or even init = 0) and, if that does not help, to consider providing specific init values for some of the parameters. If the error happens later in the sampling, especially post-warmup, then using more chains is tricky, as it is likely that the “surviving” chains will be biased: you would be selecting the chains that didn’t explore some problematic parts of the space. If the error is post-initialization but still during warmup, you might also try to increase adapt_delta to get a bit slower exploration and decrease the chances of encountering the weird part during warmup.
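
In brms that would look roughly like this (the specific values are only starting points to tune from):

fit <- brm(bf(y | trunc(ub = truncated_at) ~ 1 + (1 | group),
              shape ~ 1 + (1 | group), family = negbinomial),
           data = df,
           init = 0.1,                          # narrower inits
           control = list(adapt_delta = 0.95))  # slower, more careful exploration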

Hope you can get this resolved.

Hi Martin, thanks so much for answering and looking into this. I can already tell that the error is not happening post warm-up, but at the initialisation of the chains.

Moreover, init = 0 does work for upper truncation bounds <= 500, but once these are much higher, init = 0 fails completely. Using init_r = 1 seems to partially solve things, but a lot of chains still fail; using init_r = .1 leads to log(0) problems where sampling can’t start.
So since this is just an initialisation problem, it might not be that much of a problem to start a lot of chains and use the ones that succeed, @martinmodrak?

Then it is probably less of an issue, but I think understanding the problem is still likely to be useful.

If I understand the model correctly, you are sometimes imposing truncation bounds that are unlikely to actually matter, because the truncation bound is just too far from the mean. And I think this is where the neg_binomial_2_lcdf runs into problems. So one solution IMHO is to drop the upper bounds when they are obviously ridiculous. (Although convincing brms to truncate just some of your responses will likely not be completely straightforward and may require you to implement a custom family.)

I also looked a bit more at the inits and I think the trouble lies more with the inits for shape. When one uses more nuanced inits for the shape predictors only, I get much more successful initialization.

Here’s what I run:

init_fun <- function() {
  list(Intercept_shape = rnorm(1, sd = 0.1),
       z_2 = rnorm(length(unique(id)), sd = 0.1))
}
brm(bf(y|trunc(ub = truncated_at) ~ 1 + (1|group), shape ~ 1 + (1|group),
       family = negbinomial), data = df, init = init_fun)

Note that you need to use the names of the parameters as they appear in the Stan code, so I used make_stancode to see what the parameters that actually go into the computation of the shape are called. z_2 here is the non-centered varying intercept per group.
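
Roughly, the lookup is just (a sketch with the same model as above):

# prints the generated Stan code; search it for the parameters feeding shape
make_stancode(bf(y | trunc(ub = truncated_at) ~ 1 + (1 | group),
                 shape ~ 1 + (1 | group), family = negbinomial),
              data = df)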

Aside from the solution to the specific problem: thanks for reporting this. I think I know what the solution is, so I’m just replying to remind myself to file an issue…


Hi Martin,
thanks, I came to a very similar conclusion: giving shape and the mean an initial value of -1 lets the sampling start with no further issues. So it might be that it is not only the shape that requires a gentle touch to get sampling started.