Neg_binomial_2_log_rng overflow

Hi Stan community,

I am using the function neg_binomial_2_log_rng in the generated quantities in one model and I get overflow:

Exception: neg_binomial_2_log_rng: Random number that came from gamma distribution is 2.82506e+09, but must be less than 1073741824.000000

The model works okay if I don’t calculate the posterior predictive distribution on the generated quantities. To solve this, I have tried doing something like this:

...
if(log_mu_pred[n,s]>20){
      Y_pred[n,s] = neg_binomial_2_log_rng(20,phi);
}else{
     Y_pred[n,s] = neg_binomial_2_log_rng(log_mu_pred[n,s],phi);
}
...

This constrain worked fine for me in the past when I had overflow problems with a Poisson version of the model but is not working for the Negative Binomial model.

...
if(log_mu_pred[n,s]>20){
     Y_pred[n,s] = poisson_log_rng(20);
}else{
     Y_pred[n,s] = poisson_log_rng(log_mu_pred[n,s]);
        }
...

I have also tried the suggestion below with no luck.

Any ideas?

Cheers,
Alfonso

I’m curious of what is your Y, if the counts can be around 485 million and discreteness matters? It seems like you could use a continuous observation family instead.

Y are species counts, the counts in the data re not that big and discreteness does matter in my case. I assume the problem arises during the warm up period

How big is the largest count? If the counts are less than one million, there should not be problems. Can you show the whole code?

The largest count is 5197.

I made a mistake by saying before that the model works okay. I don’t know if the mcmc will converge or not as it is still running. I have a version of the model with Poisson distributed counts that showed no mcmc convergence issues when fitted to my data. I wanted to see if I could implement a Negative Binomial version. When I ran the script I saw the warning messages about overflow and I was curious why restricting the eta values in neg_binomial_2_log_rng was not working

.

Here is the stan code for a simpler version of the model for 1 species.

data {
  int<lower=1> T; 
  int<lower=0> N;
  int<lower = 1, upper = T> time[N];
  int y[N];
}
parameters {
  real alpha;
  real beta;
  real<lower=0> reciprocal_phi;
  real<lower=0> sigma;
  vector[N] log_mu;
}
transformed parameters{
  real phi;
  phi = 1/reciprocal_phi;
}
model {
  alpha ~ normal(0, 0.5);
  beta ~ normal(0, 0.5);
  sigma ~ exponential(5);
  reciprocal_phi ~ normal(0,1);
  
  for(n in 1:N){
    if(time[n]==1){
      log_mu[n] ~ normal(0,5);
    }
    if(time[n]>1){
      log_mu[n] ~ normal(alpha + beta * log_mu[n-1], sigma);
    }
  }
  
  for(n in 1:N){
    target += neg_binomial_2_log_lpmf(y[n] | log_mu[n], phi);
  }
  
}
generated quantities{
  vector[N] log_lik;
  vector[N] log_mu_pred;
  int y_pred[N];

  log_mu_pred[1] = log_mu[1];
  for(t in 2:N){
    log_mu_pred[t] = normal_rng(alpha + beta*log_mu[t-1], sigma);
  }

  for(n in 1:N){
    log_lik[n] = neg_binomial_2_log_lpmf(y[n] | log_mu[n], phi);
      if(log_mu_pred[n]>20){
        y_pred[n] = neg_binomial_2_log_rng(20,phi);
    }else{
        y_pred[n] = neg_binomial_2_log_rng(log_mu_pred[n],phi);
      }
  }
}

What are N, T and time? Does this simpler version also predict counts in millions?

It is very unlikely to be a correct solution if your highest count is about 5000, you should not have log_mu_pred much larger than 8.

By default Stan draws the initial values from uniform distribution [-2,2]. If N is big then your code is likely to produce log_mu_pred values larger than 10, and apparently produces values bigger than 20. You could try initializing the parameters close to 0, e.g. in range [-0.1, 0.1] with option init=0.1. Or you could use Pathfinder to initialize MCMC.

However, it seems there are bigger problems if the sampling takes long, or after the warmup there are still that big log_mu_pred values (generated quantities is not computed during warmup). Tell more about your data, and what you want to model, and maybe we can come up with a better model, priors, and initial values.

1 Like

The data has N rows, which represent the number of site*years, so the first row has the count for site 1 at year 1, the second row site 1 at year 2, row 3 for site 1 at year 3 and so on. The data has 41 sites and 11 years. Therefore N = 41*11 = 451, T is the maximum number of years, which is 11, and time is a vector of length 451 that repeats the sequence 1:11 41 times so it can be used in the model section to ensure that year 1 never uses the previous count which correspond to year 11 of another site.

The stan model in my previous reply is just to show the posterior predictive distribution function neg_binomial_2_log_rng in a simpler version of the model. That model does not shows the overflow problems and it runs relatively fast. In reality, the model is multivariate and the data has the same amount of rows but it has as many columns as species in the analyses (40 species). Also beta becomes a interaction matrix and log_mu is now a multivariate normal distribution, so I also have to estimate a variance-covariance matrix. I did all of this for a version of model that is multivariate and uses Poisson distributed counts (I use the regularised horseshoe prior from your paper and latent variables to reduce dimensionality) in order to disentangle the effects of species-to-species and species-to-environment interactions. The model with Poisson counts had mcmc chains with no convergence issues and it takes a long time to run because it has a lot of parameters. Now I wanted to see if I could implement it with a Negative Binomial distribution for the counts instead of Poisson, and here is where I am having overflow problems.

For the Poisson model most of the log_mu_pred values are between -4.34 and 7.86. I have no clue if the log_mu_pred values are bigger for the Negative binomial model as it is still running.

Firstly, I check if the code runs and if the mcmc sampling starts without issues. Therefore, I run the code for a few minutes on my PC. Then, I kill the job after a few minutes and I send the it to the HPC in my institution. The warning about overflow happens when I am testing that the code runs in my PC and that’s during the warm up for the Negative binomial model (not happening for the Poisson model):

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
Chain 1: 
Chain 1: Gradient evaluation took 0.072936 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 729.36 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 2: 
Chain 2: Gradient evaluation took 0.087555 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 875.55 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 2: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 3: 
Chain 3: Gradient evaluation took 0.090199 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 901.99 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 4: 
Chain 4: Gradient evaluation took 0.100356 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 1003.56 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 3: Exception: neg_binomial_2_log_rng: Random number that came from gamma distribution is 1.85285e+09, but must be less than 1073741824.000000 (in 'string', line 135, column 10 to column 63)
Chain 3: Exception: neg_binomial_2_log_rng: Random number that came from gamma distribution is 2.19594e+09, but must be less than 1073741824.000000 (in 'string', line 135, column 10 to column 63)
1 Like

Can you try running with Pathfinder? For example, Birthdays case study illustrates the use for testing that the code runs, and that quick approximate results make sense.

I use rstan so I am not used to cmdstan. When I try to implement pathfinder I get the same warning:

model_NB40sp <- cmdstan_model(stan_file = "MARNB_LV_Pomacentrids_cmd.stan",
                       compile_model_methods = T,
                       force_recompile = T)

pth_NB <- model_NB40sp$pathfinder(data = dat, init = 0.1, num_paths = 10, single_path_draws = 40,
                                  history_size = 100, max_lbfgs_iters = 100, refresh = 0)
Chain 1 Exception: neg_binomial_2_log_rng: Random number that came from gamma distribution is 1.08824e+09, but must be less than 1073741824.000000 (in 'C:/Users/alfon/AppData/Local/Temp/Rtmp2F7ti5/model-d42c56437113.stan', line 138, column 10 to column 63)
Warning: Fitting finished unexpectedly! Use the $output() method for more information.

When I change to threshold limit above from 20 to 15

if(log_mu_pred[n,s]>15){
      Y_pred[n,s] = neg_binomial_2_log_rng(15,phi);
}else{
     Y_pred[n,s] = neg_binomial_2_log_rng(log_mu_pred[n,s],phi);
}

it seems to work fine when I check the initial minutes of fitting using rstan and it does works out using pathfinder

pth_NB <- model_NB40sp$pathfinder(data = dat, init = 0.1, num_paths = 10, single_path_draws = 40,
                                  history_size = 100, max_lbfgs_iters = 100, refresh = 0)
602.9 seconds.

However the created initial values seem to have changed the dimensions of the log_mu parameter

initNB <- create_inits(pth_NB)

itNB <- model_NB40sp$sample(data=dat, iter_warmup=100, iter_sampling=100,
                     chains=4, parallel_chains=4,
                     init=initNB)

what causes an error

Running MCMC with 4 parallel chains...

Chain 1 Unrecoverable error evaluating the log probability at the initial value.
Chain 1 Exception: mismatch in number dimensions declared and found in context; processing stage=parameter initialization; variable name=log_mu; dims declared=(451,40); dims found=(18040) (in 'C:/Users/alfon/AppData/Local/Temp/Rtmp8en5nD/model-cb1077525daf.stan', line 17, column 2 to column 28)
Chain 2 Unrecoverable error evaluating the log probability at the initial value.
Chain 2 Exception: mismatch in number dimensions declared and found in context; processing stage=parameter initialization; variable name=log_mu; dims declared=(451,40); dims found=(18040) (in 'C:/Users/alfon/AppData/Local/Temp/Rtmp8en5nD/model-cb1077525daf.stan', line 17, column 2 to column 28)
Chain 3 Unrecoverable error evaluating the log probability at the initial value.
Chain 3 Exception: mismatch in number dimensions declared and found in context; processing stage=parameter initialization; variable name=log_mu; dims declared=(451,40); dims found=(18040) (in 'C:/Users/alfon/AppData/Local/Temp/Rtmp8en5nD/model-cb1077525daf.stan', line 17, column 2 to column 28)
Chain 4 Unrecoverable error evaluating the log probability at the initial value.
Chain 4 Exception: mismatch in number dimensions declared and found in context; processing stage=parameter initialization; variable name=log_mu; dims declared=(451,40); dims found=(18040) (in 'C:/Users/alfon/AppData/Local/Temp/Rtmp8en5nD/model-cb1077525daf.stan', line 17, column 2 to column 28)
Warning: Chain 1 finished unexpectedly!

Warning: Chain 2 finished unexpectedly!

Warning: Chain 3 finished unexpectedly!

Warning: Chain 4 finished unexpectedly!

Warning: Use read_cmdstan_csv() to read the results of the failed chains.
Warning messages:
1: All chains finished unexpectedly! Use the $output(chain_id) method for more information.
 
2: No chains finished successfully. Unable to retrieve the fit. 

Previously, when I used rstan, the line in the model for the log_mu parameter looked like this

parameters{
...
vector[S] log_mu[N];
...
}

With cmdstanr I use what I assume is the newer syntax

parameters{
...
array[N] vector[S] log_mu;
...
}

The dimensions of log_mu should be 451 by 40, but create_inits() makes it one dimensional with length 18040. Not sure if I specified log_mu right as I am not very familiar with cmdstan

When you use Pathfinder to get inits, you don’t need to run generated quantities, so you could remove that block to first get sensible initial values and then include the block only when sampling.

This is probably a bug in my create_inits(). I’ll investigate.

If Pathfinder takes that long, you do have a big data and complex model. You could get some additional speed by vectorizing the neg_binomial_2_log_lpmf() call and by using non-centered parameterization for the prior of log_mu.

Acknowledged

That’s the case.

Vectorizing the neg_binomial_2_log_lmpf() does help a bit as pathfinder now takes 454.7 seconds instead of 602.9 seconds. Thanks a lot!

The parameterization is non-centered for the pars used to estimate log_lambda in the full version of the model. The only instance when I am not sure how implement the non-centered parameterization is for the prior for log_lambda ~ normal(0,5), when time[n] == 1. Note that in this case log_lambda is a vector array.

parameters{
...
array[N] vector[S] log_mu; 
...
}
model{
...
  for(n in 1:N){
    if(time[n]==1){
      idx = idx +1;
      log_mu[n] ~ normal(0,5);
    }
    if(time[n]>1){
      log_mu[n] ~ multi_normal_cholesky(alpha + diag_matrix(beta)*log_mu[n-1], 
                                            L_Omega);
    }
  }
  
...
}

The loop here is a killer. Every n you’re constructing a diagonal matrix and doing the inversion of L_Omega. Make an index of time > 1 and do one call.

for(n in 1:N){
      log_mu[n] ~ multi_normal_cholesky(alpha + diag_matrix(beta)*log_mu[n-1], 
                                            L_Omega);
}

Something along the lines of


int Nm1 = A; // all the times that time[n] != 1;
matrix[A, S] mu;
mu[1, S] = log_mu[1];
mu[2:A, S] = alpha + diag_matrix(beta * log_mu[lag_index]);

log_mu[index_time_gt_one] ~ multi_normal_cholesky(mu, L_Omega);
2 Likes

Sounds good.

I have tried to follow the suggestion and I end up with this

data{
...
  int<lower = 0> N_2_11;        // Number of observations from year 2 to 11
  int<lower = 0> N_1_10;        // Number of observations from year 1 to 10
  int<lower = 0> N_1;           // Number of observations only at year 1
  array[N_2_11] int IND;        // Vector with indexes of the observations for years 2 to 11
  array[N_1_10] int IND_lag;    // Vector with indexes of the observations for years 1 to 10
  array[N_1] int IND_1;         // Vector with indexes of the observations for years 1
...
}
parameters{
...
  vector[S] log_mu[N];
...
}

model{
...
  vector[S] mu_int[N];


  log_mu[IND_1,S] ~ normal(0,5);
  mu_int[IND_1]= log_mu[IND_1];
  
  mu_int[IND] = alpha + diag_matrix(beta)*log_mu[IND_lag];
  
  log_mu[N] ~ multi_normal_cholesky(mu_int[N], L_Omega);
...
}

And I get the following error:

Error in stanc(filename, allow_undefined = TRUE) : 0
Semantic error in 'string', line 90, column 24 to column 57:
   -------------------------------------------------
    88:    mu_int[IND_1]= log_mu[IND_1];
    89:  
    90:    mu_int[IND] = alpha + diag_matrix(beta)*log_mu[IND_lag];
                                 ^
    91:  
    92:    log_mu[N] ~ multi_normal_cholesky(mu_int[N], L_Omega);
   -------------------------------------------------

Ill-typed arguments supplied to infix operator *. Available signatures: 
(int, int) => int
(real, real) => real
(row_vector, vector) => real
(real, vector) => vector
(vector, real) => vector
(matrix, vector) => vector
(complex, complex) => complex
(complex_row_vector, complex_vector) => complex
(real, row_vector) => row_vector
(row_vector, real) => row_vector
(row_vector, matrix) => row_vector
(real, matrix) => matrix
(vector, row_vector) => matrix
(matrix, real) => matrix
(matrix, matrix) => matrix
(complex, complex_vector) => complex_vector
(complex_vector, complex) => complex_vecto

The problem seems to be log_mu[IND_lag] as I get the next error when I try to_vector()

mu_int[IND] = alpha + diag_matrix(beta)*to_vector(log_mu[IND_lag]);
Error in stanc(filename, allow_undefined = TRUE) : 0
Semantic error in 'string', line 90, column 42 to column 68:
   -------------------------------------------------
    88:    mu_int[IND_1]= log_mu[IND_1];
    89:  
    90:    mu_int[IND] = alpha + diag_matrix(beta)*to_vector(log_mu[IND_lag]);
                                                   ^
    91:  
    92:    log_mu[N] ~ multi_normal_cholesky(mu_int[N], L_Omega);
   -------------------------------------------------

Ill-typed arguments supplied to function 'to_vector':
(array[] vector)
Available signatures:
(matrix) => vector
  The first argument must be matrix but got array[] vector
(complex_matrix) => complex_vector
  The first argument must be complex_matrix but got array[] vector
(vector) => vector
  The first argument must be vector but got array[] vector
(row_vector) => vector
  The first argument must be row_vector but got array[] vector
(array[] real) => vector
  The first argument must be array[] real but got array[] vector
(Additional signatures omitted)

I cannot figure out how to make log_mu[IND_lag] just a vector