Unintentional error thrown by `sampling` due to NaNs

Summary:

The example provided below yields 0/0 (NaN) draws from the posterior due to numerical underflow. In addition to the expected warnings about undefined values, divergent transitions, large r-hats, and low effective sample sizes, an error is thrown by the sampling function, which is due to two calls to quantile that use the default argument na.rm = FALSE . Although it might be considered appropriate that such a pathological data configuration should throw an error, I do not believe that this particular error is intended or correct behavior, since the user is already amply warned about trusting any of the results.

Current temporary solution:

My current solution involves hacking into the file rstan package script Misc.R and adding na.rm = T to the calls of quantile on lines 958 and 987. This causes the model below to run to completion (with the warnings but without the errors). I’m mainly looking for feedback from the community on whether this warrants a pull request?

Detailed Description:

I am working on a Bayesian isotonic regression model for a binary outcome in which the probability is assumed to be monotonically changing with an ordered categorical variable (or a continuous variable that has been categorized into a large number of ordered groups). I’ve attached a screenshot of the manuscript explaining the relevant technical background:

Relevant here is that it is required to sample from a possible large number of gamma-distributed random variables each with small shape parameters. This model is prone to underflow, particularly in cases of complete separation of the data. I am aware of the fallacies of trusting the results from such a pathological example as given below. I am not trying to solve the underflow issue here but rather am using it to highlight what I believe to be unintended behavior: namely that the default summary behavior of the rstan sampling function throws an error in the presence of NaNs.

Reproducible Steps:

library(rstan);
# number of observations per group 
n_per_group = as.array(rep(4, 25));
# number of events per group
y_stan = as.array(c(rep(0,17), rep(4, 8)));
# alpha_j ~ gamma(alpha_shape, 1), j = 1, ..., length(y_stan)
alpha_shape = 0.025;

rstan_options(auto_write = TRUE);
options(mc.cores = parallel::detectCores());


isotonic_gamma <- "
data {
  int<lower = 1> n_groups_stan; // number of unique covariate patterns
  int<lower = 0> n_per_group_stan[n_groups_stan]; // number observations per pattern
  int<lower = 0, upper = max(n_per_group_stan)> y_stan[n_groups_stan]; //grouped outcomes
  real<lower = 0> alpha_shape_stan;//
}
parameters {
  vector<lower = 0.0>[n_groups_stan + 1] alpha; //
}
transformed parameters {
  vector<lower = 0.0,upper = 1.0>[n_groups_stan] xi; // 
    real<lower = 0.0> sum_alpha;
    sum_alpha = sum(alpha);
    xi[1] = alpha[1] / sum_alpha;
    if(n_groups_stan > 1) {
      for(i in 2:n_groups_stan) {
        xi[i] = xi[i-1] + (alpha[i] / sum_alpha);
      }
    }
}
model {
  alpha ~ gamma(alpha_shape_stan, 1.0);
  // Pr(Y = 1 | group = j) = sum_{i=1}^j alpha_i / sum_{i=1}^{length(y_stan)} alpha_i 
  y_stan ~ binomial(n_per_group_stan, xi);
}
";

curr_fit <- stan(model_code = isotonic_gamma,
                 data = list(n_groups_stan = length(y_stan),
                             n_per_group_stan = n_per_group,
                             y_stan = y_stan,
                             alpha_shape_stan = alpha_shape), 
                 warmup = 1e3, 
                 iter = 2e3, 
                 chains = 2, 
                 seed = 100,
                 verbose = F,
                 control = list(adapt_delta = 0.70,
                                max_treedepth = 15));

Current Output:

Error in quantile.default(sims, 0.05) : 
  missing values and NaN's not allowed if 'na.rm' is FALSE
In addition: Warning messages:
1: In validityMethod(object) :
  The following variables have undefined values:  xi[1],The following variables have undefined values:  xi[2],The following variables have undefined values:  xi[3],The following variables have undefined values:  xi[4],The following variables have undefined values:  xi[5],The following variables have undefined values:  xi[6],The following variables have undefined values:  xi[7],The following variables have undefined values:  xi[8],The following variables have undefined values:  xi[9],The following variables have undefined values:  xi[10],The following variables have undefined values:  xi[11],The following variables have undefined values:  xi[12],The following variables have undefined values:  xi[13],The following variables have undefined values:  xi[14],The following variables have undefined values:  xi[15],The following variables have undefined values:  xi[16],The following variables have undefined values:  xi[17],The following variables have undefined values:  xi[18],The following varia [... truncated]
2: There were 1188 divergent transitions after warmup. Increasing adapt_delta above 0.7 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
3: There were 21 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 15. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded 
4: Examine the pairs() plot to diagnose sampling problems
 
5: The largest R-hat is 3.04, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat 
6: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
Warning message:
In validityMethod(object) :
  The following variables have undefined values:  xi[1],The following variables have undefined values:  xi[2],The following variables have undefined values:  xi[3],The following variables have undefined values:  xi[4],The following variables have undefined values:  xi[5],The following variables have undefined values:  xi[6],The following variables have undefined values:  xi[7],The following variables have undefined values:  xi[8],The following variables have undefined values:  xi[9],The following variables have undefined values:  xi[10],The following variables have undefined values:  xi[11],The following variables have undefined values:  xi[12],The following variables have undefined values:  xi[13],The following variables have undefined values:  xi[14],The following variables have undefined values:  xi[15],The following variables have undefined values:  xi[16],The following variables have undefined values:  xi[17],The following variables have undefined values:  xi[18],The following varia [... truncated]

Expected Output:

An object of class stanfit

RStan Version:

2.19.2

R Version:

R version 3.5.3 (2019-03-11)

Operating System:

macOS Mojave 10.14.5

That does sound frustrating.

There is an is_nan function (https://mc-stan.org/docs/2_20/functions-reference/logical-functions.html) in Stan. Do you think there’s a way you could check for NaNs and replace them with placeholders (and maybe store a variable nans_detected or something)?

Might it be possible to use a simplex type to replace the alpha variable? Or use softmax to take a length-N vector and get one that sums to 1?

Yeah, your results are definitely going to be compromised by this. But maybe you can debug.