New User: Model crashes R

I’m a relatively new user with Stan, and as I’m trying to add complexity of having an effect of day in my model, the model crashes R when I try to run it. (Unfortunately, I can’t share the data.)

My data are from a clinical study of a drug. There is an apparent placebo effect, so I’m trying to fit an effect for post-dose days by day (day 1 has a different effect than day 2…); all predose days are considered equivalent (therefore the post_dose_day variables).

Generally, I’m trying to model something like:

y = \frac{emax \times x}{ec50 + x} + day\_effect

y has values of 0 to 4 (translated to 1 to 5 due to 1-based indexing).

The crash occurs without any other notification; I see the following, and then it drops back to the command line without any other information.

> fit_ordered_emax_day <- stan(file="ordered_logit_emax_day.stan", data=d_mod_$

SAMPLING FOR MODEL 'ordered_logit_emax_day' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
data {
  int<lower=2> K_levels;
  int<lower=0> N_observations;
  int<lower=0> N_days;
  int<lower=0> N_post_dose_observations;
  int<lower=0> post_dose_day_indices[N_post_dose_observations];
  int<lower=0> post_dose_day[N_post_dose_observations];
  int<lower=1,upper=K_levels> y[N_observations];
  real<lower=0> x[N_observations];
}
parameters {
  real emax;
  real<lower=0> ec50;
  ordered[K_levels-1] c;
  vector<lower=0>[N_days-1] e_day;
}
transformed parameters {
  vector[N_observations] day_effect = rep_vector(0, N_observations);
  for (i in 1:N_post_dose_observations)
    day_effect[post_dose_day_indices[i]] = e_day[post_dose_day[i]];
}
model {
  for (n in 1:N_observations) {
    y[n] ~ ordered_logistic(emax*x[n]/(ec50 + x[n]) + day_effect[n], c);
  }
}

If you execute tempdir() before calling stan and then change to that directory in another shell, there should be more information in one of the txt files.

Inexplicably to me, it’s now working.

I changed the name of the .stan file (from “ordered_logit_emax_day.stan” to “broken.stan”) because I’ve continued working on the original model trying to make it work.

Is there some form of caching which may be causing the problem? If yes, can I disable that?

… and it’s broken again with the same issue.

Here are the contents of one randomly named file which look possibly informative:

Chain 1: Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 1: Exception: ordered_logistic: Cut-points is not a valid ordered vector. The element at 4 is 7.74014e+052, but should be greater than the previous element, 7.74014e+052  (in 'model4878497624c9_ordered_logit_emax_day_simple' at line 22)

Chain 1: If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 1: but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 1: 
Chain 1: Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 1: Exception: ordered_logistic: Cut-points is not a valid ordered vector. The element at 4 is 4.14668e+052, but should be greater than the previous element, 4.14668e+052  (in 'model4878497624c9_ordered_logit_emax_day_simple' at line 22)

Chain 1: If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 1: but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 1: 

Here are the contents of the _StanProgress.txt file (which don’t look informative to me):

Click the Refresh button to see progress of the chains
starting worker pid=12760 on localhost:11713 at 11:56:33.823
starting worker pid=7052 on localhost:11713 at 11:56:34.002
starting worker pid=18684 on localhost:11713 at 11:56:34.351
starting worker pid=8356 on localhost:11713 at 11:56:34.625

SAMPLING FOR MODEL 'broken' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 0.001 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 10 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 

SAMPLING FOR MODEL 'broken' NOW (CHAIN 2).
Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)

SAMPLING FOR MODEL 'broken' NOW (CHAIN 4).

SAMPLING FOR MODEL 'broken' NOW (CHAIN 3).
Chain 2: 
Chain 2: Gradient evaluation took 0 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 4: 
Chain 4: Gradient evaluation took 0 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 3: 
Chain 3: Gradient evaluation took 0.001 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 10 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)

It looks to be having trouble with extreme values of the cutpoints, although I don’t know why that would cause it to stop sampling. You may need some prior on c.

@bgoodri, Thanks!

In a subsequent test, I put a prior on c with c ~ uniform(-20, 20);, but it continued crashing. I ended up going down a different modeling path for the placebo effect and I’m making progress there.

Overall, I would think that rstan should return some form of error instead of crashing the entire session. Is there a way that I can help troubleshoot why the R session is crashing so that a more informative error could be given? (I’m not sure what info to provide, and unfortunately, I can’t share the data.)

It isn’t supposed to be doing that but appears to hit something that causes an abort before flushing what the problem is. If you can generate simulated data that triggers this problem, that would be helpful.

I haven’t been able to simulate data that replicates the behavior. :(

Further to @bgoodri response, I am seeing the exception referred to above with the following model and data.

data {
  int<lower=1> N;             // Number of observations
  int<lower=1> K;             // Number of ordinal categories
  int<lower=1, upper=K> y[N]; // Observed ordinals
}
parameters {
  ordered[K - 1] c; // (Internal) cut points
}
model {
  vector[N] mu = rep_vector(0, N);
  target += uniform_lpdf(c | -20, 20);

  for(i in 1:N){
    target += ordered_logistic_lpmf(y[i] | mu[i], c);
  }
}

Compile, generate data and sample.

library(cmdstanr)
library(posterior)
# set_cmdstan_path(<your relevant path for cmdstan>)
file <- file.path("secondary_ord1.stan")  
  
mod <- cmdstan_model(file)

p <- c(0.1, 0.2, 0.4, 0.2, 0.1)
ld <- list()
ld$N <- 100
set.seed(5)
ld$y <- sample(seq_along(p), ld$N, replace = T, prob = p)
table(ld$y)
ld$K <- max(ld$y)

fit <- mod$sample(
    data = ld, 
    seed = 12, 
    chains = 1, 
    parallel_chains = 1,
    refresh = 1000,
    iter_warmup = 2000,
    iter_sampling = 10000
  )
print(as_draws_df(fit$sampler_diagnostics()))
fit$cmdstan_diagnose()
fit$cmdstan_summary()
  
dsmpl <- as_draws_df(fit$draws())
cp <- colMeans(dsmpl)[paste0("c[", 1:4, "]")]
rbind(c(sapply(cp, plogis), 1),
         cumsum(p))

The cut points seem to be in the ballpark and the diagnostics didn’t raise any flags.
The exception is reported during the warmup phase:

Chain 1 Iteration:     1 / 12000 [  0%]  (Warmup) 
Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 1 Exception: ordered_logistic: Cut-points is not a valid ordered vector. The element at 3 is -16.0042, but should be greater than the previous element, -16.0042 (in '/tmp/Rtmpztzhjw/model-cc9773e22318.stan', line 18, column 4 to column 53)
Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 1 
Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 1 Exception: ordered_logistic: Cut-points is not a valid ordered vector. The element at 3 is -32.9553, but should be greater than the previous element, -32.9553 (in '/tmp/Rtmpztzhjw/model-cc9773e22318.stan', line 18, column 4 to column 53)
Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 1 
Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 1 Exception: ordered_logistic: Cut-points is not a valid ordered vector. The element at 2 is -584.811, but should be greater than the previous element, -584.811 (in '/tmp/Rtmpztzhjw/model-cc9773e22318.stan', line 18, column 4 to column 53)
Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 1 
Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 1 Exception: ordered_logistic: Cut-points is not a valid ordered vector. The element at 2 is -13.5391, but should be greater than the previous element, -13.5391 (in '/tmp/Rtmpztzhjw/model-cc9773e22318.stan', line 18, column 4 to column 53)
Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Chain 1 
Chain 1 Iteration:  1000 / 12000 [  8%]  (Warmup) 
Chain 1 Iteration:  2000 / 12000 [ 16%]  (Warmup) 

And is ok thereafter.
The above was generated using cmdstan cmdstan-2.24.1.
Truncated session info:

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
1 Like

@maj684, Thank you for finding a reproducible example!

Hi Mark,

Do you mean that the model crashes when you run this code or just that the exceptions are reported? If it’s just that those exceptions are reported during the start of the warm-up phase, then there isn’t a cause for concern. This behaviour just indicates that the model doesn’t initialise very well, but that it stabilises without issue. If you continued to see these warnings throughout the sampling stage, then that would indicate a problem with the model.

Cheers,
Andrew

Thanks Andrew. To clarify, the model does not crash. The exceptions are simply reported and sampling continues. I should have been clearer on that. Nevertheless, I am still curious as to how the exception arises and any light could be shed on that would be appreciated.

Always great to hear that there aren’t any crashes! Those exceptions just mean that when the parameters are randomly initialised, some of the thresholds don’t get initialised to ordered numbers. This causes Stan to throw an exception, which rejects these initialised numbers and requests new ones. Then once the model begins sampling, and the threshold values are determined by the sampler, these numbers are in increasing order and no longer rejected, so no more exceptions are thrown.

2 Likes

Thanks Andrew.