Sampling in cmdstan often (but not always) fails with error, while rstan always works

Hi everyone,

I’m trying to fit a model with a custom likelihood function (Skellam distribution).
The model works (compiles, sampling seems to be fine, estimates are ok) in rstan.
In cmdstanr the model compiles, too, but sampling usually fails with one of two different error messages; occasionally it samples fine (with estimates that look ok).

The error messages point to issues in boost::math, but that’s about all I can extract from them (reprex below).

Is there anything that can be done to make this model work using cmdstanr?

I’m asking because I would really like to use cmdstanr, but if need be I can switch to rstan for this project.

library(cmdstanr)
library(rstan)

# data:
set.seed(1)
sdat <- list(N = 50)
theta1 <- 0.5
theta2 <- 2
sdat$diff_y <- rpois(sdat$N, theta1) - rpois(sdat$N, theta2)

scode <- "
functions {
  // from https://github.com/LeoEgidi/footBayes/blob/master/R/stan_foot.R
  real skellam_lpmf(int k, real lambda1, real lambda2) {
    real r = k;
    return -(lambda1 + lambda2) + (r/2) * log(lambda1/lambda2) +
      log(modified_bessel_first_kind(k, 2 * sqrt(lambda1 * lambda2)));
  }
}
data {
  int N;
  array[N] int diff_y; // frequency differential
}
parameters {
  real theta1;
  real theta2;
}
model {
  for (n in 1:N){
    target += skellam_lpmf(diff_y[n] | exp(theta1), exp(theta2));
  }
  theta1 ~ normal(0,1);
  theta2 ~ normal(0,1);
}
generated quantities {
  real theta1_exp = exp(theta1);
  real theta2_exp = exp(theta2);
}
"

# compile models
mod_cmdstan <- cmdstan_model(write_stan_file(scode))
mod_rstan <- stan_model(write_stan_file(scode))

# sampling fails
mod_cmdstan$sample(data = sdat, chains = 1, iter_warmup = 200, iter_sampling = 200, seed = 4)
# Error in function boost::math::bessel_ik<long double>(long double,long double) in CF2_ik: Series evaluation exceeded 1000000 iterations, giving up now.

# sampling fails
mod_cmdstan$sample(data = sdat, chains = 1, iter_warmup = 200, iter_sampling = 200, seed = 7)
# Error in function boost::math::asymptotic_bessel_i_large_x<long double>(long double,long double): Overflow Error

# sampling is fine (estimates look ok)
mod_cmdstan$sample(data = sdat, chains = 1, iter_warmup = 200, iter_sampling = 200, seed = 3)

# rstan samples without complaints
sampling(mod_rstan, data = sdat, chains = 1, iter = 1000)

Operating System: macOS
Interface Version: cmdstanr (0.8.1.9000), CmdStan (2.35.0), rstan (2.32.6) with Stan version 2.32.2
Compiler/Toolkit:

These two errors are common when using complex functions from Boost. If the sampler runs fine most of the time and you can obtain reasonable posteriors, then don’t worry about them.

Thank you for the response. That’s good to know.

The problem remains that sampling in cmdstanr fails unpredictably: I only got it to work by trying different seeds and keeping the number of iterations low. If I increase the number of iterations, the probability of these errors goes up, too. And with a reasonable number (e.g. 4 chains × 1000 iterations) it fails in essentially 100% of attempts.
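For concreteness, a call like this (just the reprex above scaled up) is what fails essentially every time:

mod_cmdstan$sample(data = sdat, chains = 4, iter_warmup = 1000, iter_sampling = 1000)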

Hi @gobbios, I’m experiencing a similar issue. Did you get a chance to work out why it happens in cmdstanr and not rstan?

No, sorry. A dive into the depths of the underlying libraries is way above my head. My take-home here is: use rstan in this specific case.

@gobbios thanks for the quick reply. Unfortunately in my case using rstan does not solve the issue, I still get the same error:

Error in function boost::math::bessel_ik<long double>(long double,long double) in CF2_ik: Series evaluation exceeded 1000000 iterations, giving up now.

However, as it occurs only in some chains during warmup, I was wondering whether I could still use the posterior results of the successful chains or whether this would bias the results. Any idea?

Hmm. That’s interesting. Would you have a reproducible example for the error(s) popping up with rstan?

FWIW, since I don’t understand the cause of the failing sampling, I’d hesitate to assume that it’s okay to discard the failing chains. But maybe worth pinging @lingium again for their input?

However, as it occurs only in some chains during warmup, I was wondering whether I could still use the posterior results of the successful chains or whether this would bias the results.

The reason is that during warmup, Stan explores some relatively extreme parameter values, and those values are too extreme when passed to Boost’s Bessel function, so its series evaluation cannot converge within the preset iteration limit (1,000,000 for that function, as the message shows). Once MCMC has converged toward the typical set, the problem no longer occurs.
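You can see the same blow-up outside Stan; base R’s besselI() overflows in the same regime (a quick illustration, not part of the original reprex):

# I_0(x) grows like exp(x) / sqrt(2 * pi * x), so it overflows double
# precision once x is large enough:
besselI(750, nu = 0)                       # Inf (overflow)
besselI(750, nu = 0, expon.scaled = TRUE)  # finite: returns exp(-x) * I_0(x)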

If all diagnostics look normal after the warm-up, you can definitely use these posteriors.

As for why cmdstan doesn’t work but rstan does, I still have no clue… Maybe try providing some initial values for your parameters?
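For example, cmdstanr accepts an init function returning a named list; starting near the data-generating values from the reprex (on the log scale) would look like this (a sketch, untested on your model):

mod_cmdstan$sample(
  data = sdat, chains = 1, iter_warmup = 200, iter_sampling = 200, seed = 4,
  init = function() list(theta1 = log(0.5), theta2 = log(2))
)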

It’s probably a Stan math library issue that arose between Stan 2.32 and Stan 2.35 (CmdStanR is more up to date, as you can see from the versions). @sbronder or @WardBrian may know more, as they’ve been looking into how our complex numbers are handled due to shifting ground in the standard libraries.

If you can initialize from a sensible place, that can be a huge help given what you’ve reported. Sometimes it only takes reducing the scale of random initialization from (-2, 2) to (-0.1, 0.1) or something like that (initialization samples uniformly on the unconstrained scale, after constrained parameters are transformed).
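With cmdstanr that’s just the init argument, e.g. (a sketch against the reprex above):

# draw initial values uniformly from (-0.1, 0.1) on the unconstrained scale
mod_cmdstan$sample(data = sdat, chains = 4, init = 0.1)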

If things are failing after warmup, then there are bigger problems.

exp() is dangerous due to the potential for overflow, but with the density you’ve defined I’m not sure how to avoid it entirely. So I’d recommend just coding things this way for simplicity (a lognormal(0, 1) prior on theta is equivalent to the normal(0, 1) priors you had on the log scale):

parameters {
  vector<lower=0>[2] theta;  // formerly exp(theta1), exp(theta2)
}
model {
  theta ~ lognormal(0, 1);
}

You then want to vectorize your Skellam lpmf so that you can write:

diff_y ~ skellam(theta[1], theta[2]);

That way, you can compute the constant pieces of the normalization term, -(lambda1 + lambda2), log(lambda1 / lambda2), and 2 * sqrt(lambda1 * lambda2), just once rather than once per observation.

real skellam_lpmf(array[] int k, real lambda1, real lambda2) {
  int N = size(k);
  vector[N] r = to_vector(k);
  real z = 2 * sqrt(lambda1 * lambda2);  // constant across observations
  real lp = -N * (lambda1 + lambda2) + (sum(r) / 2) * log(lambda1 / lambda2);
  for (n in 1:N) {
    lp += log(modified_bessel_first_kind(k[n], z));
  }
  return lp;
}

You can then look at where the log modified Bessel is failing and whether it can be stabilized somehow or if there’s a reasonable value to set if it’s about to overflow/underflow (though that usually loses derivatives, so can be a real problem if it persists).
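One possible stabilization (my sketch, not vetted): for large z, I_k(z) ~ exp(z) / sqrt(2 * pi * z), so you could fall back to the leading-order asymptotic before Boost overflows. The threshold of 500 here is an arbitrary assumption, and the approximation drops the dependence on k:

real log_bessel_i(int k, real z) {
  if (z > 500) {
    // leading-order asymptotic: log I_k(z) =~ z - 0.5 * log(2 * pi * z)
    return z - 0.5 * log(2 * pi() * z);
  }
  return log(modified_bessel_first_kind(k, z));
}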

Maybe @bgoodri knows what’s up with the log Bessel function implementations: