Chain finished unexpectedly when using brms on a cluster

Short summary of the problem

I am trying to fit some fairly standard logistic regression models using brms on a cluster, but in almost all cases (aside from one very simple model, which I paste below), the fit fails with little to no indication as to why. The primary model I am trying to run is:

priors <- prior(horseshoe(1), nlpar="a")
model_occ <- bf(
    outcome ~ a + splines, nl=TRUE
  ) + lf(
    a ~ sex + isdiabetes_clean + respiratory_and_asthma_clean + 
      chronicheart_clean + chronicrenal_clean + chronicneurological_clean +
      immunosuppressiondisease_clean + hypertension_clean +
      CriticalCareOcc + (1|ethnicity6) + (1|obese) + week_hosp + chronicliver_clean
  ) + lf(
    splines ~ 0 + s(ageyear, k=4, bs="cr")
  )

fit_occ <- brm(model_occ,
    data = df_primary_hdu[1:100,], 
    prior = priors,
    family = brms::bernoulli(link = "logit"),
    control = list(adapt_delta = 0.95),
    iter=4000,
    knots = list(
      ageyear = as.vector(quantile(df_primary_hdu$ageyear, probs=c(0.05, 0.35, 0.65, 0.95)))
    ),
    chains=5, 
    cores=5,
    seed = 12345,
    backend="cmdstanr"
)

However, I always get the following errors and warnings:

Compiling Stan program...
Start sampling
Running MCMC with 5 parallel chains...

Warning: Chain 1 finished unexpectedly!

Warning: Chain 2 finished unexpectedly!

Warning: Chain 3 finished unexpectedly!

Warning: Chain 4 finished unexpectedly!

Warning: Chain 5 finished unexpectedly!

Warning: Use read_cmdstan_csv() to read the results of the failed chains.
Error in rstan::read_stan_csv(out$output_files()) :
  csvfiles does not contain any CSV file name
In addition: Warning messages:
1: All chains finished unexpectedly!

2: No chains finished successfully. Unable to retrieve the fit.

A simpler model, such as the following, works fine, though:

priors <- prior(horseshoe(1))
model_occ_min <- bf(
    outcome ~ CriticalCareOcc
  )
fit_occ_min <- brm(
  model_occ_min,
  data = df_primary, 
  prior = priors,
  family = brms::bernoulli(link = "logit"),
  control = list(adapt_delta = 0.95),
  iter=4000,
  chains=4, 
  cores=4,
  seed = 12345,
  backend="cmdstanr"
)

These models work fine on macOS with the same data, so I am not sure why they fail here. An example model from a vignette I found (Estimating Monotonic Effects with brms) also works here, which leaves me confused as to what the problem might be. I am open to any suggestions of things to try to narrow this problem down. Thank you.
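One cheap check that does not depend on the cluster is the hand-supplied spline knots. The sketch below assumes that, for `bs = "cr"`, mgcv expects exactly `k` knots (here `k = 4`, matching the four quantiles in the call) and that duplicated knot values would be a problem. `check_cr_knots()` is a hypothetical helper, not part of brms or mgcv, and the ages are simulated because the real data cannot be shared. It is a sanity check of the model setup, not a diagnosis of the chain failures.

```r
# Hypothetical helper: recompute the knots the way the brm() call above does
# and flag two assumed problems for a cubic regression spline (bs = "cr"):
# a knot count different from k, and duplicated knot values.
check_cr_knots <- function(x, k = 4, probs = c(0.05, 0.35, 0.65, 0.95)) {
  knots <- as.vector(quantile(x, probs = probs))
  list(
    knots = knots,
    ok = c(
      n_knots_equals_k = length(knots) == k,
      knots_distinct   = anyDuplicated(knots) == 0L
    )
  )
}

# Simulated ages stand in for the sensitive df_primary_hdu$ageyear
set.seed(12345)
ageyear <- sample(18:95, 500, replace = TRUE)
res <- check_cr_knots(ageyear)
print(res$knots)
print(res$ok)
```

On the cluster, `check_cr_knots(df_primary_hdu$ageyear)` would check the same vector the knots are computed from in the original `brm()` call.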

If possible, also add code to simulate data or attach (a subset of) the dataset you work with.

Unfortunately, I cannot, as it is sensitive data; that is why I am working on a remote cluster.
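Since the real data cannot be shared, a synthetic data set with the same column names is one way to let others run the exact `brm()` call. The sketch below is hypothetical: `simulate_occ_data()` is not from the thread, the column names are taken from the model formula, and every level, range and coefficient is an assumption rather than a property of the real data.

```r
# Hypothetical simulator: column names come from the model formula above,
# but every distribution below is an assumption, not the real data.
simulate_occ_data <- function(n = 100, seed = 12345) {
  set.seed(seed)
  # Assumed coding for the binary comorbidity flags
  yn <- function() factor(sample(c("No", "Yes"), n, replace = TRUE))
  df <- data.frame(
    sex = factor(sample(c("Female", "Male"), n, replace = TRUE)),
    isdiabetes_clean = yn(),
    respiratory_and_asthma_clean = yn(),
    chronicheart_clean = yn(),
    chronicrenal_clean = yn(),
    chronicneurological_clean = yn(),
    immunosuppressiondisease_clean = yn(),
    hypertension_clean = yn(),
    CriticalCareOcc = yn(),
    # Assumed: six ethnicity groups, as the name ethnicity6 suggests
    ethnicity6 = factor(sample(paste0("group", 1:6), n, replace = TRUE)),
    obese = yn(),
    week_hosp = sample(1:20, n, replace = TRUE),
    chronicliver_clean = yn(),
    ageyear = round(runif(n, 18, 95))
  )
  # Binary outcome from a logistic model with arbitrary coefficients
  eta <- -1 + 0.5 * (df$CriticalCareOcc == "Yes") + 0.03 * (df$ageyear - 60)
  df$outcome <- rbinom(n, size = 1, prob = plogis(eta))
  df
}

sim <- simulate_occ_data()
str(sim)
```

It could replace `df_primary_hdu[1:100,]` (and the vector used for the knots) in the original `brm()` call, giving others a reproducible example to test on their own machines.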

Please also provide the following information in addition to your question:

  • Operating System: Red Hat Enterprise Linux 7.8
  • brms Version: brms_2.14.4

Hi,

Can you try running

library(cmdstanr)
fit <- cmdstanr_example(chains = 1)

Does this complete?

Sorry to revive this, but I too have models (ex-Gaussian with distributional effects) that run fine locally, but when I run them on a Windows server, some chains “fail unexpectedly”. Not all of them, though, and the behaviour seems fairly erratic.

The code you mentioned works fine:

>  parallel::detectCores(logical = FALSE)
[1] 36
> library(cmdstanr)
This is cmdstanr version 0.4.0
- Online documentation and vignettes at mc-stan.org/cmdstanr
- CmdStan path set to: C:/Users/dmakowski/Documents/.cmdstanr/cmdstan-2.27.0
- Use set_cmdstan_path() to change the path
> fit <- cmdstanr_example(chains = 1)
Compiling Stan program...
 
> fit
   variable   mean median   sd  mad     q5    q95 rhat ess_bulk ess_tail
 lp__       -66.03 -65.69 1.42 1.26 -68.80 -64.30 1.00      488      634
 alpha        0.37   0.37 0.23 0.22   0.02   0.76 1.01     1124      739
 beta[1]     -0.66  -0.65 0.26 0.24  -1.12  -0.24 1.00      825      523
 beta[2]     -0.27  -0.27 0.22 0.23  -0.64   0.08 1.00     1262      786
 beta[3]      0.69   0.68 0.28 0.29   0.23   1.15 1.00      810      786
 log_lik[1]  -0.52  -0.51 0.10 0.10  -0.69  -0.37 1.00     1007      711
 log_lik[2]  -0.40  -0.38 0.15 0.15  -0.66  -0.19 1.00      917      692
 log_lik[3]  -0.50  -0.47 0.22 0.21  -0.90  -0.20 1.00     1132      787
 log_lik[4]  -0.45  -0.43 0.15 0.15  -0.72  -0.24 1.00      908      691
 log_lik[5]  -1.19  -1.16 0.30 0.31  -1.70  -0.74 1.00     1065      762

 # showing 10 of 105 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)

Versions:

> packageVersion("brms")
[1] ‘2.15.9’
> packageVersion("cmdstanr")
[1] ‘0.4.0’

I am having the same problem on an HPC cluster running Linux, with cmdstanr version 0.5.3 and CmdStan version 2.30.1:

Running MCMC with 4 parallel chains...

Warning: Chain 3 finished unexpectedly!

Warning: Chain 2 finished unexpectedly!

Warning: Chain 1 finished unexpectedly!

Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
.......

Please make only one post/topic at a time when you need help.