Optimize method of cmdstanr occasionally fails with warning message "Fitting finished unexpectedly!"

Questions like this annoy me, so apologies in advance for asking it.

I’ve been using cmdstanr since (I think) late May to take advantage of reduce_sum() in cmdstan, and have used it without error on thousands of data sets. Sometime in the past week it started throwing errors – initially these were coming at the end of sampling after what appeared to be normal termination. I updated cmdstanr to the current release, updated cmdstan to 2.25.0. I also ran some hardware diagnostics and found no problems. Now I’m having occasional problems with the optimize method in cmdstanr (contrary to advice I’ve seen here it really helps in the model I’m running to initialize with the approximate MAP solution with some added random jitter).

When it fails I get the warning message in the headline. Trying to access the mle() will then throw an error. Running optimize() will eventually succeed, so the workaround I came up with was to run it inside tryCatch() inside a repeat block. Here is a stupidly simple Stan model:

data {
  int<lower=1> N;
  vector[N] y;
  vector[N] x;
}
parameters {
  real<lower=0.> sigma;
  real a;
  real b;
}
model {
  sigma ~ normal(1., 10.);
  a ~ normal(0., 10.);
  b ~ normal(0., 10.);
  
  y ~ normal(a + b*x, sigma);
}

and a little R function to generate some fake data:

make_data <- function(N=10000, seed=123456L) {
  set.seed(seed)
  x <- runif(N, min= -100, max=100)
  y = 1. + x + rnorm(N, 0., 1.)
  list(N=N, x=x, y=y)
}

Here’s the output from a failed call to optimize():

 opt <- cmstanmodel$optimize(data=stan_dat,init=list(inits))
Initial log joint probability = -2.68817e+06 
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
Error evaluating model log probability: Non-finite gradient. 
Error evaluating model log probability: Non-finite gradient. 
Error evaluating model log probability: Non-finite gradient. 
Error evaluating model log probability: Non-finite gradient. 
Warning: Fitting finished unexpectedly!

> opt$output()

method = optimize
  optimize
    algorithm = lbfgs (Default)
      lbfgs
        init_alpha = 0.001 (Default)
        tol_obj = 9.9999999999999998e-13 (Default)
        tol_rel_obj = 10000 (Default)
        tol_grad = 1e-08 (Default)
        tol_rel_grad = 10000000 (Default)
        tol_param = 1e-08 (Default)
        history_size = 5 (Default)
    iter = 2000 (Default)
    save_iterations = 0 (Default)
id = 1
data
  file = C:/Users/mlpec/AppData/Local/Temp/RtmpGoFdIA/standata-1f88526820a3.json
init = C:/Users/mlpec/AppData/Local/Temp/RtmpGoFdIA/init-1-1f887a0f64ae.json
random
  seed = 2015142350
output
  file = C:/Users/mlpec/AppData/Local/Temp/RtmpGoFdIA/simple_regression-202101070936-1-77a671.csv
  diagnostic_file =  (Default)
  refresh = 100 (Default)
  sig_figs = -1 (Default)
num_threads = 1

> opt$output_files()
character(0)

note: the intended output file does exist in the temp directory and contains the correct solution.

And here’s a successful call with the same init:

opt <- cmstanmodel$optimize(data=stan_dat,init=list(inits))
Initial log joint probability = -2.68817e+06 
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
Error evaluating model log probability: Non-finite gradient. 
Error evaluating model log probability: Non-finite gradient. 
Error evaluating model log probability: Non-finite gradient. 
Error evaluating model log probability: Non-finite gradient. 
      28       -4997.2   0.000107876      0.792301      0.5193           1       60    
Optimization terminated normally:  
  Convergence detected: relative gradient magnitude is below tolerance 
Finished in  0.2 seconds.
> opt$output_files()
[1] "C:/Users/mlpec/AppData/Local/Temp/RtmpGoFdIA/simple_regression-202101070941-1-920987.csv"
> opt$output()

One other detail: I created as close a version match as possible of the entire tool set on a PC running Ubuntu and encountered no errors at all on multiple data sets, so whatever is going wrong is unique to this one Windows PC.

Finally, here is part of the output of Sys.getenv() and sessionInfo():

LOCAL_CPPFLAGS                                  -march=core2
LOCALAPPDATA                                    C:\Users\mlpec\AppData\Local
LOGONSERVER                                     \\OLLIE
MPLENGINE                                       tkAgg
MSYS2_ENV_CONV_EXCL                             R_ARCH
NUMBER_OF_PROCESSORS                            32
NVCUDASAMPLES_ROOT                              C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.0
NVCUDASAMPLES10_0_ROOT                          C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.0
NVTOOLSEXT_PATH                                 C:\Program Files\NVIDIA Corporation\NvToolsExt\
OS                                              Windows_NT
PATH                                            D:/mlpeck/cmdstan/stan/lib/stan_math/lib/tbb;D:\mlpeck\cmdstan\stan\lib\stan_math\lib\tbb;C:\Program
                                                Files\R\R-4.0.0\bin\x64;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin;C:\Program
                                                Files\NVIDIA GPU Computing
                                                Toolkit\CUDA\v10.0\libnvvp;C:\rtools40\usr\bin;C:\rtools40\mingw64\bin;C:\ProgramData\Oracle\Java\javapath_target_438593;C:\Windows\System32;C:\Windows;C:\Windows\System32\wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Program
                                                Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program
                                                Files\R\R-4.0.0\bin;C:\Windows\System32;C:\Windows;C:\Windows\System32\wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Windows\System32;C:\Windows;C:\Windows\System32\wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program
                                                Files\NVIDIA Corporation\NVIDIA
                                                NvDLISR;D:\mlpeck\cmdstan\stan\lib\stan_math\lib\tbb;C:\Users\mlpec\AppData\Local\Microsoft\WindowsApps
PATHEXT                                         .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
PKG_CPPFLAGS                                    -I"D:/mlpeck/R/win-library/4.0/Rcpp/include/" -I"D:/mlpeck/R/win-library/4.0/RcppEigen/include/"
                                                -I"D:/mlpeck/R/win-library/4.0/RcppEigen/include/unsupported"
                                                -I"D:/mlpeck/R/win-library/4.0/BH/include"
                                                -I"D:/mlpeck/R/win-library/4.0/StanHeaders/include/src/"
                                                -I"D:/mlpeck/R/win-library/4.0/StanHeaders/include/"
                                                -I"D:/mlpeck/R/win-library/4.0/rstan/include" -DEIGEN_NO_DEBUG -D_REENTRANT
                                                -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -include
                                                stan/math/prim/mat/fun/Eigen.hpp -std=c++1y
PKG_LIBS                                        -L"D:/mlpeck/R/win-library/4.0/StanHeaders/libs/x64" -lStanHeaders
PROCESSOR_ARCHITECTURE                          AMD64
PROCESSOR_IDENTIFIER                            Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
PROCESSOR_LEVEL                                 6
PROCESSOR_REVISION                              5504
ProgramData                                     C:\ProgramData
ProgramFiles                                    C:\Program Files
ProgramFiles(x86)                               C:\Program Files (x86)
ProgramW6432                                    C:\Program Files
PSModulePath                                    C:\Program Files\WindowsPowerShell\Modules;C:\WINDOWS\system32\WindowsPowerShell\v1.0\Modules
PUBLIC                                          C:\Users\Public
QT_D3DCREATE_MULTITHREADED                      1
R_ARCH                                          /x64
R_COMPILED_BY                                   gcc 8.3.0
R_DOC_DIR                                       C:/PROGRA~1/R/R-40~1.0/doc
R_HOME                                          C:/PROGRA~1/R/R-40~1.0
R_PACKRAT_SYSTEM_LIBRARY                        C:/PROGRA~1/R/R-40~1.0/library
RMARKDOWN_MATHJAX_PATH                          C:/Program Files/RStudio/resources/mathjax-26
RS_LOCAL_PEER                                   \\.\pipe\15659-rsession
RS_RPOSTBACK_PATH                               C:/Program Files/RStudio/bin/rpostback
RS_SHARED_SECRET                                63341846741
RSTUDIO                                         1
RSTUDIO_CONSOLE_COLOR                           256
RSTUDIO_CONSOLE_WIDTH                           80
RSTUDIO_MSYS_SSH                                C:/Program Files/RStudio/bin/msys-ssh-1000-18
RSTUDIO_PANDOC                                  C:/Program Files/RStudio/bin/pandoc
RSTUDIO_SESSION_PORT                            15659
RSTUDIO_USER_IDENTITY                           mlpec
RSTUDIO_WINUTILS                                C:/Program Files/RStudio/bin/winutils
RTOOLS40_HOME                                   C:\rtools40
SESSIONNAME                                     Console
SystemDrive                                     C:
SystemRoot                                      C:\WINDOWS

sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rstan_2.19.3       ggplot2_3.3.0      StanHeaders_2.19.2 cmdstanr_0.3.0     tibble_3.0.1       readr_1.3.1       

loaded via a namespace (and not attached):
 [1] tidyselect_1.0.0   xfun_0.13          purrr_0.3.4        colorspace_1.4-1   vctrs_0.2.4        stats4_4.0.0       loo_2.2.0          utf8_1.1.4        
 [9] rlang_0.4.9        pkgbuild_1.0.7     pillar_1.4.3       glue_1.4.0         withr_2.2.0        matrixStats_0.56.0 lifecycle_0.2.0    posterior_0.1.3   
[17] munsell_0.5.0      gtable_0.3.0       labeling_0.3       inline_0.3.15      knitr_1.28         callr_3.4.3        ps_1.3.2           parallel_4.0.0    
[25] fansi_0.4.1        Rcpp_1.0.4.6       scales_1.1.0       backports_1.1.6    checkmate_2.0.0    jsonlite_1.6.1     abind_1.4-5        farver_2.0.3      
[33] gridExtra_2.3      digest_0.6.25      hms_0.5.3          packrat_0.5.0      processx_3.4.5     dplyr_0.8.5        grid_4.0.0         cli_2.0.2         
[41] tools_4.0.0        magrittr_1.5       crayon_1.3.4       pkgconfig_2.0.3    ellipsis_0.3.0     data.table_1.12.8  prettyunits_1.1.1  assertthat_0.2.1  
[49] rstudioapi_0.11    R6_2.4.1           compiler_4.0.0    
> 

Hi Micheal,

in a failed run, does a file exist if you run with

opt$output_files(include_failed = TRUE)

If yes, can you post the output in the CSV file. Thanks!

Yes, here it is:

 opt <- cmstanmodel$optimize(data=stan_dat,init=list(inits))
Initial log joint probability = -2.68817e+06 
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
Error evaluating model log probability: Non-finite gradient. 
Error evaluating model log probability: Non-finite gradient. 
Error evaluating model log probability: Non-finite gradient. 
Error evaluating model log probability: Non-finite gradient. 
      28       -4997.2   0.000107876      0.792301      0.5193           1       60    
Optimization terminated normally:  
  Convergence detected: relative gradient magnitude is below tolerance 
Finished in  0.2 seconds.
> opt <- cmstanmodel$optimize(data=stan_dat,init=list(inits))
Initial log joint probability = -2.68817e+06 
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes  
Error evaluating model log probability: Non-finite gradient. 
Error evaluating model log probability: Non-finite gradient. 
Error evaluating model log probability: Non-finite gradient. 
Error evaluating model log probability: Non-finite gradient. 
Warning: Fitting finished unexpectedly!

> opt$output_files(include_failed=TRUE)
[1] "C:/Users/mlpec/AppData/Local/Temp/RtmpKsv487/simple_regression-202101071942-1-438ee0.csv"
> writeLines(readLines(opt$output_files(include_failed = TRUE)))
# stan_version_major = 2
# stan_version_minor = 25
# stan_version_patch = 0
# model = simple_regression_model
# method = optimize
#   optimize
#     algorithm = lbfgs (Default)
#       lbfgs
#         init_alpha = 0.001 (Default)
#         tol_obj = 9.9999999999999998e-13 (Default)
#         tol_rel_obj = 10000 (Default)
#         tol_grad = 1e-08 (Default)
#         tol_rel_grad = 10000000 (Default)
#         tol_param = 1e-08 (Default)
#         history_size = 5 (Default)
#     iter = 2000 (Default)
#     save_iterations = 0 (Default)
# id = 1
# data
#   file = C:/Users/mlpec/AppData/Local/Temp/RtmpKsv487/standata-19f440e075d7.json
# init = C:/Users/mlpec/AppData/Local/Temp/RtmpKsv487/init-1-19f4a351efb.json
# random
#   seed = 1231345697
# output
#   file = C:/Users/mlpec/AppData/Local/Temp/RtmpKsv487/simple_regression-202101071942-1-438ee0.csv
#   diagnostic_file =  (Default)
#   refresh = 100 (Default)
#   sig_figs = -1 (Default)
# num_threads = 1
lp__,sigma,a,b
-4997.2,0.9997,1.01294,1.00015
> 

This reported finishing normally multiple times before failing with a warning. The actual output file exists and has the expected solution.

By the way sampling works fine and returns results consistent with lm(). No surprise for a nearly trivial model with well behaved data.

Thanks for responding!