Unsure about meaning of return code during rstan::optimizing

A common practice in the field I’m in is to do something like MLE on the (0.1, 0.3, 0.5, 0.7, 0.9) quantiles of the data (looking at response times for humans in very simple decision making tasks). I was looking to informally compare that practice with a few alternatives, but I’ve rapidly hit the limits knowledge and am hoping for suggestions. The model might look something like:

data{
  int n;
  vector[n+2] quantiles;
  real dev_full;
}
parameters {
  real<lower=0> shape;
  real<lower=0> rate;
}
model {
  row_vector[n+1] predictions; 
  real gsquared;
  
  for(q in 1:(n+1)){
    predictions[q] = gamma_cdf(quantiles[q+1], shape, rate) - gamma_cdf(quantiles[q], shape, rate);
  }
  print("predictions: ", predictions);
  print("shape: ", shape);
  print("rate: ", rate);

  gsquared = 2*sum([20, 40, 40, 40, 40, 20] .* (log([20, 40, 40, 40, 40, 20]) - log(predictions*200)));
  print("gsquared: ", gsquared);
  print("target: ", target());
  target += (gsquared + dev_full)*-0.5;
  print("target: ", target());
}

Calling that with, e.g.,

optimizing(gamma_cdf, 
           data=list(n=5, quantiles=c(0,0.9830104, 1.2614561, 1.4274131, 1.6010226, 1.9755072,Inf), 
                     dev_full = dmultinom(c(20, 40, 40, 40, 40, 20), prob = c(.1,.2,.2,.2,.2,.1), log=TRUE) * -2), 
           init = list(shape=9, rate=9),
           algorithm = "LBFGS"
)

gives an a result indicating that the parameters have moved away from the initial values

$par
    shape      rate 
12.028364  8.262126 

$value
[1] -15.41155

$return_code
[1] 70

but also that there’s a nonzero return code (and that target is higher than expected). My questions are

  1. Is it expected that nothing prints when optimizing? (values do print when sampling)
  2. Does that return code point to a specific problem?

Setting init_alpha = 1e-15 produces something almost identical. Setting all 5 of the tol_* parameters are set to 0 also doesn’t change much.

Thanks,


R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rstan_2.18.1       StanHeaders_2.18.0 ggplot2_3.0.0     

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.19       bindr_0.1.1        compiler_3.5.1     pillar_1.3.0       plyr_1.8.4         prettyunits_1.0.2  base64enc_0.1-3    tools_3.5.1       
 [9] digest_0.6.18      packrat_0.4.9-3    pkgbuild_1.0.2     evaluate_0.12      tibble_1.4.2       gtable_0.2.0       debugme_1.1.0      pkgconfig_2.0.2   
[17] rlang_0.3.0        cli_1.0.1          rstudioapi_0.8     parallel_3.5.1     yaml_2.2.0         xfun_0.4           loo_2.0.0          bindrcpp_0.2.2    
[25] gridExtra_2.3      withr_2.1.2        dplyr_0.7.7        knitr_1.20         tidyselect_0.2.5   stats4_3.5.1       rprojroot_1.3-2    grid_3.5.1        
[33] glue_1.3.0         inline_0.3.15      R6_2.3.0           processx_3.2.0     rmarkdown_1.10     bookdown_0.7       purrr_0.2.5        callr_3.0.0       
[41] magrittr_1.5       codetools_0.2-15   matrixStats_0.54.0 backports_1.1.2    scales_1.0.0       ps_1.2.0           htmltools_0.3.6    assertthat_0.2.0  
[49] colorspace_1.3-2   lazyeval_0.2.1     munsell_0.5.0      crayon_1.3.4 

I think you have to flip on verbose = TRUE when optimizing to see a lot of stuff. Also, I vaguely remember making the return codes match what R does with L-BFGS-B and / or what the original LBFGS implementation did.

Ah, sorry for not reading more closely. Yeah, all the prints get through when verbose = TRUE, as well as other info. Using random initialization ( e.g., init = list(shape=runif(1,0,2), rate=runif(1,0,2)) ) now tends to give outputs like (minus outputs of those print statements)

Chain 1: Initial log joint probability = -423.807
Chain 1: Exception: Error in function boost::math::tgamma<double>(double): numeric overflow  (in 'modelac622523f95_gamma_cdf_quantiles' at line 15)

Chain 1:        9      -20.0898     0.0225205       17.2143   0.0002923       0.001       55  LS failed, Hessian reset 
Chain 1:       10      -19.2069     0.0964281       76.1585    0.005602       0.001       96  LS failed, Hessian reset 
Chain 1: Exception: Error in function boost::math::tgamma<double>(double): numeric overflow  (in 'modelac622523f95_gamma_cdf_quantiles' at line 15)

Chain 1:       11      -18.4438     0.0205033       15.0487   0.0002692       0.001      129  LS failed, Hessian reset 
Chain 1:       12      -18.2231     0.0150487       16.7178       0.001       0.001      168  LS failed, Hessian reset 
Chain 1:       15      -14.5097    0.00293584       19.4607   0.0001179       0.001      202  LS failed, Hessian reset 
Chain 1:       16      -14.5097    0.00293584       19.4607       1e-12       0.001      251  LS failed, Hessian reset 
$par
    shape      rate 
13.143719  8.998486 

$value
[1] -14.50975

$return_code
[1] 70

I’m not sure that the return code of 70 could match R’s stats::optim, the docs say that there’ll be a (failed) convergence code of 51 or 52, but also that those codes are accompanied by a character string message.

This did bring up though, that stats::optim itself seems pretty okay at finding a solution without complaints

gsqfun <- function(params, quantiles, observed_frequencies=c(20, 40, 40, 40, 40, 20)){
  predictions <- vector(mode = "numeric", length = length(observed_frequencies))
  params <- exp(params)
  for(q in 1:(length(observed_frequencies))){
    predictions[q] <- pgamma(quantiles[q+1],shape=params[1], rate=params[2]) -
      pgamma(quantiles[q],shape=params[1], rate=params[2])
  }
  predicted_frequencies <- predictions*sum(observed_frequencies)
  gsq <- 2*sum(observed_frequencies*log(observed_frequencies/predicted_frequencies))
  return(gsq)
}

optim(par = c(runif(2,0,2)), fn = gsqfun, 
      quantiles=c(0, 0.9830104, 1.2614561, 1.4274131, 1.6010226, 1.9755072, Inf),
      method = "BFGS")

(at least, most of the time without complaint. Evaluation of pgamma will result in NaNs, but skipping exponentiation of the params and using method = L-BFGS-B, lower=c(0,0) seems to avoid that).

The values are consistently recovered, so for simplicity, I’ll probably just stick with optim.

Thanks again