Maximum likelihood via rstanarm::stan_glm

Statement of problem

I’m trying to evaluate the statement at the top of page 6 of the rstanarm documentation which states, with regard to the choice of algorithm, that “if there is no prior information, then [algorithm=“optimizing”] is equivalent to maximum likelihood…”, but I’m having trouble verifying this.

Main question

My question is: Having set the priors to be “flat” by making them NULL, I would expect the parameter estimates to be the same as the MLE, but that is not the case in my example below (compare fit1 to mean(y) below)

Other details

  1. For comparison, I also include a basic location model inrstan (fit2 below), for which I was successful in recapitulating the MLE.
  2. I set the convergence criteria to be minuscule to ensure that this was not premature convergence.
  3. stan_lm returns an error upon setting algorithm="optimizing" (Error in match.arg(algorithm) : 'arg' should be one of “sampling”, “meanfield”, “fullrank”), which is why I resorted to using stan_glm with an identity link.
  4. I also tried setting initial values in stan_glm but wasn’t successful in that regard. The documentation is a little sparse in this regard.

Code

Here is the code:

library(rstanarm)
library(rstan)

set.seed(1)
y <- rnorm(10)

conv_crit =  .Machine$double.eps
iter = 2000

fit1 <- stan_glm(y ~ 1,  
                 prior = NULL, 
                 prior_intercept = NULL, 
                 prior_aux = NULL,
                 family = "gaussian",
                 algorithm = "optimizing", 
                 seed = 1, 
                 refresh = 1,
                 verbose = T,
                 iter = iter,
                 tol_obj = conv_crit,
                 tol_rel_obj = conv_crit / .Machine$double.eps,
                 tol_grad = conv_crit,
                 tol_rel_grad = conv_crit / .Machine$double.eps,
                 tol_param = conv_crit)


stancode <- 'data {int<lower=1> n; real y[n];} parameters {real y_mean; real<lower = 0> sigma;} model {y ~ normal(y_mean, sigma);}'
mod <- stan_model(model_code = stancode, verbose = TRUE)
fit2 <- optimizing(mod, 
                   data = list(n = length(y), y = y),
                   seed = 1, 
                   refresh = 1,
                   verbose = T,
                   tol_obj = conv_crit,
                   tol_rel_obj = conv_crit / .Machine$double.eps,
                   tol_grad = conv_crit,
                   tol_rel_grad = conv_crit / .Machine$double.eps,
                   tol_param = conv_crit)
coef(fit1) #stan_glm
#(Intercept) 
# 0.1117049 
mean(y) #mle
# [1] 0.1322028
fit2$par[["y_mean"]] # rstan optimization
# [1] 0.1322028

Session info

> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Big Sur 11.6.8

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rstan_2.21.5         ggplot2_3.3.6        StanHeaders_2.21.0-7
[4] rstanarm_2.21.3      Rcpp_1.0.8.3        

Solution: you can get the actual MAP (which equals MLE) by setting draws=0 in the call to stan_glm.

If the draws argument is not specified, then stan_glm.fit runs optimizing_args$draws <- 1000L. Later, the stanreg function calculates the median of these 1000 draws and uses this median as the reported estimate. Thus, when draws > 0, which is the default behavior of stan_glm with algorithm = "optimizing", you do not actually get the MAP returned. OTOH, if you set draws=0, you bypass sampling from the posterior entirely and get exactly the MLE as with the other approaches. I don’t think this is the intended behavior of stan_glm with algorithm="optimizing".

1 Like