I am cross-posting this question from stackexchange as suggested in a comment. The difference between the two results is not purely a question of theoretical interest. In psychophysical experiments, small linearly separated datasets are not uncommon and Bayesian logistic regression is often recommended to deal with with this problem (e.g. Kuss et al. 2005, JOV). Moreover, using
bayesglm with uninformative default priors as a stand-in for
glm has become quite popular. While I expect the results to be influenced by the choice of the priors, the algorithm should not have a large effect. I tried to specify the Stan model (using
stan_glm) to be equivalent to the
bayesglm model with its default priors and obtained different results, which makes me ask whether I misspecified the model or if the results are indeed influenced by the algorithm.
bayesglm function uses the EM algorithm to provide point estimates of the unknown parameter as described in Gelman et al. (2008). It uses the t distribution with 1 dof as priors (also known as Cauchy prior). Continuous predictors are rescaled so that they have a standard deviation of 0.5. The scale of the t distribution prior is set to 10 for intercept and to 2.5 for the rescaled predictors. As a result, the scale for the default prior for a continuous predictor is:
prior.scale = 2.5.
Here is a full example with data that are linearly separated and that yields a warning message when fitted with
library(arm) # bayesglm library(rstanarm) # stan_glm # simulated data set with linear separation dat <- data.frame( x = rep(c(0,1,2.5,4,5),each=4), y = rep(c(0,1),each=10) ) # compute proportions tmp <- aggregate(dat[,"y",drop=FALSE],dat[,"x",drop=FALSE],mean) tmp # x y # 1 0.0 0.0 # 2 1.0 0.0 # 3 2.5 0.5 # 4 4.0 1.0 # 5 5.0 1.0 # fit glm fit.glm <- glm(y ~ x, family=binomial(link="logit"), data=dat) # Warning message: # glm.fit: fitted probabilities numerically 0 or 1 occurred coef(fit.glm) # (Intercept) x # -33.93562 13.57425 # bayes glm fit.bglm <- bayesglm(y ~ x, family=binomial(link="logit"), data=dat) coef(fit.bglm) # (Intercept) x # -4.756769 1.902708
Additional information on the priors used by
# prior tmp <- do.call("rbind", fit.bglm[paste0("prior.",c("mean","scale","df","sd"))]) colnames(tmp) <- c("(Intercept)","x") tmp # Intercept x # prior.mean 0.000000 0.0000000 # prior.scale 10.000000 0.6607427 # prior.df 1.000000 1.0000000 # prior.sd 7.962467 1.5195179
It is easy to verify the scale of the predictor for the continuous predictor
lambda <- 2.5/(2*sd(dat$x)) lambda #  0.6607427
stan_glm from the
rstanarm package uses Markov Chain Monte-Carlo methods to obtain the full posterior distribution of the parameters.
fit.sglm <- stan_glm(y ~ x, data = dat, family = binomial(link = "logit"), prior = student_t(df = 1, 0, lambda), prior_intercept = student_t(1, 0, 10), cores = 2, seed = 12345) prior_summary(fit.sglm) # Intercept (after predictors centered) # ~ cauchy(location = 0, scale = 10) # Coefficients # ~ cauchy(location = 0, scale = 0.66) print(fit.sglm,digits=5) # Median MAD_SD # (Intercept) -9.407 7.034 # x 3.734 2.731 coef(fit.sglm) # (Intercept) x # -9.407408 3.734188
The coefficients corresponds to the median of the posterior distribution. It is also possible to obtain the mean of the posterior distribution
bayestestR::point_estimate(fit.sglm, centrality = "mean")$Mean #  -21.928522 8.766372
The results are very different from these obtained with
bayesglm. I selected this example with a small linearly separated data set to emphasize this differences. I expect to obtain slightly different values due to the difference in underlying algorithms but I would not expected such a large difference if the priors are the same and the point estimates are computed correctly. I have tried also to rescale the intercept but it did not help. I would like to know how to reproduce the results of
stan_glm and, if it is not possible, why.