I am cross-posting this question from Stack Exchange, as suggested in a comment. The difference between the two results is not purely of theoretical interest. In psychophysical experiments, small data sets with linear separation are not uncommon, and Bayesian logistic regression is often recommended to deal with this problem (e.g. Kuss et al., 2005, JOV). Moreover, using `bayesglm` with its uninformative default priors as a stand-in for `glm` has become quite popular. While I expect the results to be influenced by the choice of priors, the algorithm should not have a large effect. I tried to specify the Stan model (using `stan_glm`) to be equivalent to the `bayesglm` model with its default priors and obtained different results, which makes me ask whether I misspecified the model or whether the results are indeed influenced by the algorithm.

The `bayesglm` function uses an approximate EM algorithm to provide point estimates (approximate posterior modes) of the unknown parameters, as described in Gelman et al. (2008). Its default priors are t distributions with 1 degree of freedom, i.e. Cauchy priors. Continuous predictors are rescaled so that they have a standard deviation of 0.5, and the scale of the t prior is set to 10 for the intercept and 2.5 for the rescaled predictors. As a result, on the original scale of a continuous predictor `x`, the scale of the default prior is `prior.scale / (2 * sd(x))`, with `prior.scale` = 2.5.
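
To make the rescaling explicit, here is a minimal sketch of the standardization described in Gelman et al. (2008), written out by hand for illustration (`bayesglm` performs the equivalent adjustment internally):

```
# standardize a continuous predictor to mean 0 and sd 0.5 (Gelman et al. 2008)
x <- c(0, 1, 2.5, 4, 5)
z <- (x - mean(x)) / (2 * sd(x))
sd(z)
# [1] 0.5
# a Cauchy prior with scale 2.5 on the coefficient of z is therefore
# equivalent to one with scale 2.5 / (2 * sd(x)) on the coefficient of x
```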

Here is a full example with data that are linearly separated and that produce a warning message when fitted with `glm`.

```
library(arm)      # bayesglm
library(rstanarm) # stan_glm

# simulated data set with linear separation
dat <- data.frame(
  x = rep(c(0, 1, 2.5, 4, 5), each = 4),
  y = rep(c(0, 1), each = 10)
)

# compute the proportion of y = 1 at each level of x
tmp <- aggregate(dat[, "y", drop = FALSE], dat[, "x", drop = FALSE], mean)
tmp
#     x   y
# 1 0.0 0.0
# 2 1.0 0.0
# 3 2.5 0.5
# 4 4.0 1.0
# 5 5.0 1.0

# fit glm
fit.glm <- glm(y ~ x, family = binomial(link = "logit"), data = dat)
# Warning message:
# glm.fit: fitted probabilities numerically 0 or 1 occurred
coef(fit.glm)
# (Intercept)           x
#   -33.93562    13.57425

# bayesglm
fit.bglm <- bayesglm(y ~ x, family = binomial(link = "logit"), data = dat)
coef(fit.bglm)
# (Intercept)           x
#   -4.756769    1.902708
```
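
As a side check, the separation can be confirmed explicitly. A minimal sketch, assuming the `detectseparation` package (not otherwise used in this post) is installed:

```
# check whether the ML estimates are infinite due to separation
library(detectseparation)
glm(y ~ x, family = binomial(link = "logit"), data = dat,
    method = "detect_separation")
# reports whether separation occurs and which estimates diverge to +/-Inf
```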

Additional information on the priors used by `bayesglm`:

```
# prior parameters stored in the fitted object
tmp <- do.call("rbind",
               fit.bglm[paste0("prior.", c("mean", "scale", "df", "sd"))])
colnames(tmp) <- c("(Intercept)", "x")
tmp
#             (Intercept)         x
# prior.mean     0.000000 0.0000000
# prior.scale   10.000000 0.6607427
# prior.df       1.000000 1.0000000
# prior.sd       7.962467 1.5195179
```

It is easy to verify the prior scale for the continuous predictor:

```
lambda <- 2.5/(2*sd(dat$x))
lambda
# [1] 0.6607427
```

The `stan_glm` function from the `rstanarm` package uses Markov chain Monte Carlo to obtain the full posterior distribution of the parameters.

```
fit.sglm <- stan_glm(y ~ x, data = dat,
                     family = binomial(link = "logit"),
                     prior = student_t(df = 1, location = 0, scale = lambda),
                     prior_intercept = student_t(df = 1, location = 0, scale = 10),
                     cores = 2, seed = 12345)
prior_summary(fit.sglm)
# Intercept (after predictors centered)
#  ~ cauchy(location = 0, scale = 10)
# Coefficients
#  ~ cauchy(location = 0, scale = 0.66)
print(fit.sglm, digits = 5)
#             Median MAD_SD
# (Intercept) -9.407  7.034
# x            3.734  2.731
coef(fit.sglm)
# (Intercept)           x
#   -9.407408    3.734188
```

The coefficients correspond to the medians of the posterior distributions. It is also possible to obtain the means of the posterior distributions:

```
bayestestR::point_estimate(fit.sglm, centrality = "mean")$Mean
# [1] -21.928522 8.766372
```
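
Both point estimates can be reproduced directly from the posterior draws; a minimal sketch using the fitted object from above:

```
# extract the matrix of posterior draws (one column per parameter)
draws <- as.matrix(fit.sglm)
apply(draws, 2, median) # matches coef(fit.sglm)
apply(draws, 2, mad)    # matches the MAD_SD column of print(fit.sglm)
apply(draws, 2, mean)   # matches the posterior means above
```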

The results are very different from those obtained with `bayesglm`. I selected this example with a small, linearly separated data set to emphasize these differences. I expected to obtain slightly different values due to the difference in the underlying algorithms, but I would not expect such a large difference if the priors are the same and the point estimates are computed correctly. I would like to know how to reproduce the results of `bayesglm` with `stan_glm` and, if it is not possible, why.

I also tried to rescale the intercept, but it did not help; one concrete variant of such an attempt is sketched below.
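
Since `prior_summary` above notes that the intercept prior is applied after the predictors are centered, one variant is to center the predictor by hand and refit:

```
# refit with a hand-centered predictor, so the intercept refers to x = mean(x)
dat$xc <- dat$x - mean(dat$x)
fit.sglm2 <- stan_glm(y ~ xc, data = dat,
                      family = binomial(link = "logit"),
                      prior = student_t(df = 1, location = 0, scale = lambda),
                      prior_intercept = student_t(df = 1, location = 0, scale = 10),
                      cores = 2, seed = 12345)
coef(fit.sglm2)
```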