I am cross-posting this question from Stack Exchange, as suggested in a comment. The difference between the two results is not purely of theoretical interest. In psychophysical experiments, small data sets with linear separation are not uncommon, and Bayesian logistic regression is often recommended to deal with this problem (e.g. Kuss et al., 2005, JOV). Moreover, using `bayesglm` with its uninformative default priors as a stand-in for `glm` has become quite popular. While I expect the results to be influenced by the choice of priors, the algorithm should not have a large effect. I tried to specify the Stan model (using `stan_glm`) to be equivalent to the `bayesglm` model with its default priors and obtained different results, which makes me ask whether I misspecified the model or whether the results are indeed influenced by the algorithm.

The `bayesglm` function uses an approximate EM algorithm to provide point estimates (approximate posterior modes) of the unknown parameters, as described in Gelman et al. (2008). Its default priors are t distributions with 1 degree of freedom, i.e. Cauchy priors. Continuous predictors are rescaled so that they have a standard deviation of 0.5, and the scale of the t prior is set to 10 for the intercept and 2.5 for the rescaled predictors. As a result, on the original scale of a continuous predictor `x`, the scale of the default prior is `prior.scale / (2 * sd(x))`, with `prior.scale` = 2.5.
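
To make the rescaling explicit, here is a minimal sketch of the standardization described in Gelman et al. (2008), written out by hand for illustration (`bayesglm` performs the equivalent adjustment internally):

```
# standardize a continuous predictor to mean 0 and sd 0.5 (Gelman et al. 2008)
x <- c(0, 1, 2.5, 4, 5)
z <- (x - mean(x)) / (2 * sd(x))
sd(z)
# [1] 0.5
# a Cauchy prior with scale 2.5 on the coefficient of z is therefore
# equivalent to one with scale 2.5 / (2 * sd(x)) on the coefficient of x
```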

Here is a full example with data that are linearly separated and that produce a warning message when fitted with `glm`.

```
library(arm)      # bayesglm
library(rstanarm) # stan_glm

# simulated data set with linear separation
dat <- data.frame(
  x = rep(c(0, 1, 2.5, 4, 5), each = 4),
  y = rep(c(0, 1), each = 10)
)

# compute the proportion of y = 1 at each level of x
tmp <- aggregate(dat[, "y", drop = FALSE], dat[, "x", drop = FALSE], mean)
tmp
#     x   y
# 1 0.0 0.0
# 2 1.0 0.0
# 3 2.5 0.5
# 4 4.0 1.0
# 5 5.0 1.0

# fit glm
fit.glm <- glm(y ~ x, family = binomial(link = "logit"), data = dat)
# Warning message:
# glm.fit: fitted probabilities numerically 0 or 1 occurred
coef(fit.glm)
# (Intercept)           x
#   -33.93562    13.57425

# bayesglm
fit.bglm <- bayesglm(y ~ x, family = binomial(link = "logit"), data = dat)
coef(fit.bglm)
# (Intercept)           x
#   -4.756769    1.902708
```
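
As a side check, the separation can be confirmed explicitly. A minimal sketch, assuming the `detectseparation` package (not otherwise used in this post) is installed:

```
# check whether the ML estimates are infinite due to separation
library(detectseparation)
glm(y ~ x, family = binomial(link = "logit"), data = dat,
    method = "detect_separation")
# reports whether separation occurs and which estimates diverge to +/-Inf
```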

Additional information on the priors used by `bayesglm`:

```
# prior parameters stored in the fitted object
tmp <- do.call("rbind",
               fit.bglm[paste0("prior.", c("mean", "scale", "df", "sd"))])
colnames(tmp) <- c("(Intercept)", "x")
tmp
#             (Intercept)         x
# prior.mean     0.000000 0.0000000
# prior.scale   10.000000 0.6607427
# prior.df       1.000000 1.0000000
# prior.sd       7.962467 1.5195179
```

It is easy to verify the prior scale for the continuous predictor:

```
lambda <- 2.5/(2*sd(dat$x))
lambda
# [1] 0.6607427
```

The `stan_glm` function from the `rstanarm` package uses Markov chain Monte Carlo to obtain the full posterior distribution of the parameters.

```
fit.sglm <- stan_glm(y ~ x, data = dat,
                     family = binomial(link = "logit"),
                     prior = student_t(df = 1, location = 0, scale = lambda),
                     prior_intercept = student_t(df = 1, location = 0, scale = 10),
                     cores = 2, seed = 12345)
prior_summary(fit.sglm)
# Intercept (after predictors centered)
#  ~ cauchy(location = 0, scale = 10)
# Coefficients
#  ~ cauchy(location = 0, scale = 0.66)
print(fit.sglm, digits = 5)
#             Median MAD_SD
# (Intercept) -9.407  7.034
# x            3.734  2.731
coef(fit.sglm)
# (Intercept)           x
#   -9.407408    3.734188
```

The coefficients correspond to the medians of the posterior distributions. It is also possible to obtain the means of the posterior distributions:

```
bayestestR::point_estimate(fit.sglm, centrality = "mean")$Mean
# [1] -21.928522 8.766372
```
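
Both point estimates can be reproduced directly from the posterior draws; a minimal sketch using the fitted object from above:

```
# extract the matrix of posterior draws (one column per parameter)
draws <- as.matrix(fit.sglm)
apply(draws, 2, median) # matches coef(fit.sglm)
apply(draws, 2, mad)    # matches the MAD_SD column of print(fit.sglm)
apply(draws, 2, mean)   # matches the posterior means above
```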

The results are very different from those obtained with `bayesglm`. I selected this example with a small, linearly separated data set to emphasize these differences. I expected to obtain slightly different values due to the difference in the underlying algorithms, but I would not expect such a large difference if the priors are the same and the point estimates are computed correctly. I would like to know how to reproduce the results of `bayesglm` with `stan_glm` and, if it is not possible, why.

I also tried to rescale the intercept, but it did not help; one concrete variant of such an attempt is sketched below.
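
Since `prior_summary` above notes that the intercept prior is applied after the predictors are centered, one variant is to center the predictor by hand and refit:

```
# refit with a hand-centered predictor, so the intercept refers to x = mean(x)
dat$xc <- dat$x - mean(dat$x)
fit.sglm2 <- stan_glm(y ~ xc, data = dat,
                      family = binomial(link = "logit"),
                      prior = student_t(df = 1, location = 0, scale = lambda),
                      prior_intercept = student_t(df = 1, location = 0, scale = 10),
                      cores = 2, seed = 12345)
coef(fit.sglm2)
```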