With that big N and small K, you don’t need MCMC. You can use `algorithm='optimizing'` and draws from the approximate posterior. Here’s an example adapted from https://avehtari.github.io/RAOS-Examples/BigData/bigdata.html, using the GitHub version of rstanarm, which also provides Pareto-k and n_eff diagnostics:

```
library(rstanarm)
SEED <- 1655
set.seed(SEED)
# simulate data: n observations, k predictors, of which only x is relevant
n <- 6e5
k <- 3
x <- rnorm(n)
xn <- matrix(rnorm(n*(k-1)), nrow=n)
a <- 2
b <- 3
sigma <- 1
# note the parentheses: without them R parses `> 0 + 0` as `> (0 + 0)`,
# leaving y logical instead of numeric 0/1
y <- ((a + b*x + sigma*rnorm(n)) > 0) + 0
fake <- data.frame(x, xn, y)
fit3 <- stan_glm(y ~ ., data=fake, family=binomial(),
                 algorithm='optimizing', init=0)
```

On my laptop this takes about 22 s.

```
> summary(fit3)

Model Info:
 function:     stan_glm
 family:       binomial [logit]
 formula:      y ~ .
 algorithm:    optimizing
 priors:       see help('prior_summary')
 observations: 600000
 predictors:   4

Estimates:
            Median MAD_SD 10%  50%  90%
(Intercept) 3.6    0.0    3.6  3.6  3.6
x           5.4    0.0    5.4  5.4  5.4
X1          0.0    0.0    0.0  0.0  0.0
X2          0.0    0.0    0.0  0.0  0.0

Monte Carlo diagnostics:
            mcse khat n_eff
(Intercept) 0.0  0.3  800
x           0.0  0.3  809
X1          0.0  0.3  777
X2          0.0  0.3  821

For each parameter, mcse is the Monte Carlo standard error, n_eff is a
crude measure of effective sample size, and khat is the Pareto k
diagnostic for importance sampling (performance is usually good when
khat < 0.7).
```
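Once the khat diagnostics look fine, the draws from the approximate posterior can be used much like MCMC draws. A minimal sketch (assuming `fit3` from above; `as.matrix` and `posterior_predict` are the standard rstanarm accessors):

```
# extract the draws from the approximate posterior as a matrix
# (rows are draws, columns are parameters)
draws <- as.matrix(fit3)
# posterior quantiles for the coefficient of x
quantile(draws[, "x"], probs=c(0.1, 0.5, 0.9))
# posterior predictive draws for a few observations
pred <- posterior_predict(fit3, newdata=fake[1:5, ])
```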