I recently had a hierarchical logistic regression problem that I solved with rstanarm’s stan_glmer. In simplified form, the problem contains a random intercept, a random slope for a numerical variable and random slopes for a categorical variable.
The data can be found here: data.csv (71.0 KB)
The code used to get the stan_glmer model is the following:
library(rstan)
library(rstanarm)
data < read.csv("data.csv", colClasses = c("integer", "numeric", "factor", "factor"))
fit_arm < stan_glmer(y ~ x1 + x2 + (1 + x1 + x2  grp),
data = data, family = binomial("logit"), seed = 42)
summary(fit_arm, probs = c(0.025, 0.5, 0.975), digits = 2)
# Estimates:
# mean sd 2.5% 50% 97.5%
# (Intercept) 0.502 0.344 1.203 0.500 0.175
# x1 0.000 0.104 0.194 0.002 0.224
# x2b 0.031 0.326 0.619 0.032 0.672
# x2c 0.653 0.226 1.086 0.657 0.181
# ...
For learning purposes, I would now like to code the model directly in stan. In order to do so, I mostly followed https://mcstan.org/docs/2_18/stanusersguide/hierarchicallogisticregression.html and https://mcstan.org/docs/2_18/stanusersguide/multivariatehierarchicalpriorssection.html. This brought me to the following stan model:
data {
int<lower=0> N; // num individuals
int<lower=1> K; // num ind predictors
int<lower=1> J; // num groups
int<lower=1,upper=J> jj[N]; // group for individual
matrix[N, K] X; // individual predictors
int y[N];
// outcomes
}
parameters {
vector[K] beta; // means
// Modelling the covariance structure of random effects
matrix[K, J] z;
cholesky_factor_corr[K] L_Omega;
vector<lower=0,upper=pi()/2>[K] tau_unif;
}
transformed parameters {
matrix[J, K] u; // random effects
vector<lower=0>[K] tau; // prior scale
for (k in 1:K) tau[k] = 2.5 * tan(tau_unif[k]);
u = (diag_pre_multiply(tau,L_Omega) * z)';
}
model {
beta ~ normal(0, 2.5);
to_vector(z) ~ std_normal();
L_Omega ~ lkj_corr_cholesky(2);
y ~ bernoulli_logit(rows_dot_product(rep_matrix(beta, N)' + u[jj], X));
}
generated quantities{
matrix[K, K] Sigma = tcrossprod(diag_pre_multiply(tau,L_Omega));
}
which can be run in R with:
X < model.matrix(~ x1 + x2, data = data)
stan_data < list(
N = nrow(X),
K = ncol(X),
J = length(levels(data$grp)),
jj = as.integer(data$grp),
X = X,
y = data$y
)
fit_cust < stan(model_code = stan_model,
data = stan_data,
chains = 4, iter = 2000,
seed = 42,
control = list(adapt_delta = 0.95))
# mean se_mean sd 2.5% 50% 97.5% n_eff Rhat
# beta[1] 0.469 0.017 0.617 1.684 0.480 0.748 1363.445 1.000
# beta[2] 0.010 0.002 0.090 0.201 0.007 0.160 1301.269 1.001
# beta[3] 0.023 0.014 0.657 1.375 0.037 1.369 2157.877 1.001
# beta[4] 0.644 0.007 0.318 1.217 0.658 0.044 1919.180 1.001
# ...
Questions:

The first thing that strikes me is that although the means are very similar in the two models, the raw stan model has much wider standard errors. Does someone know why this might be the case?

The second question relates to runtime. The custom stan model has longer fitting time. I have tried some tuning, such as using vectorisation and using the a cholesky correlation matrix. I have also centered all variables before fitting. Are there any other improvements that you would recommend, e.g. a Cholesky factorisation of the input matrix X?
I have tried to answer the above questions by diving into stan_glmer’s source code. However, as it is accounting for so many different flavours of GLMMs, I very quickly got lost. Any advise that could help me track down the parts I am missing in my code would be greatly appreciated!