Custom hierarchical logistic regression has larger standard errors than stan_glmer results

I recently had a hierarchical logistic regression problem that I solved with rstanarm’s stan_glmer. In simplified form, the model contains a random intercept, a random slope for a numerical variable, and random slopes for a categorical variable.

The data can be found here: data.csv (71.0 KB)

The code used to get the stan_glmer model is the following:

library(rstan)
library(rstanarm)

data <- read.csv("data.csv", colClasses = c("integer", "numeric", "factor", "factor"))

fit_arm <- stan_glmer(y ~ x1 + x2 + (1 + x1 + x2 | grp),
                      data = data, family = binomial("logit"), seed = 42)

summary(fit_arm, probs = c(0.025, 0.5, 0.975), digits = 2)

# Estimates:
#                                      mean   sd     2.5%   50%    97.5%
# (Intercept)                        -0.502  0.344 -1.203 -0.500  0.175 
# x1                                  0.000  0.104 -0.194 -0.002  0.224 
# x2b                                 0.031  0.326 -0.619  0.032  0.672 
# x2c                                -0.653  0.226 -1.086 -0.657 -0.181 
# ...

For learning purposes, I would now like to code the model directly in Stan. In order to do so, I mostly followed https://mc-stan.org/docs/2_18/stan-users-guide/hierarchical-logistic-regression.html and https://mc-stan.org/docs/2_18/stan-users-guide/multivariate-hierarchical-priors-section.html. This brought me to the following Stan model:

data {
  int<lower=0> N;              // num individuals
  int<lower=1> K;              // num ind predictors
  int<lower=1> J;              // num groups
  int<lower=1,upper=J> jj[N];  // group for individual
  matrix[N, K] X;              // individual predictors
  int<lower=0,upper=1> y[N];   // outcomes
}
parameters {
  vector[K] beta;         // population-level coefficients
  
  // Modelling the covariance structure of random effects
  matrix[K, J] z;                            // standardised group-level effects
  cholesky_factor_corr[K] L_Omega;           // Cholesky factor of group-level correlations
  vector<lower=0,upper=pi()/2>[K] tau_unif;  // reparameterised group-level scales
}
transformed parameters {
  matrix[J, K] u;         // random effects
  vector<lower=0>[K] tau; // prior scale
  
  // tan() of a Uniform(0, pi/2) variable gives an implied half-Cauchy(0, 2.5) prior on tau
  for (k in 1:K) tau[k] = 2.5 * tan(tau_unif[k]);
  u = (diag_pre_multiply(tau, L_Omega) * z)';      // non-centred parameterisation
}
model {
  beta ~ normal(0, 2.5);
  
  to_vector(z) ~ std_normal();
  L_Omega ~ lkj_corr_cholesky(2);
  
  y ~ bernoulli_logit(rows_dot_product(rep_matrix(beta, N)' + u[jj], X));
}
generated quantities{
  matrix[K, K] Sigma = tcrossprod(diag_pre_multiply(tau, L_Omega));  // group-level covariance
}

which can be run in R with:

X <- model.matrix(~ x1 + x2, data = data)

stan_data <- list(
  N = nrow(X),
  K = ncol(X),
  J = length(levels(data$grp)),
  jj = as.integer(data$grp),
  X = X,
  y = data$y
)

# `stan_model` is a character string containing the Stan program above
fit_cust <- stan(model_code = stan_model,
                 data = stan_data, 
                 chains = 4, iter = 2000,
                 seed = 42, 
                 control = list(adapt_delta = 0.95))

#            mean se_mean    sd   2.5%    50%  97.5%    n_eff  Rhat
# beta[1]  -0.469   0.017 0.617 -1.684 -0.480  0.748 1363.445 1.000
# beta[2]  -0.010   0.002 0.090 -0.201 -0.007  0.160 1301.269 1.001
# beta[3]   0.023   0.014 0.657 -1.375  0.037  1.369 2157.877 1.001
# beta[4]  -0.644   0.007 0.318 -1.217 -0.658  0.044 1919.180 1.001
# ...

Questions:

  1. The first thing that strikes me is that, although the posterior means are very similar in the two models, the raw Stan model has much larger posterior standard deviations and correspondingly wider credible intervals. Does anyone know why this might be the case? (A small comparison sketch follows after this list.)

  2. The second question relates to runtime: the custom Stan model takes noticeably longer to fit. I have tried some tuning, such as vectorisation and a Cholesky-factored correlation matrix, and I have also centered all variables before fitting. Are there any other improvements that you would recommend, e.g. a QR decomposition of the design matrix X?
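To make the comparison concrete, here is a minimal sketch (assuming both fitted objects, fit_arm and fit_cust, are still in the workspace from the code above) that puts the group-level scale estimates and the per-chain runtimes of the two fits side by side:

# Minimal comparison sketch -- assumes fit_arm and fit_cust from above exist,
# and that rstan and rstanarm are already loaded.

# Group-level (co)variance estimates as reported by rstanarm
VarCorr(fit_arm)

# The same quantity from the custom model (Sigma is defined in generated quantities)
print(fit_cust, pars = "Sigma", probs = c(0.025, 0.5, 0.975))

# Warmup and sampling time per chain, in seconds
get_elapsed_time(fit_arm$stanfit)
get_elapsed_time(fit_cust)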

I have tried to answer the above questions by diving into stan_glmer’s source code. However, as it accounts for so many different flavours of GLMMs, I very quickly got lost. Any advice that could help me track down the parts I am missing in my code would be greatly appreciated!


I assume the rstanarm procedure is better optimised, with a different parameterisation and different priors. That would also affect the runtime.

I tried something similar with my own model recently, and the outcome was analogous to yours. I was also lost in the source code, so I decided to use my own code, since it was supposed to be part of a larger model and I could at least defend it.

I agree that it is likely (if not certain) that rstanarm uses a better parameterisation and better priors. If I have a bit of time at hand, I might give it another go and take a deep dive into the source code.

What I am a bit more worried about are the striking differences in the credible intervals. Any ideas why those might occur? Is that also down to parameterisation and priors, or am I missing a crucial bit in my code?

Priors are IMHO a prime suspect. I would advise making all priors explicit in the call to stan_glmer and making sure you use exactly the same priors in the Stan code. The way you compute tau is also unfamiliar to me, so that is another possible source of differences.
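For example, here is a rough sketch of making the priors explicit. The values written below are only placeholders for the reported defaults, not a recommendation; the exact defaults depend on your rstanarm version and on autoscaling, so check prior_summary() rather than trusting what I wrote here:

# Inspect the priors stan_glmer actually used (rstanarm already loaded above)
prior_summary(fit_arm)

# Sketch of spelling the priors out explicitly in the call itself
fit_arm_explicit <- stan_glmer(
  y ~ x1 + x2 + (1 + x1 + x2 | grp),
  data = data, family = binomial("logit"),
  prior = normal(0, 2.5, autoscale = TRUE),            # slopes
  prior_intercept = normal(0, 2.5, autoscale = TRUE),  # intercept
  prior_covariance = decov(regularization = 1, concentration = 1,
                           shape = 1, scale = 1),      # group-level covariance
  seed = 42
)

Note in particular that stan_glmer’s decov() prior on the group-level covariance matrix is constructed quite differently from the independent half-Cauchy(0, 2.5) scales plus LKJ(2) correlation in your custom model, so differences in the width of the group-level (and hence coefficient) intervals would not be surprising.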

Hope that helps!