Optimizing generalized linear mixed effects model with a logit link

Hi all,

I have a binomial generalized linear mixed effects model that I have written in Stan, but it is incredibly slow compared to rstanarm. I would like to write everything in Stan proper, but I need some help optimizing the model. I suspect the loops are killing performance, but I am not sure how to fix that. Here is reproducible code. First, the data and the rstanarm fit.

library(lme4)
library(rstan)
library(rstanarm)

hdp <- read.csv("https://stats.idre.ucla.edu/stat/data/hdp.csv")
hdp <- within(hdp, {
  Married <- factor(Married, levels = 0:1, labels = c("no", "yes"))
  DID <- factor(DID)
  HID <- factor(HID)
})

m <- glmer(remission ~ IL6 + CRP + CancerStage + LengthofStay + Experience +
             (1 | DID), data = hdp, family = binomial, control = glmerControl(optimizer = "bobyqa"),
           nAGQ = 10)

# print the mod results without correlations among fixed effects
print(m, corr = FALSE)


m_stan = stan_glmer(remission ~ IL6 + CRP + CancerStage + LengthofStay + Experience +
                      (1 | DID), family = binomial, data = hdp)

Now, here is the Stan code that I have written:

binomCode = 'data{
	int N; // number of data points
	int remission[N]; // result
	int NF; // number of columns in the design matrix
	int DID[N]; // group effect
	int N_DID; // number of groups
	matrix [N,NF] Xmat; // design matrix
}

parameters{
	vector[NF] beta; //vector of betas
	real vary_DID_index[N_DID];
	real<lower=0,upper=100> sigma_DID; // bounds needed for the uniform(0,100) prior
}

model{
	vector[N] vary;
	vector[N] lm;

	// Priors
	beta ~ normal( 0 , 100 );
	sigma_DID ~ uniform( 0 , 100 );

	// Varying effects
	for ( j in 1:N_DID ) vary_DID_index[j] ~ normal( 0 , sigma_DID );

	// Fixed effects
	for ( i in 1:N ) {
	  vary[i] = vary_DID_index[DID[i]];
	  lm[i] = vary[i] + Xmat[i]*beta;
	  remission[i] ~ bernoulli_logit(lm[i]);
	}

}'

# now without the crutch
Xmat = model.matrix(m)

## MCMC settings
ni <- 5000
nt <- 1
nb <- 2000
nc <- 3

remission = as.numeric(hdp$remission)
N = nrow(Xmat)
NF = ncol(Xmat)
DID = as.numeric(hdp$DID)
N_DID = length(levels(hdp$DID))

data.list = list(N = N, remission = remission, NF = NF, DID = DID, N_DID = N_DID,Xmat = Xmat)

m2_stan = stan("binomCode",data=data.list,chains = nc,thin = nt,iter = ni, warmup = nb,
                       seed = 1)
print(m)
print(m_stan)
print(m2_stan, pars=c("beta[1]","beta[2]","beta[3]","beta[4]","beta[5]","beta[6]","beta[7]","beta[8]"))

I get the same results, but my code is way too slow. Any help would be appreciated.

Hi Wade,

Four things come to mind to speed up your model:

  1. Get rid of loops
  2. Use non-centered parameterization of the varying intercept
  3. Use more reasonable priors
  4. Use QR decomposition of the mean-centered predictor matrix

Something like this:

data{
  int N; // number of data points
  int remission[N]; // result
  int NF; // number of columns in the design matrix
  int DID[N]; // group effect
  int N_DID; // number of groups
  matrix [N,NF] Xmat; // design matrix
}
transformed data {
  matrix[N, NF] X_centered;
  matrix[N, NF] Q;
  matrix[NF, NF] R;
  matrix[NF, NF] R_inv;

  for (nf in 1:NF)
    X_centered[, nf] = Xmat[, nf] - mean(Xmat[, nf]);

  // Compute, thin, and then scale QR decomposition
  Q = qr_thin_Q(X_centered) * sqrt(N - 1);
  R = qr_thin_R(X_centered) / sqrt(N - 1);
  R_inv = inverse(R);
}
parameters{
  vector[NF] beta_tilde; //vector of betas in Q-space
  real alpha_0;
  vector[N_DID] alpha_tilde;
  real<lower=0> sigma_DID;
}
transformed parameters{
  vector[N_DID] vary_DID_index = sigma_DID*alpha_tilde; // varying intercept - non-centered
}
model{
  // Priors
  beta_tilde ~ normal(0, 2.5); // weakly-informative prior
  alpha_0 ~ normal(-0.8, 2.5); // informative prior on the intercept: roughly sample mean of remission on logit scale
  alpha_tilde ~ std_normal(); // standard normal for non-centered group
  sigma_DID ~ exponential(1); // more reasonable prior for group sd
  
  // Likelihood
  remission ~ bernoulli_logit(alpha_0 + vary_DID_index[DID] + Q*beta_tilde);
  
}
generated quantities{
  vector[NF] beta = R_inv*beta_tilde; // transform back to "normal" betas
}
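
To fit this version you can reuse your data.list. Here is a minimal sketch; binomQRCode is just my placeholder name for a string holding the program above, and m3_stan for the fit object:

# minimal fitting sketch, reusing the data list from the original post
# (binomQRCode is assumed to hold the Stan program above as a string;
#  m3_stan is just a placeholder object name)
library(rstan)
options(mc.cores = parallel::detectCores()) # run chains in parallel

m3_stan = stan(model_code = binomQRCode, data = data.list,
               chains = nc, thin = nt, iter = ni, warmup = nb, seed = 1)

# betas on the original predictor scale come from generated quantities
print(m3_stan, pars = "beta")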

Edit: Running the model on my old laptop with your MCMC settings took 715, 733, and 1031 seconds for the three chains, respectively. These are the results for beta:

3 chains, each with iter=5000; warmup=2000; thin=1; 
post-warmup draws per chain=3000, total post-warmup draws=9000.

         mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
beta[1]  0.04    0.00 0.01  0.02  0.03  0.04  0.04  0.06   655 1.01
beta[2] -0.48    0.00 0.09 -0.66 -0.54 -0.48 -0.41 -0.29  8174 1.00
beta[3] -0.25    0.00 0.10 -0.45 -0.32 -0.25 -0.19 -0.06  8667 1.00
beta[4]  0.14    0.00 0.02  0.11  0.13  0.14  0.15  0.18  5245 1.00
beta[5] -0.02    0.00 0.02 -0.05 -0.03 -0.02 -0.01  0.01  6189 1.00
beta[6] -0.24    0.00 0.02 -0.27 -0.25 -0.24 -0.23 -0.21  7835 1.00
beta[7] -0.07    0.00 0.03 -0.13 -0.09 -0.07 -0.05 -0.01  5023 1.00
beta[8]  1.98    0.02 0.47  1.05  1.66  1.98  2.31  2.89   648 1.01

Edit 2: I have corrected the generated quantities block above to use R_inv*beta_tilde (see the exchange below). The beta summaries displayed above were produced with the old, incorrect block, so they are wrong.

The suggestions allowed the code to run in a quarter of the time it took before. I appreciate the helpful hints.

However, the recovered parameters don’t match what I get from stan_glmer or the code I had originally. Do I need to rescale the parameters?

You are right, there is a mistake in the code that I posted. It should read vector[NF] beta = R_inv*beta_tilde; in the generated quantities block. Sorry about that!
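
For clarity, the corrected block in full:

generated quantities{
  vector[NF] beta = R_inv*beta_tilde; // transform back to betas on the original predictor scale
}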