Improving Speed for hierarchical logit/probit models

Hi -

I use the following code to estimate a hierarchical logit model.

data {
  int<lower=1> D;                // number of covariates
  int<lower=0> N;                // number of observations
  int<lower=1> L;                // number of groups (levels)
  int<lower=0,upper=1> y[N];     // binary response
  int<lower=1,upper=L> ll[N];    // group ids
  row_vector[D] x[N];            // covariates
}
parameters {
  real mu[D];
  real<lower=0> sigma[D];
  vector[D] beta[L];
}
model {
  vector[N] x_beta_ll;
  mu ~ normal(0, 100);  // prior for mu
//  sigma ~ uniform(0, 100);  // prior for sigma
  for (l in 1:L) beta[l] ~ normal(mu, sigma);
  for (n in 1:N) x_beta_ll[n] = x[n] * beta[ll[n]];
  y ~ bernoulli_logit(x_beta_ll);
}

For my data, N = 264,713 (total number of observations), L = 10,000 (number of groups), and D = 26 (number of covariates). When I run it, I get the following message, and sampling takes an extremely long time:

Chain 1: Gradient evaluation took 0.131 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1310 seconds.
Chain 1: Adjust your expectations accordingly!

I wonder if there is a good way to improve speed. BTW, this is the R code that I used:

fit <- stan(
  file = "hierarchical_logit.stan",  # Stan program
  data = data,            # named list of data
  chains = 1,             # number of Markov chains
  warmup = 1000,          # number of warmup iterations per chain
  iter = 2000,            # total number of iterations per chain
  cores = 5,              # number of cores (could use one per chain)
  verbose = TRUE
)

I would appreciate a tip from experts.

Thank you.

  • Paul

If you have redundant rows in x within a given group, you can use the sufficient statistics trick to speed things up. Separately, if there is redundancy across groups in the rows of x I think you’d be able to use the reduced-redundant-computations tricks too. Finally, if you have many more cores available than chains you want to sample, take a look at the new reduce_sum feature for within-chain parallelism.
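For reference, a minimal sketch of what the reduce_sum version of the likelihood might look like (the function name and grainsize choice are illustrative, not from your model; you also need to compile with threading enabled, e.g. STAN_THREADS, and set the threads per chain):

```stan
functions {
  real partial_log_lik(int[] y_slice, int start, int end,
                       row_vector[] x, int[] ll, vector[] beta) {
    // log likelihood for observations start..end;
    // y_slice holds just those elements, indexed from 1
    real lp = 0;
    for (n in start:end)
      lp += bernoulli_logit_lpmf(y_slice[n - start + 1] | x[n] * beta[ll[n]]);
    return lp;
  }
}
// ... data and parameters blocks unchanged ...
model {
  int grainsize = 1;  // 1 lets the scheduler pick chunk sizes
  mu ~ normal(0, 100);
  for (l in 1:L) beta[l] ~ normal(mu, sigma);
  target += reduce_sum(partial_log_lik, y, grainsize, x, ll, beta);
}
```

The sliced argument (y here) is partitioned across threads, and the remaining arguments are shared, so the result is the same log likelihood computed in parallel.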


I have a somewhat similar model, and the fastest approach I could find was to use map_rect for each group. This has the benefit that you can use bernoulli_logit_glm, which can be an order of magnitude faster than y ~ bernoulli_logit(theta).
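Even before adding map_rect, a rough sketch of the per-group bernoulli_logit_glm likelihood might look like this. It assumes the covariates are passed as a matrix whose rows are sorted by group, with per-group row ranges supplied as data — the group_start/group_end names are mine:

```stan
data {
  // ... as before, plus the covariates as a matrix sorted by
  // group and the row range occupied by each group:
  matrix[N, D] X;
  int<lower=1,upper=N> group_start[L];
  int<lower=1,upper=N> group_end[L];
}
model {
  mu ~ normal(0, 100);
  for (l in 1:L) {
    beta[l] ~ normal(mu, sigma);
    // vectorized GLM likelihood over all rows in group l
    y[group_start[l]:group_end[l]]
      ~ bernoulli_logit_glm(X[group_start[l]:group_end[l]], 0, beta[l]);
  }
}
```

The glm distribution fuses the matrix-vector product and the log likelihood into one analytic-gradient call, which is where the speedup comes from.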
