Speed up multilevel logit: large dataset and group-level variables

Just updating this thread for anyone who’s interested. The changes @bbbales2 suggested helped at the margins, as expected.

The big gain came from a suggestion by @mike-lawrence, who pointed me to the exploiting sufficient statistics section of the manual. Assuming that observations are i.i.d. at the country-year level, I summed the number of ones and counted the number of trials for each country-year. This also meant averaging the individual-level covariates at the same level. The revised model, which runs in about 40 minutes, is below.
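For anyone wondering why the aggregation is valid: if the $n$ responses in a country-year are i.i.d. Bernoulli$(p)$, their joint likelihood depends on the data only through the sum,

$$
\prod_{i=1}^{n} p^{y_i} (1 - p)^{1 - y_i} = p^{s} (1 - p)^{n - s}, \qquad s = \sum_{i=1}^{n} y_i,
$$

which is the Binomial$(n, p)$ likelihood up to the constant $\binom{n}{s}$. So modeling the group sums with `binomial_logit` gives the same posterior as the Bernoulli model on individual rows, at a fraction of the cost.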


// data
data {
  int<lower = 0> N; // number of obs (country-year groups after aggregation)
  int<lower = 0> n_res[N]; // number of respondents in group
  int y[N]; // outcome: sum by group
  
  int<lower = 0> S; // number of countries
  int<lower = 1, upper = S> state[N]; // state indicator

  int<lower = 0> T; // number of year obs
  int<lower = 1, upper = T> year[N];
  
  int<lower = 1> I; // number of individual variables
  matrix[N, I] X; // individual-level reg matrix
  int<lower = 1> J; // number of state-year variables
  matrix[N, J] Z; // state-year reg matrix; summed 
  int<lower = 1> L; // number of system-year variables
  matrix[T, L] G; // year/system reg matrix
}

// parameters.
parameters {
  real alpha; // overall intercept
//  real<lower = 0> sigma; // outcome variance
  
 
  vector[S] alpha_state_std; // state intercepts- noncenter
  real<lower = 0> sigma_state; // state var. hyperparam
  real mu_state;
  
  vector[T] alpha_year_std; // year intercepts - noncentered
  real<lower = 0> sigma_year; // year var. hyperparam

  vector[I] beta; // individual coefs
  vector[J] gamma; // state-year coefs
  vector[L] lambda; // alliance-year coefs

}

transformed parameters {
   vector[T] mu_year; 
   vector[T] alpha_year; // year intercepts
   vector[S] alpha_state; // state intercepts

   
   // regression models of mean year varying intercepts
   mu_year = G * lambda; // year/system level 

  // non-centered parameterization of state intercepts
  alpha_state = mu_state + sigma_state * alpha_state_std;

  // non-centered parameterization of year intercepts
  alpha_year = mu_year + sigma_year * alpha_year_std;
  

}

// model 
model {
  
  // define priors
  alpha ~ std_normal(); 
  
  // state parameters
  sigma_state ~ normal(0, 1); // half-normal
  alpha_state_std ~ std_normal(); // state intercepts non-centered
  // year parameters
  alpha_year_std ~ std_normal(); // year intercepts non-centered
  sigma_year ~ normal(0, 1); // half-normal
  
  // regression coef priors 
  // robust priors 
  beta ~ student_t(5, 0, 1); 
  gamma ~ student_t(5, 0, 1); 
  lambda ~ student_t(5, 0, 1);

  y ~ binomial_logit(n_res, alpha +
      alpha_state[state] + alpha_year[year] +
      X * beta + Z * gamma);

}
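The aggregation step described above (summing successes, counting trials, and averaging the individual-level covariates within each country-year) can be sketched in Python with pandas; the data frame and column names here are made up for illustration.

```python
import pandas as pd

# toy individual-level data; one row per respondent (hypothetical columns)
df = pd.DataFrame({
    "state": [1, 1, 1, 2, 2],
    "year":  [1990, 1990, 1990, 1990, 1990],
    "y":     [1, 0, 1, 1, 1],            # binary outcome per respondent
    "x1":    [0.2, 0.4, 0.6, 0.1, 0.3],  # individual-level covariate
})

# collapse to one row per country-year:
#   y     = number of ones (successes)
#   n_res = number of respondents (trials)
#   x1    = group mean of the individual-level covariate
grouped = (
    df.groupby(["state", "year"])
      .agg(y=("y", "sum"), n_res=("y", "size"), x1=("x1", "mean"))
      .reset_index()
)
print(grouped)
```

The `y`, `n_res`, and averaged covariate columns then feed directly into the `y`, `n_res`, and `X` entries of the Stan data block above.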

Another option that I considered but didn’t need is the trick for reducing redundant computation. Just putting this here in case someone else finds it helpful.
