Hi -
I use the following code to estimate a hierarchical logit model.
data {
int<lower=1> D; // number of covariates
int<lower=0> N; // number of observations
int<lower=1> L; // number of groups (levels)
int<lower=0,upper=1> y[N]; // binary response
int<lower=1,upper=L> ll[N]; // group ids
row_vector[D] x[N]; //covariates
}
parameters {
real mu[D];
real<lower=0> sigma[D];
vector[D] beta[L];
}
model {
mu ~ normal(0, 100); // prior for mu
// sigma ~ uniform(0, 100) ; // prior for sigma
for (l in 1:L) beta[l] ~ normal(mu, sigma);
{
vector[N] x_beta_ll;
for (n in 1:N) x_beta_ll[n] = x[n] * beta[ll[n]];
y ~ bernoulli_logit(x_beta_ll);
}
}
For my data, N = 264,713 (number of total observations), L = 10,000 (the number of groups), D = 26 (the number of covariates). Then, I have this message, and it takes an extremely long time to sample:
Chain 1: Gradient evaluation took 0.131 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1310 seconds.
Chain 1: Adjust your expectations accordingly!
I wonder if there is a good way to improve speed. BTW, this is the R code that I used:
fit <- stan(
file = "hierarchical_logit.stan", # Stan program
data = data, # named list of data
chains = 1, # number of Markov chains
warmup = 1000, # number of warmup iterations per chain
iter = 2000, # total number of iterations per chain
cores = 5, # number of cores (could use one per chain)
verbose = TRUE
)
I would appreciate a tip from experts.
Thank you.
- Paul