I want to model the interaction between x1 and x2 in this simulated data with a logistic regression.
I am quite inexperienced with Stan, but as far as I can tell this is a very simple model, and yet according to Stan it would take about 52 hours to sample from. It is a logistic regression with 3 predictors (x1, x2, and their interaction), which should be fairly easy to sample. I am using rstan, and my setup in R looks like this:
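library(rstan)

# Simulated predictors (each takes values 0, 1, or 2) and a binary outcome
X <- data.frame(x1 = rbinom(6534, 2, 0.3),
                x2 = rbinom(6534, 2, 0.3))
Y <- rbinom(6534, 1, 0.2)

# Compile the model and sample from it
logistic_regression <- stan_model("logistic_regression.stan")
logistic_fit <- sampling(logistic_regression,
                         data = list(N = nrow(X),
                                     y = Y,
                                     x = X),
                         iter = 2000,
                         chains = 4,
                         save_warmup = FALSE)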
My Stan model, logistic_regression.stan, looks like this:
data {
  int<lower=1> N;                     // Rows
  int<lower=0, upper=1> y[N];         // Outcome variables
  matrix<lower=0, upper=2>[N, 2] x;   // Predictor variables, always two columns
}
parameters {
  real a;
  row_vector[3] b;
}
model {
  // Priors
  a ~ normal(0, 0.1);
  for (i in 1:3)
  {
    b[i] ~ normal(0, 0.2^2);
  }
  // Likelihood
  for (i in 1:N)
  {
    y[i] ~ bernoulli_logit( a + b[1]*x[,1] + b[2]*x[,1] + b[3]*x[,1].*x[,2] );
  }
}
When I start sampling I get this message:
Chain 1: Gradient evaluation took 21.6585 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 216585 seconds.
Chain 1: Adjust your expectations accordingly!
I get the same message for chains 2, 3, and 4. I left it running overnight and it had not finished by the next morning. I have run simple Stan models in the past without run times like this, but I no longer have that code, so I cannot compare against it. I am working on a Linux server that many other people use without problems.
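For reference, the estimate in the Chain 1 message appears to be simply the gradient time multiplied by the number of leapfrog steps and the number of transitions. A quick check in R, using the numbers copied from that message (the variable names are my own, purely for illustration):

```r
# Numbers copied from the Chain 1 message
grad_seconds   <- 21.6585  # time for one gradient evaluation, in seconds
leapfrog_steps <- 10       # leapfrog steps per transition assumed by the message
transitions    <- 1000     # number of transitions assumed by the message

# Estimated time for 1000 transitions under those assumptions
est_seconds <- grad_seconds * leapfrog_steps * transitions
est_hours   <- est_seconds / 3600

est_seconds  # 216585
est_hours    # about 60.2
```

So under the message's own assumptions, 1000 transitions for this chain would take roughly 60 hours, and the whole estimate is driven by the 21.7 seconds per gradient evaluation.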
Why does sampling take so long for this model?