Large data sets with Stan code

Hello Everyone,

I would appreciate your help with the following Stan model. I run into a computational issue once the data set grows to 350,000 observations and 40 variables. Any guidance you can offer on optimizing it would be welcome.

Thank you very much!


data {
  int<lower=1> n;                  // number of observations
  int<lower=1> p;                  // number of variables (items)
  int<lower=0> V[n, p];            // successes per observation and item
  int<lower=0> I[n];               // number of trials per observation
}

parameters {
  vector[p] beta;                  // item intercepts
  vector<lower=0>[p] alpha;        // item discriminations (positive)
  vector[n] theta;                 // latent trait per observation
}

model {
  theta ~ normal(0, 1);
  alpha ~ lognormal(1, 1);
  beta ~ normal(0, 3);

  for (k in 1:p) {
    V[, k] ~ binomial_logit(I, beta[k] + alpha[k] * theta);
  }
}

You can reverse the dimensions of V in order to get faster access: Stan's two-dimensional arrays are stored row by row, so declaring V as [p, n] and taking V[k] inside the loop is cheaper than slicing out a column with V[, k]. In addition, I would declare alpha as a simplex[p] in order to break the scale indeterminacy that arises from multiplying alpha by any positive constant and dividing theta by that same constant. You would have to change the prior on alpha though, presumably to some Dirichlet, and possibly have to change the prior on theta. A sketch along those lines follows. Even then, it will take a while to run.
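
Here is a minimal sketch combining both suggestions (V transposed to [p, n] and alpha as a simplex); the Dirichlet concentration rep_vector(2, p) is just a placeholder, and the prior on theta is left as-is even though its scale may need rethinking once alpha is constrained to sum to 1:

data {
  int<lower=1> n;
  int<lower=1> p;
  int<lower=0> V[p, n];            // transposed: row k holds item k across all observations
  int<lower=0> I[n];               // number of trials per observation
}

parameters {
  vector[p] beta;
  simplex[p] alpha;                // sums to 1, removing the alpha/theta scale indeterminacy
  vector[n] theta;
}

model {
  theta ~ normal(0, 1);            // may need a wider scale now that alpha is a simplex
  alpha ~ dirichlet(rep_vector(2, p));   // placeholder concentration; choose to taste
  beta ~ normal(0, 3);

  for (k in 1:p) {
    V[k] ~ binomial_logit(I, beta[k] + alpha[k] * theta);   // row access instead of column slice
  }
}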