Large data sets with Stan code

Hello Everyone,

I would appreciate your help with the following Stan model. I run into a computational issue once the data set grows to 350,000 observations and 40 variables. Any guidance you can offer on optimizing it would be welcome.

Thank you very much!


data {
  int<lower=1> n;                  // number of observations
  int<lower=1> p;                  // number of variables (items)
  int<lower=0> V[n, p];            // successes per observation and item
  int<lower=0> I[n];               // number of trials per observation
}

parameters {
  vector[p] beta;                  // item intercepts
  vector<lower=0>[p] alpha;        // item discriminations (positive)
  vector[n] theta;                 // latent trait per observation
}

model {
  theta ~ normal(0, 1);
  alpha ~ lognormal(1, 1);
  beta ~ normal(0, 3);

  for (k in 1:p) {
    V[, k] ~ binomial_logit(I, beta[k] + alpha[k] * theta);
  }
}

You can reverse the dimensions of V in order to get faster access: Stan's two-dimensional arrays are stored row by row, so declaring V as [p, n] and taking V[k] inside the loop is cheaper than slicing out a column with V[, k]. In addition, I would declare alpha as a simplex[p] in order to break the scale indeterminacy that arises from multiplying alpha by any positive constant and dividing theta by that same constant. You would have to change the prior on alpha though, presumably to some Dirichlet, and possibly have to change the prior on theta. A sketch along those lines follows. Even then, it will take a while to run.
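
Here is a minimal sketch combining both suggestions (V transposed to [p, n] and alpha as a simplex); the Dirichlet concentration rep_vector(2, p) is just a placeholder, and the prior on theta is left as-is even though its scale may need rethinking once alpha is constrained to sum to 1:

data {
  int<lower=1> n;
  int<lower=1> p;
  int<lower=0> V[p, n];            // transposed: row k holds item k across all observations
  int<lower=0> I[n];               // number of trials per observation
}

parameters {
  vector[p] beta;
  simplex[p] alpha;                // sums to 1, removing the alpha/theta scale indeterminacy
  vector[n] theta;
}

model {
  theta ~ normal(0, 1);            // may need a wider scale now that alpha is a simplex
  alpha ~ dirichlet(rep_vector(2, p));   // placeholder concentration; choose to taste
  beta ~ normal(0, 3);

  for (k in 1:p) {
    V[k] ~ binomial_logit(I, beta[k] + alpha[k] * theta);   // row access instead of column slice
  }
}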