How can I speed up my Stan code?

Hello,
Below is my Stan code. I am fitting it to a large dataset (N = 45740), and it takes a very long time to run. I would really like to know how I can speed it up.

Thank you.

```stan
data {
  int<lower=1> ns;        // number of total locations
  int<lower=1> nt;        // number of data points
  int<lower=1> npt;       // number of temporal predictors
  matrix[nt, ns] y;       // observations
  matrix[nt, npt] x;      // temporal predictor matrix
  matrix[ns, ns] dist;    // cov matrix
  matrix[npt, npt] Wp_b;
  real mle_cop_range;
}

transformed data {
  vector[nt] y_s[ns];
  vector[ns] zero_vector_ns;
  real<lower=0> df_b;  // degrees of freedom for the inverse-Wishart prior on the temporal covariance

  df_b = npt + 1;  // constant, so assign it once rather than inside the loop

  for (s in 1:ns)
    for (t in 1:nt)
      y_s[s, t] = y[t, s];  // transpose so that location is the first index

  for (i in 1:ns)
    zero_vector_ns[i] = 0;
}

parameters {
  vector<lower=0, upper=55>[ns] sigma;
  // vector<lower=-1, upper=1>[ns] xi;
  real<lower=-1, upper=1> xi;
  real<lower=0> range_copula;

  vector[ns] alphas;      // intercepts, eventually vector[i]
  matrix[ns, npt] betas;  // coefficients of temporal predictors

  // hyperparameters
  vector[npt] mu_b;
  cov_matrix[npt] sigmas_b;
}

model {
  matrix[npt, npt] L_b;
  matrix[nt, ns] mu;
  vector[nt] mu_s[ns];

  vector[nt] lpdf_s[ns];
  matrix[ns, nt] lpdf;
  vector[nt] cdf_s[ns];
  matrix[ns, nt] cdf;
  matrix[ns, ns] L_copula;
  real eclp;

  sigmas_b ~ inv_wishart(df_b, Wp_b);
  L_b = cholesky_decompose(sigmas_b);
  mu_b ~ normal(0, 10);

  // Priors
  range_copula ~ normal(0, 100);
  xi ~ normal(0, 0.3);
  sigma ~ normal(0, 100);
  alphas ~ normal(0, 100);

  for (i in 1:ns) {
    betas[i] ~ multi_normal_cholesky(mu_b, L_b);
    for (t in 1:nt) {
      mu[t, i] = alphas[i] + dot_product(betas[i], x[t]);
      mu_s[i, t] = mu[t, i];
    }
  }

  for (i in 1:ns) {
    lpdf_s[i] = gev_v_log_v(y_s[i], mu_s[i], sigma[i], xi);
    cdf_s[i] = gev_v_cdf_v(y_s[i], mu_s[i], sigma[i], xi);
    for (t in 1:nt) {
      lpdf[i, t] = lpdf_s[i, t];
      cdf[i, t] = cdf_s[i, t];
    }
  }

  L_copula = cholesky_decompose(dependogram_exp(dist, range_copula));
  eclp = eliptical_copula_log_v(lpdf, cdf, L_copula, zero_vector_ns);
  target += eclp;
}
```

You seem to have a lot of for loops. You can try vectorizing the assignments, but that probably won't do much, since for loops in Stan are fairly fast (they're compiled to C++). What might make a bigger difference is vectorizing the sampling statements that sit inside for loops, such as `betas[i] ~ multi_normal_cholesky(mu_b, L_b);`.

That multi_normal PDF involves calculating the determinant of the covariance matrix, so you’re doing that for every iteration of that for loop, even though the determinant looks to be the same in every iteration. If you vectorized, Stan’s autodiff is smart enough to pick up on the fact that the covariance matrix is the same for every iteration and it’ll only compute the determinant once. Depending on how big the covariance matrix is, that can give you a big speedup.
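For example, a minimal sketch of that change (assuming the rest of the model stays the same; declaring `betas` as an array of vectors lets the whole array go into one statement):

```stan
parameters {
  // ...
  vector[npt] betas[ns];  // array of vectors rather than matrix[ns, npt]
}

model {
  // ...
  // one vectorized statement replaces the per-location loop;
  // the Cholesky factor is shared, so its log determinant is computed once
  betas ~ multi_normal_cholesky(mu_b, L_b);
}
```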

See chapter 28 of the Stan manual for more on vectorization and optimizing Stan code.
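In the same spirit, the double loop that fills `mu` can be collapsed into a single matrix product, which also shrinks the autodiff expression graph. A sketch, assuming the original `matrix[ns, npt] betas` declaration (untested against your model):

```stan
// equivalent to mu[t, i] = alphas[i] + dot_product(betas[i], x[t]) for all t, i:
// x is nt x npt and betas' is npt x ns, so x * betas' is nt x ns
mu = x * betas' + rep_matrix(alphas', nt);
```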

Another reason for slow sampling is large tree depths. Large tree depths often happen when the NUTS chains have to travel far with a small step size, and the most common cause of that is a highly correlated posterior. So I would check your tree-depth values, and if they're high, look at pairs plots of your posterior samples to identify parameters that are highly correlated in the posterior.


Thank you so much for the information and help.
I tried modifying the model and adding what you proposed (see the Stan model below), but it still runs very slowly.

```stan
data {
  int<lower=1> ns;        // number of total locations
  int<lower=1> nt;        // number of data points
  int<lower=1> npt;       // number of temporal predictors
  matrix[nt, ns] y;       // observations
  matrix[nt, npt] x;      // temporal predictor matrix
  matrix[ns, ns] dist;    // cov matrix
  matrix[npt, npt] Wp_b;
  real mle_cop_range;
}

transformed data {
  vector[nt] y_s[ns];
  vector[ns] zero_vector_ns;
  real<lower=0> df_b;  // degrees of freedom for the inverse-Wishart prior on the temporal covariance

  df_b = npt + 1;  // constant, so assign it once rather than inside the loop

  for (s in 1:ns)
    for (t in 1:nt)
      y_s[s, t] = y[t, s];  // transpose so that location is the first index

  for (i in 1:ns)
    zero_vector_ns[i] = 0;
}

parameters {
  vector<lower=0, upper=55>[ns] sigma;
  real<lower=-1, upper=1> xi;
  real<lower=0> range_copula;

  vector[ns] alphas;      // intercepts, eventually vector[i]
  vector[npt] betas[ns];  // coefficients of temporal predictors

  // hyperparameters
  vector[npt] mu_b;
  cov_matrix[npt] sigmas_b;
}

model {
  matrix[npt, npt] L_b;
  matrix[nt, ns] mu;
  vector[nt] mu_s[ns];

  vector[nt] lpdf_s[ns];
  matrix[ns, nt] lpdf;
  vector[nt] cdf_s[ns];
  matrix[ns, nt] cdf;
  matrix[ns, ns] L_copula;
  real eclp;

  sigmas_b ~ inv_wishart(df_b, Wp_b);
  L_b = cholesky_decompose(sigmas_b);
  mu_b ~ normal(0, 10);

  // Priors
  range_copula ~ normal(0, 100);
  xi ~ normal(0, 0.3);
  sigma ~ normal(0, 100);
  alphas ~ normal(0, 100);

  // vectorized: one statement for all locations
  betas ~ multi_normal_cholesky(mu_b, L_b);

  for (i in 1:ns) {
    // betas[i] ~ multi_normal_cholesky(mu_b, L_b);
    for (t in 1:nt) {
      mu[t, i] = alphas[i] + dot_product(betas[i], x[t]);
      mu_s[i, t] = mu[t, i];
    }

    lpdf_s[i] = gev_v_log_v(y_s[i], mu_s[i], sigma[i], xi);
    cdf_s[i] = gev_v_cdf_v(y_s[i], mu_s[i], sigma[i], xi);

    // copy the arrays of vectors into matrices for the copula call
    for (t in 1:nt) {
      lpdf[i, t] = lpdf_s[i, t];
      cdf[i, t] = cdf_s[i, t];
    }
  }

  // fit copula to complete data
  L_copula = cholesky_decompose(dependogram_exp(dist, range_copula));
  eclp = eliptical_copula_log_v(lpdf, cdf, L_copula, zero_vector_ns);
  target += eclp;
}
```

Hmm. Did that give you a speedup at all in the gradient evaluation? When Stan first starts, it tells you how long it took to evaluate the gradient. Did that change significantly?

Also, you can edit your post to add Stan syntax highlighting, which makes the code easier to read on the forums and easier for others to take a look at. At the beginning of your code type three backticks followed by stan, like so: ```stan, then at the end of your code put three more backticks.


It still behaves the same, and I get this message:

```
Gradient evaluation took 9.85938 seconds
1000 transitions using 10 leapfrog steps per transition would take 98593.8 seconds.
Adjust your expectations accordingly!
```

How long did the gradient evaluation take before? Also, can you describe your model and data in a bit more detail?