How to speed up my Stan code and sampling in rstan?

Isa_Nazar · May 28, 2021, 8:32am

Hi stan community,
I am trying to run a hierarchical survival model in Stan, but I found sampling is very time-consuming (takes more than 10 days even for 25000 iteration/10000 warmup).
I am not good at computing, so would be greatly appreciated if there are any tips!
My stan code is as below (N=34810, n_grid=350, province=31, num=12).

data {
     int <lower=0> province;
     int <lower=0> n_grid;
     int <lower=0> N;
     int <lower=0> num;
     int <lower=0> number;
     vector <lower=0> [province] p;
     matrix [province,n_grid] kernel;
     matrix [N,num] predictor;
     matrix [N,number] respon;
     }
parameters{
     vector [num] beta;
     real <lower=0> sigma0;
     real <lower=0> sigma;
     vector [n_grid] x;
          }
transformed parameters{
      real alpha = 1 / sigma;
     vector [N] lambdaa;
     vector[province] a;
for(n in 1:N) {
     lambdaa [n] = exp ((-1 * predictor [n] * beta) / sigma);
           }
           { // `z` is only visible inside these braces, not part of output
    vector[province] z;
for (k in 1 : province) {
     vector[n_grid] landa_k;
for (j in 1 : n_grid) {
     landa_k[j] = kernel[k, j] * exp(x[j]);
           }
     z[k] = sum(landa_k) * p[k];
           }
     a = z / sum(z); // assign `a`
           }
           }
model{
      matrix [N,province] f;
      matrix [N,province] s;
for (j in 1:n_grid){
     target += normal_lpdf(x[j]|0,1);
  }
for(u in 1:num){
    target += normal_lpdf(beta [u]| 0, sigma0);
          }
    target += inv_gamma_lpdf(sigma0| 1, 1);
    target += normal_lpdf(log (alpha) | 0, 5);
for (i in 1:N) {
for (k in 1:province){
    f[i,k]= a[k]*((lambdaa [i]*alpha*respon[i,1]^(alpha-1))/((1+(lambdaa [i]*respon[i,1]^alpha))^2));
    s[i,k]= a[k]*(1- ((lambdaa [i]*respon[i,1]^alpha)/ (1+(lambdaa [i]*respon[i,1]^alpha))));
    target += (respon[i,2] *  log (f[i,k])) + ((1 - respon[i,2])* log(s[i, k]));
    }
  }
}

Any tips and comments are welcome. Thanks in advance!

Best,
Eisa

moderator edited for stan code formatting

mike-lawrence · May 28, 2021, 11:46am

Why so many? That’s an order of magnitude above the defaults, which themselves are probably overkill for estimating even tail quantities amidst multiple chains.

Isa_Nazar · May 28, 2021, 11:57am

I also checked the execution of the model with a small number of iterations (iter=2000, warmup=500), and again my main problem is the long execution time. It is noteworthy that the model did not converge with low iterations.

Thank you!

mike-lawrence · May 28, 2021, 12:42pm

It is indeed, as this suggests that there’s something wrong with the fundamental structure of the model.

What diagnostics failed? And are you using any other non-default values for the sampling arguments (max_treedepth, adapt_delta, etc)?

Isa_Nazar · May 28, 2021, 1:06pm

Actually, I’m new in stan and didn’t find any diagnostics that failed. Can you help me to tackle this problem to run the model, please?

mike-lawrence · May 28, 2021, 1:15pm

When you ran it with the default for num_warmup and num_samples and determined that it did not converge, on what basis did you decide that it did not converge? rhat? traceplots?

Isa_Nazar · May 28, 2021, 1:36pm

I decided that it did not converge by using a trace plot. Now my main problem in the long run time of the model (about ~ 1 week), and what do you think is the solution to this?

mike-lawrence · May 28, 2021, 6:42pm

It looks like you’re unaware that many (most?) functions in Stan are vectorized so you have a ton of loops that are unnecessary.

You compute exp(x[j]) many times when you should precompute exp_x = exp(x) once and index into that.

You’re going to have to provide a description of the data & model if you want further performance improvement ideas; I’m having trouble discerning what you’re trying to do from just your code.

Isa_Nazar · May 28, 2021, 8:43pm

I really appreciate your help. Actually, I’m new to Stan.
Can you help me to change and speed up my stan codes, please?
In which part of the code can I use a vectorized form instead of a loop? Please let me know the codes, thank you.

Here, x[j]'s (j=1,2, …, n_grid) are white noise and It is used only in the calculation of a and there is no need to estimate it.

Best regards,

Eisa

Topic		Replies	Views
How to speed up my Stan code? Modeling rstan , fitting-issues	8	745	June 14, 2021
How to speed up sampling in STAN when fitting a spatial survival model on a large data set? Developers rstan , techniques , fitting-issues	2	484	June 3, 2021
The sampling has no response, when rstan 2.21 run in R4.0.5 General rstan , fitting-issues	15	1101	September 15, 2021
Sampling function taking too long to run at Chain 4 Modeling	2	694	April 12, 2022
How speeding up my stan code? Modeling	5	3301	September 23, 2018

How to speed up my Stan code and sampling in rstan?

Related topics