Dear all,
I’m trying to fit a spatial survival model with RStan on a large dataset (N = 35,000), but sampling is very slow: it takes about 10 days even for 2,000 iterations (500 warmup).
Could anyone help me solve this problem, please?
Any tips and comments are welcome. Thanks in advance!
My Stan code is below (N = 35000, n_grid = 350, province = 31, num = 12).
In my code, the x[j] (j = 1, 2, …, n_grid) are white noise; they are used only in the calculation of a, so there is no need to estimate them.
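Since x does not need to be estimated, I think it could be generated once in R and passed in as data instead (a sketch of the change, assuming the generated vector is added to da):

data {
  // ... existing declarations ...
  vector[n_grid] x;  // pre-generated white noise passed in via `da`
}

x would then leave the parameters block and its normal_lpdf prior would be dropped from the model block, removing n_grid = 350 dimensions from the posterior.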
fit <- stan("final model.stan", data = da, iter = iter, warmup = warmup,
            cores = 4, chains = 2, thin = 1, init = "random",
            control = list(max_treedepth = 15))
data {
  int<lower=0> province;               // number of provinces (31)
  int<lower=0> n_grid;                 // number of grid cells (350)
  int<lower=0> N;                      // number of observations (35000)
  int<lower=0> num;                    // number of covariates (12)
  int<lower=0> number;                 // number of columns of respon
  vector<lower=0>[province] p;
  row_vector[n_grid] kernel[province];
  matrix[N, number] respon;
}
parameters {
  vector[num] beta;
  real<lower=0> sigma0;
  real<lower=0> sigma;
  vector[n_grid] x;
}
transformed parameters {
  real alpha = 1 / sigma;
  vector[N] lambdaa;
  vector[n_grid] exp_x;
  vector[province] a;  // `a` saved to output
  exp_x = exp(x);
  lambdaa = exp(-(respon[, 6:17] * beta) / sigma);
  { // local block: `landa` not part of output
    vector[province] landa;
    for (k in 1:province) {
      landa[k] = kernel[k] * exp_x * p[k];
    }
    a = landa / sum(landa);  // assign `a`
    // (a vectorized sketch of this loop is given after the model block)
  }
}
model {
  // priors
  target += normal_lpdf(x | 0, 5);
  target += normal_lpdf(beta | 0, sigma0);
  target += inv_gamma_lpdf(sigma0 | 0.001, 0.001);
  // note: log(alpha) is a nonlinear transform of the parameter sigma, so this
  // prior may need a Jacobian adjustment (target += -log(sigma))
  target += normal_lpdf(log(alpha) | 0, 5);
  // likelihood: respon[i, 4] is the survival time and respon[i, 5] the event
  // indicator; f_i and s_i are log-logistic density and survival terms
  // weighted by a[k] (a vectorized sketch of this loop is given below)
  for (i in 1:N) {
    vector[province] f_i;
    vector[province] s_i;
    for (k in 1:province) {
      f_i[k] = a[k] * (lambdaa[i] * alpha * respon[i, 4]^(alpha - 1))
                    / (1 + lambdaa[i] * respon[i, 4]^alpha)^2;
      s_i[k] = a[k] * (1 - (lambdaa[i] * respon[i, 4]^alpha)
                           / (1 + lambdaa[i] * respon[i, 4]^alpha));
    }
    target += respon[i, 5] * log(f_i) + (1 - respon[i, 5]) * log(s_i);
  }
}
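As noted in transformed parameters, the landa loop can be collapsed into a single matrix-vector product if kernel is passed as a matrix rather than an array of row_vectors (a sketch, assuming the data are reshaped accordingly):

data {
  matrix[province, n_grid] kernel;  // rows are the previous row_vectors
}
transformed parameters {
  vector[province] landa = (kernel * exp_x) .* p;  // replaces the k loop
  vector[province] a = landa / sum(landa);
}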
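For the likelihood, f_i[k] and s_i[k] depend on k only through a[k], so each observation contributes sum(log(a)) plus province times a per-observation term, and the double loop can be vectorized. Here is a sketch that should reproduce the same target as the loop above (assuming respon[, 4] > 0; if the intended likelihood is the log of the mixture sum over provinces instead, the algebra would differ):

model {
  // ... priors as above ...
  {
    vector[N] t = respon[, 4];                      // survival times
    vector[N] d = respon[, 5];                      // event indicator
    vector[N] lt = lambdaa .* exp(alpha * log(t));  // lambdaa[i] * t[i]^alpha
    vector[N] log_f = log(lambdaa) + log(alpha)
                      + (alpha - 1) * log(t) - 2 * log1p(lt);
    vector[N] log_s = -log1p(lt);                   // log(1 - F)
    target += N * sum(log(a))
              + province * (dot_product(d, log_f) + dot_product(1 - d, log_s));
  }
}

This removes the N * province inner loop (roughly a million scalar operations per leapfrog step), which I suspect is where most of the time goes.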
EDIT: @maxbiostat edited this post for syntax highlighting.