Survival model in Rstan

Assem · May 3, 2018, 6:00pm

Hi everyone, I am trying to implement a survival model in rstan. The code below is a slightly modified version of the leuk example. The problem is that when I run the program, it takes more than 8 hours with 4 chains of 1000 iterations each. The data set is relatively big: 4000 observations and 7 predictors. Is there any way to optimize the code and make it faster?

data {
  int<lower=0> N;//number of observations
  int<lower=0> NT;//number of death times
  int<lower=0> obs_t[N]; // survival months of each patient
  int<lower=0> S[NT+1]; //unique death times
  int<lower=0> fail[N]; // dead or alive
  int K;
  matrix[N,K] z;
  real mu[N];// Lnr
}
transformed data {
  int Y[N, NT];
  int dN[N, NT]; 
  real c;
  real r; 
  for(i in 1:N) {
    for(j in 1:NT) {
      Y[i, j] = int_step(obs_t[i] - S[j] + .000000001);
     dN[i, j] = Y[i, j] * fail[i] * int_step(S[j + 1] - obs_t[i] - .000000001);
    }
  }
  c = 1.5; 
  r = 0.1; 
}
parameters {
  real betaLnr; 
  vector [K] beta;
  real<lower=0> dL0[NT]; 
} 

model {
  betaLnr ~ normal(0, 1000);
  beta ~ normal(0, 1000);
  for(j in 1:NT) {
    dL0[j] ~ gamma(r * (S[j+1] - S[j]) * c, c);
    for(i in 1:N) {
      if (Y[i, j] != 0)
        dN[i, j] ~ poisson(Y[i, j] * exp(mu[i] * betaLnr + dot_product(z[i], beta)));
    }
  }
}

ermeel · May 4, 2018, 5:23am

I'll give it a shot. Some thoughts follow:

Performance also depends on how large NT is.

You could precalculate a difference vector for S in the transformed data block and then vectorize the gamma statement for dL0 so it works without a loop.

Likewise, you could precalculate the expression that is the argument of exp as a local variable in the model block, using vectorized operations:

vector[N] loc_exp = exp(mu*betaLnr+z*beta);

Also, I think mu should be declared as a vector rather than a real array, so that the expression above type-checks.
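As a rough sketch of that suggestion, the model block from the first post could look like this. It assumes the data block is unchanged except that mu is redeclared as vector[N] mu; the name risk is made up for illustration.

```stan
model {
  // exp of the linear predictor, computed once per evaluation
  // instead of once for every (i, j) pair inside the loops
  vector[N] risk = exp(mu * betaLnr + z * beta);

  betaLnr ~ normal(0, 1000);
  beta ~ normal(0, 1000);
  for (j in 1:NT) {
    dL0[j] ~ gamma(r * (S[j + 1] - S[j]) * c, c);
    for (i in 1:N) {
      if (Y[i, j] != 0)
        dN[i, j] ~ poisson(Y[i, j] * risk[i]);
    }
  }
}
```

This only removes the repeated exp and dot_product calls; the N by NT loop over Poisson statements is still there.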

About other optimizations I have to think more…

Assem · May 4, 2018, 5:44am

OK, thank you, I’m going to try your suggestions.
Looking forward to updates.

ermeel · May 12, 2018, 4:49am

See also https://github.com/stan-dev/example-models/blob/master/bugs_examples/vol1/leuk/leuk.stan

Assem · May 12, 2018, 9:49am

Thank you very much for keeping me updated.

Yes, I have seen this example before!

I reduced the run time by setting the cores argument in the stan() function. I did not know about it
until I saw it in your code. With cores=6 the run took around 1.5 hours.

Bob_Carpenter · June 15, 2018, 10:34pm

Don’t worry about efficiency in transformed data. But the rest of the model is tricky to make more efficient.

You can precompute a vector of S_diff_times_c[j] = (S[j + 1] - S[j]) * c to use in the dL0 distribution in the model block. Then vectorize to

dL0 ~ gamma(r * S_diff_times_c, c);
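A minimal, standalone sketch of that precomputation might look like the following. It keeps only the pieces needed for the dL0 prior and uses the array declaration style from this thread (newer Stan releases write these as array[...] declarations); the vectorized statement takes the whole vector, with no index.

```stan
data {
  int<lower=0> NT;        // number of death times
  int<lower=0> S[NT + 1]; // unique death times
}
transformed data {
  vector[NT] S_diff_times_c;
  real c = 1.5;
  real r = 0.1;
  // computed once, before sampling starts
  for (j in 1:NT)
    S_diff_times_c[j] = (S[j + 1] - S[j]) * c;
}
parameters {
  real<lower=0> dL0[NT];
}
model {
  // one vectorized statement replaces the loop over j
  dL0 ~ gamma(r * S_diff_times_c, c);
}
```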

Then you need to vectorize the dN, but there’s the nasty Y[i, j] > 0 condition, which seems to imply the data with Y[i, j] == 0 is not being modeled. Is that the intention?

To vectorize, you need to replace Y with a precomputed array of values where Y[i, j] > 0, but that will turn out to be ragged, which is a headache with our current rectangular data structures. Then vectorize by row. And use the poisson_log distribution, which takes a parameter on the log scale, e.g.,

dN[i] ~ poisson_log(log_Y[i,j] + ...);
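One way to sidestep the ragged structure, sketched here under several assumptions, is to flatten the at-risk (i, j) pairs into long format outside Stan (for example in R) and pass them in as data. The names M, ii, jj, dn and S_diff are hypothetical. The rate below also includes dL0[j], as in the leuk example; the model in the first post leaves that factor out of the Poisson rate, so adjust if that was intentional. Array declarations follow the style used in this thread.

```stan
data {
  int<lower=0> N;              // number of observations
  int<lower=0> NT;             // number of death times
  int<lower=0> K;              // number of predictors
  matrix[N, K] z;
  vector[N] mu;
  vector<lower=0>[NT] S_diff;  // S[j + 1] - S[j], precomputed
  int<lower=0> M;              // number of pairs with Y[i, j] == 1
  int<lower=1, upper=N> ii[M]; // patient index of each pair
  int<lower=1, upper=NT> jj[M]; // interval index of each pair
  int<lower=0, upper=1> dn[M]; // dN[i, j] for each pair
}
transformed data {
  real c = 1.5;
  real r = 0.1;
}
parameters {
  real betaLnr;
  vector[K] beta;
  vector<lower=0>[NT] dL0;
}
model {
  vector[N] eta = mu * betaLnr + z * beta; // log relative risk
  vector[NT] log_dL0 = log(dL0);           // log baseline increments

  betaLnr ~ normal(0, 1000);
  beta ~ normal(0, 1000);
  dL0 ~ gamma(r * c * S_diff, c);
  // a single vectorized statement over all at-risk pairs
  dn ~ poisson_log(eta[ii] + log_dL0[jj]);
}
```

Because only pairs with Y[i, j] == 1 are stored, the if condition disappears and the likelihood becomes one call.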

russt · December 5, 2018, 5:08pm

I’ve tried to vectorize the time difference S_diff_times_c[j] in the Stan dev leukemia survival model and it returned divergent transitions:

transformed data {
  vector[NT] S_diff_times_c;
...
  for (j in 1:NT) {
    S_diff_times_c[j] = (t[j + 1] - t[j]);
  }
}

Edit: the problem was the index. In the model block I should be using dL0 ~ gamma(r * S_diff_times_c, c), without the [j], for it to work.
