# How to speed up sampling in Stan when fitting a spatial survival model on a large data set?

Dear all,
I’m trying to fit a spatial survival model with RStan on a large data set (N = 35000), but sampling is very slow: about 10 days even for 2000 iterations with 500 warmup.
Could anyone help me solve this problem, please?

My Stan code is below (N = 35000, n_grid = 350, province = 31, num = 12).
In my code, the x[j] (j = 1, 2, …, n_grid) are white noise; they are used only in the calculation of `a`, and there is no need to estimate them.

``````r
fit <- stan("final model.stan", data = da, iter = iter, warmup = warmup,
            cores = 4, chains = 2, thin = 1, init = "random",
            control = list(max_treedepth = 15))
``````
``````stan
data {
  int<lower=0> province;
  int<lower=0> n_grid;
  int<lower=0> N;
  int<lower=0> num;
  int<lower=0> number;
  vector<lower=0>[province] p;
  row_vector[n_grid] kernel[province];
  matrix[N, number] respon;
}
parameters {
  vector[num] beta;
  real<lower=0> sigma0;
  real<lower=0> sigma;
  vector[n_grid] x;
}
transformed parameters {
  real alpha = 1 / sigma;
  vector[N] lambdaa;
  vector[n_grid] exp_x;
  vector[province] a; // `a` saved to output
  exp_x = exp(x);
  lambdaa = exp((-1 * respon[ , 6:17] * beta) / sigma);
  { // `z` not part of output
    vector[province] z;
    vector[province] landa;
    for (k in 1:province) {
      landa[k] = kernel[k] * exp_x * p[k];
    }
    a = landa / sum(landa); // assign `a`
  }
}
model {
  target += normal_lpdf(x | 0, 5);
  target += normal_lpdf(beta | 0, sigma0);
  target += inv_gamma_lpdf(sigma0 | 0.001, 0.001);
  target += normal_lpdf(log(alpha) | 0, 5);
  for (i in 1:N) {
    vector[province] f_i;
    vector[province] s_i;
    for (k in 1:province) {
      f_i[k] = a[k] * ((lambdaa[i] * alpha * respon[i, 4]^(alpha - 1))
                       / ((1 + (lambdaa[i] * respon[i, 4]^alpha))^2));
      s_i[k] = a[k] * (1 - ((lambdaa[i] * respon[i, 4]^alpha)
                            / (1 + (lambdaa[i] * respon[i, 4]^alpha))));
    }
    target += (respon[i, 5] * log(f_i)) + ((1 - respon[i, 5]) * log(s_i));
  }
}
``````

EDIT: @maxbiostat edited this post for syntax highlighting.

You’ve specified the hyperparameter `sigma0` with an `inv_gamma(0.001, 0.001)` prior, which is highly diffuse (something we don’t really recommend with Stan, as an aside):

``````stan
target += inv_gamma_lpdf(sigma0 | 0.001, 0.001);
``````

Then you’ve specified the transformed parameter `alpha` as the inverse of `sigma`:

``````stan
real alpha = 1 / sigma;
``````

Then you’ve assigned a prior to the log of alpha:

``````stan
target += normal_lpdf(log(alpha) | 0, 5);
``````

If you assign a prior to a transformation of a parameter, you need to add a Jacobian adjustment to the target density (there is more information in the manual). But it may be simpler to encode the expected range directly in the prior on `sigma`, rather than placing a prior on a double transformation of it (`log(1/sigma)`).
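As a sketch of what the adjustment would look like here (assuming `sigma` is the underlying parameter and `alpha = 1 / sigma` as in the model above): since `log(alpha) = -log(sigma)`, the absolute derivative of the transform with respect to `sigma` is `1 / sigma`, so the log-Jacobian term is `-log(sigma)`:

``````stan
model {
  // prior on the transformed quantity log(alpha) = -log(sigma) ...
  target += normal_lpdf(log(alpha) | 0, 5);
  // ... plus the log absolute Jacobian of sigma -> log(alpha),
  // which is log(1 / sigma) = -log(sigma)
  target += -log(sigma);
}
``````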

Is there a reason you needed this extra prior? Can you try the model without it?
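If you do want to keep that information, one alternative (a sketch; the scale 5 is just carried over from your current prior, not a recommendation) is to put the prior directly on `sigma`. Because `log(alpha) = -log(sigma)` and `normal(0, 5)` is symmetric about zero, `log(alpha) ~ normal(0, 5)` with its Jacobian included is the same as a lognormal prior on `sigma`:

``````stan
model {
  // replaces target += normal_lpdf(log(alpha) | 0, 5) plus the Jacobian term;
  // equivalent because log(alpha) = -log(sigma) and normal(0, 5) is symmetric
  sigma ~ lognormal(0, 5);
}
``````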

Thank you so much for your information and help.
I want to fit a spatial log-logistic survival model to the data, so I wrote the likelihood function in Stan myself as a user-defined expression (the time and censoring-indicator variables in the code are respon[i,4] and respon[i,5], respectively).
sigma0 is a hyperparameter, and alpha = 1 / sigma is the shape parameter of the log-logistic model.
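For reference, with rate $\lambda_i$ = lambdaa[i] and shape $\alpha$, the log-logistic density and survival functions that the loop in my model block encodes are

$$
f(t_i) = \frac{\lambda_i \,\alpha\, t_i^{\alpha-1}}{\left(1 + \lambda_i t_i^{\alpha}\right)^2},
\qquad
S(t_i) = 1 - \frac{\lambda_i t_i^{\alpha}}{1 + \lambda_i t_i^{\alpha}} = \frac{1}{1 + \lambda_i t_i^{\alpha}},
$$

so each observation contributes $d_i \log f(t_i) + (1 - d_i) \log S(t_i)$ to the target, where $d_i$ = respon[i,5] is the censoring indicator.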
Given the large data set and the complexity of the model, what would you suggest for the priors, and how should I modify the code?