Hi everyone, I am a Stan newbie.

I need to fit a hierarchical survival model using RSTAN for large data. But, I found sampling is very time-consuming (takes ~ 7 days even for 2000 iteration/500 warmup).

To improve sampling speed, I am going to use more than one CPU per chain and have within-chain parallelization in Stan. Could someone help me to do this within-chain parallelization, please?

My stan code is as below (N=35000, n_grid=350, province=31, number=17,num=12).

In my codes, x[j]'s (j=1,2, …, n_grid) are white noise and It is used only in the calculation of a and there is no need to estimate it.

Thank you in advance!

```
fit < - stan(“final model.stan”,data = da,iter=iter, warmup=warmup, cores =4, thin = 1, init = “random”)
```

```
### Stan codes
data {
int <lower=0> province;
int <lower=0> n_grid;
int <lower=0> N;
int <lower=0> num;
int <lower=0> number;
vector <lower=0> [province] p;
row_vector[n_grid] kernel[province];
matrix [N,number] respon;
}
parameters{
vector [num] beta;
real <lower=0> sigma;
vector [n_grid] x;
}
transformed parameters{
real alpha = 1 / sigma;
vector [num] time_ratio;
vector [N] lambdaa;
vector [n_grid] exp_x;
vector[province] a; // `a` saved to output
time_ratio = exp (beta);
exp_x = exp(x);
lambdaa = exp ((-1 * respon[,6:17] * beta) / sigma);
{ // `z` is not part of output
vector[province] z;
vector[province] landa;
for (k in 1 : province) {
landa[k] = kernel[k] * exp_x * p[k];
}
a = landa / sum(landa); // assign `a`
}
}
model{
real log_alpha;
log_alpha = log (alpha);
target += normal_lpdf(x|0,2);
target += normal_lpdf(beta | 0, 1000);
target += normal_lpdf(log_alpha | 0, 5);
for (i in 1:N) {
vector [province] f_i;
vector [province] s_i;
for (k in 1:province){
f_i[k]= a[k]*((lambdaa [i]*alpha*respon[i,4]^(alpha-1))/((1+(lambdaa [i]*respon[i,4]^alpha))^2));
s_i[k]= a[k]*(1- ((lambdaa [i]*respon[i,4]^alpha)/ (1+(lambdaa [i]*respon[i,4]^alpha))));
}
target += respon[i,5] * (log (f_i)-log(s_i)) + log(s_i)- log_alpha;
}
}
```