Reduce_sum function in stan

Dear all,

I’m new to Stan. The following paragraph is my Stan code for the hierarchical survival model. But I had trouble running my model (about ~ 8 days for iter=10000, warm up=5000, N=35000). Now, I want to run my model by with-in chain parallelization method with reduce_sum (25.1 Reduce-sum | Stan User’s Guide).

How can I run my model using this method? Can anyone help me to change my codes in RSTAN?

Thank you in advance!

My Stan code is as below (N=35000, province=31, num=12):


'''
functions{
   real icar_normal_lpdf(vector phi, int province, int[] node1, int[] node2) {
    return -0.5 * dot_self(phi[node1] - phi[node2]);
  }
  // defines the log survival
  vector log_S (vector t, real shape,vector scale){
    vector[num_elements(t)] log_S;
    for (i in 1:num_elements(t)){
      log_S[i] =  log(1- ((scale [i]*(t[i])^shape)/ (1+(scale [i]*(t[i])^shape)))); // county is a vector that shows the location of patients with dim = num_elements(t) 
    }
    return log_S;
  }
  //defines the log pdf
  vector log_f (vector t,real shape,vector scale){
    vector[num_elements(t)] log_f;
    for (i in 1:num_elements(t)){
      log_f[i] =log((scale[i]*shape*t[i]^(shape-1))/((1+(scale[i]*t[i]^shape))^2));
    }
    return log_f;
  }
  //defines the log-likelihood for right censored data
  real surv_loglogistic_lpdf(vector t,vector d, real shape,vector scale){
    vector[num_elements(t)] log_lik;
    real prob;
    log_lik = d .* log_f(t, shape, scale)+ (1 - d) .* log_S(t, shape, scale);
    prob = sum(log_lik);
    return prob;
  }
   real partial_sum(array[] int y_slice,
                   int start, int end,vector d,
                   real shape, vector scale) {
    return surv_loglogistic_lpdf(y_slice | d, shape, scale);
  }
}
//data block
data{
  int <lower=0> province;
  int <lower=0> N;
  int <lower=0> num;
  int <lower=0> number;
  int <lower =0 , upper = province> county[N];
  matrix [N,number] respon;
   int<lower=0> N_edges;
  int<lower=1, upper=province> node1[N_edges];  // node1[i] adjacent to node2[i]
  int<lower=1, upper=province> node2[N_edges];  // and node1[i] < node2[i]
}
//parameters block
parameters{
  vector [num] beta;
  real <lower=0> sigma;//scale parameter sigma=1/shape
  vector <lower=0> [province] sigma1;
  vector[province] w1;  // random effect (CAR)
  vector[province] theta;       // heterogeneous effects  
}
// transformed parameters block
transformed parameters{
  vector [num] time_ratio;
  //vector[N] linpred;
  vector[N] lambdaa;
  real <lower=0> alpha = 1 / sigma;
time_ratio  = exp(beta);
  lambdaa = exp ((-respon[,7:18] * beta - w1[county] - theta[county]) / sigma);
}
// model block
model{
  target += normal_lpdf(beta | 0, 1000);
  //target += student_t_lpdf(sigma| 3,0,1); 
  target += gamma_lpdf(alpha | 0.001, 0.001);
   
   w1 ~ icar_normal_lpdf(province, node1, node2);
 
  sum(w1) ~ normal(0, 0.001 * province);
   

   
    target += normal_lpdf(theta| 0, sigma1);
        target += student_t_lpdf(sigma1| 3,0,1); 


  respon[,4] ~ surv_loglogistic//model for the data
}
// generated quantities block
generated quantities{
  vector[N] log_lik;//log-likelihood
  
  { for(n in 1:N){
    log_lik[n] = respon[n,6] * (log((exp ((-respon[n,7:18] * beta - w1[county[n]] - theta[county[n]])*alpha)*alpha*respon[n,4]^(alpha-1)) / ((1+(exp ((-respon[n,7:18] * beta - w1[county[n]] - theta[county[n]])*alpha)*(respon[n,4])^(alpha)))^2))) + (1-respon[n,6]) * ( log(1- ((exp ((-respon[n,7:18] * beta - w1[county[n]] - theta[county[n]])*alpha)*(respon[n,4])^alpha) / (1+(exp ((-respon[n,7:18] * beta - w1[county[n]] - theta[county[n]])*alpha)*(respon[n,4])^alpha)))));}
  }
}
'''

Hello,

I cleaned up your post so the model code was easier to read. A few things that will help folks answer your question:
What version of Stan are you running?
Is this in R or python?
Can you provide some simulated data (small data set) or a snippet of real data to play with?
Did you run simulated data with known parameters and did the model recover those parameters?

Thanks!

Sorry for the long delay, and thanks for your reply. Actually, I’m new to rstan, and I installed the latest version of R. Now, my model recovers the parameters well. But, I had a problem in within-chain parallelization of my codes in RSTAN because the run of it for my real data longs about 7 days(N=35000).
Can you help me to solve this issue?

Thank you in advance!

Rstan does not support reduce sum s you need at least Stan version 2.24 (I think)…so use cmdstanr and install cmdstan 2.28.2.

RStan 2.26 is on CRAN now and it includes reduce_sum!

Yes it does. v2.26.23 has a startup message that gives you tips on how to set options when using reduce_sum"

> library(rstan)
Loading required package: StanHeaders

rstan version 2.26.23 (Stan version 2.26.1)

For execution on a local, multicore CPU with excess RAM we recommend calling
options(mc.cores = parallel::detectCores()).
To avoid recompilation of unchanged Stan programs, we recommend calling
rstan_options(auto_write = TRUE)
For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
change `threads_per_chain` option:
rstan_options(threads_per_chain = 1)