How to use reduce_sum in for loops?

Hi there. I’m a beginner with Stan, and I’ve run into a problem that I can’t figure out.

I have a model that looks like this:

data{
  int<lower=0> N;//number of participants
  int<lower=0> T;//number of trials
  array[N,T] int<lower=0,upper=1> Z; //participants' choices
  array[N,T] real<lower=0> S; //conditions
  array[N,T] real<lower=0> O; //conditions
  array[N,T] real<lower=0> P; //conditions
}
model {
// priors and other unimportant parts omitted

for (i in 1:N){
    real X;
    real Y;
    
    for (t in 1:T){
      X = alpha[i] * S[i,t] + beta[i] * O[i,t] - gamma[i] * P[i,t];
      Y = 0;
      
      Z[i,t] ~ bernoulli_logit(IT[i] * (X-Y));
    }
  }
}

I want to rewrite this code using reduce_sum, but I have no idea how to handle the nested loops. Could anyone give me some advice? Thanks a lot~

Update: Here is the code I rewrote for multithreading; unfortunately, the output is different from the single-threaded code.

Here is my code with reduce_sum:

functions {
  real partial_sum_lpmf(array[,] int Z_slice, int start, int end,
                   vector alpha, vector beta, vector gamma, vector IT,
                   array[,] real S, array[,] real O, array[,] real P
                   ) {
    real partial_log_lik = 0;

    int N;
    for (n in 1:N) {
      real X;
      real Y;
      
      for (t in 1:end-start+1) {
        X = alpha[n] * S[n, start+t-1] + beta[n] * O[n, start+t-1] - gamma[n] * P[n, start+t-1];
        Y = 0;
        partial_log_lik += bernoulli_logit_lupmf(Z_slice[n,t] | IT[n] * (X - Y));
      }
    }
    return partial_log_lik;
    
  }
}

model {
// priors and other unimportant parts omitted

target += reduce_sum(partial_sum_lpmf, Z, 1, alpha, beta, gamma, IT, S, O, P);
}

I used summarise_draws() from the posterior package; here are the results:

variable        mean    median      sd    mad      q5    q95  rhat ess_bulk
   <chr>          <num>     <num>   <num>  <num>   <num>  <num> <num>    <num>
 1 lp__       -154.     -107.     131.    12.9   -524.   -89.0   1.39     17.9
 2 mu_p[1]      -0.0226    0.0627   0.786  0.622   -1.31   1.27  1.42     43.2
 3 mu_p[2]      -0.0594   -0.113    0.700  0.216   -1.25   1.28  1.51    224. 
 4 mu_p[3]       0.160    -0.0182   0.918  0.599   -1.24   1.83  1.60     23.2
 5 mu_p[4]      -0.208    -0.135    1.27   1.16    -2.42   1.87  1.39     18.0
 6 sigma[1]      1.59      0.0781  20.6    2.83   -11.9   13.3   1.30     58.8
 7 sigma[2]      1.40     -0.862   22.9    0.687  -11.7   15.8   1.64    140. 
 8 sigma[3]     -0.454    -1.68    37.0    3.15   -16.2   15.5   1.55     64.6
 9 sigma[4]     -2.67     -0.222   25.8    3.42   -18.1   14.0   1.39     73.6
10 alpha_p[1]   -0.0108   -0.0174   0.996  1.20    -1.69   1.39  1.32     19.7

The rhat values are larger than 1.01, which means the convergence check doesn’t pass.

P.S. The sampling code I used in cmdstanr:

fit_mcmc <- fit_output_reduce_sum$sample(
    data = dataList,
    seed = 123,
    chains = 4,
    parallel_chains = 4,
    init = 1,
    iter_sampling = 500,
    threads_per_chain = 4,
    iter_warmup = 500
)

P.P.S. The results from Stan, with and without reduce_sum:

# These are the results using reduce_sum
   variable    mean  median    sd   mad      q5    q95 rhat ess_bulk ess_tail
 lp__       -107.20 -106.79 10.56 10.26 -125.17 -90.32 1.01      620      708
 mu_p[1]      -0.02    0.01  1.02  1.02   -1.70   1.63 1.00     3252     1423
 mu_p[2]       0.01    0.02  0.98  0.97   -1.60   1.66 1.00     3641     1523
 mu_p[3]      -0.01   -0.02  0.96  0.94   -1.60   1.61 1.00     3595     1667
 mu_p[4]       0.01    0.03  1.00  0.94   -1.64   1.71 1.00     3943     1383
 sigma[1]      3.18    0.08 28.96  6.32  -20.45  27.50 1.03      271       67
 sigma[2]      3.68   -0.06 32.18  6.23  -22.85  41.77 1.01      886      291
 sigma[3]      0.65   -0.31 52.27  7.58  -27.59  29.86 1.01      518      293
 sigma[4]     -4.22   -0.25 36.33  7.83  -33.41  23.59 1.02      440      128
 alpha_p[1]    0.00   -0.01  0.96  0.95   -1.54   1.57 1.00     3009     1214
# These are the results without reduce_sum
  variable    mean  median     sd   mad      q5    q95 rhat ess_bulk ess_tail
 lp__       -200.75 -106.65 172.62 16.79 -553.88 -88.16 1.70        6       10
 mu_p[1]      -0.03    0.08   0.45  0.36   -0.85   0.50 3.28        4       14
 mu_p[2]      -0.13   -0.12   0.09  0.08   -0.32   0.00 1.36        9       12
 mu_p[3]       0.33   -0.02   0.83  0.22   -0.50   1.88 2.17        5       12
 mu_p[4]      -0.43   -0.29   1.46  1.56   -2.48   2.17 1.90        5       15
 sigma[1]      0.00    0.07   1.48  1.89   -2.23   1.83 3.24        4       14
 sigma[2]     -0.88   -0.94   0.26  0.29   -1.26  -0.41 2.02        5       16
 sigma[3]     -1.55   -1.84   1.56  1.49   -3.50   0.97 2.65        4       11
 sigma[4]     -1.13   -0.21   2.41  1.25   -5.91   1.65 1.74        6       11
 alpha_p[1]   -0.02   -0.03   1.03  1.44   -1.79   1.24 1.98        5       41

If you are looking for a speed-up, I would transform your S, O, and P data matrices into one large matrix of size N * T by 3 (flattening each of them into a vector and combining the three into a matrix), and change the alpha, beta, and gamma parameters (which would be defined in your parameters block) into a matrix of N participants by 3.

Then you can use the more efficient bernoulli_logit_glm; see 15.3 Bernoulli-logit generalized linear model (Logistic Regression) in the Stan Functions Reference. (The alpha argument of the function is just a vector of zeros, since you didn’t define your alpha as an intercept.)
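Since your alpha, beta, and gamma differ across participants, one shared coefficient vector won’t cover the whole flattened matrix, so a rough sketch of one way to use the GLM function here is to call it once per participant. The X data structure below is just my illustrative choice (a T-by-3 matrix per participant built from your S, O, and P), and the priors are omitted:

data {
  int<lower=0> N;
  int<lower=0> T;
  array[N, T] int<lower=0, upper=1> Z;
  array[N] matrix[T, 3] X;  // X[i] has columns S[i,], O[i,], P[i,]
}
parameters {
  vector[N] alpha;
  vector[N] beta;
  vector[N] gamma;
  vector[N] IT;
}
model {
  // priors omitted
  for (i in 1:N) {
    // per-participant coefficients, scaled by IT[i]
    vector[3] b = IT[i] * [alpha[i], beta[i], -gamma[i]]';
    Z[i] ~ bernoulli_logit_glm(X[i], 0, b);  // 0 = no intercept term
  }
}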

If that still isn’t fast enough, then you can slice over the parameters by participants using reduce_sum.

Thanks for your advice. I want to use reduce_sum to handle it, splitting over participants so that I parallelize over N. Could you give me some advice? Thanks~

Sure. OK, so based on your initial code structure (I haven’t had time to read through your other updates), you basically move everything inside the outer for loop (i in 1:N) into a partial sum function, and slice over the index i in 1:N.

Sort of like an apply in R.

You’ll need to follow the signature requirements for the partial sum function described in the docs.
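Roughly, using the names from your first post, the partial sum function and the call would look something like the sketch below (since Y is always 0 I’ve dropped it, and the grainsize of 1 is just a placeholder):

functions {
  // Z_slice holds rows start..end of Z, i.e. one row per participant in this chunk
  real partial_sum(array[,] int Z_slice, int start, int end,
                   array[,] real S, array[,] real O, array[,] real P,
                   vector alpha, vector beta, vector gamma, vector IT, int T) {
    real lp = 0;
    for (n in 1:(end - start + 1)) {
      int i = start + n - 1;  // index into the full data and parameter arrays
      for (t in 1:T) {
        real X = alpha[i] * S[i, t] + beta[i] * O[i, t] - gamma[i] * P[i, t];
        lp += bernoulli_logit_lpmf(Z_slice[n, t] | IT[i] * X);
      }
    }
    return lp;
  }
}
model {
  // priors omitted
  target += reduce_sum(partial_sum, Z, 1, S, O, P, alpha, beta, gamma, IT, T);
}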

Thanks for your time and your reply! I tried your suggestion and reduce_sum worked. However, the speed-up was only slight (perhaps 10%, on an AMD 7950 with 16 cores and 32 threads).

Below is my full code from before the modification. I’d appreciate any suggestions to speed this code up!

It’s code for an ultimatum game (UG) task, modified from the hBayesDM code: hBayesDM/commons/stan_files/ug_delta.stan at develop · CCS-Lab/hBayesDM · GitHub

data {
  int<lower=0> N; //number of participants
  int<lower=0> T; // number of trials for each participant
  array[N,T] int<lower=0,upper=1> Choice;
  array[N,T] real<lower=0> benefitS;
  array[N,T] real<lower=0> benefitO;
  array[N,T] real<lower=0> lost;
}


parameters {
  vector[4] mu_p;
  vector[4] sigma;
  
  vector[N] alpha;
  vector[N] beta;
  vector[N] gamma;  
  vector[N] IT;
}


model {
  mu_p ~ normal(0,1);
  sigma ~ cauchy(0,5);
  
  alpha_p ~ normal(0,1);
  beta_p ~ normal(0,1);
  gamma_p ~ normal(0,1);
  IT_p ~ normal(0,1);
  
  for (i in 1:N){
    real accept;
    real reject;
    
    for (t in 1:T){
      accept = alpha[i]*benefitS[i,t] + beta[i]*benefitO[i,t] - gamma[i]*lost[i,t];
      reject = 0;
      
      Choice[i,t] ~ bernoulli_logit(IT[i]*(accept-reject));
    }
  }
}


generated quantities{
  real<lower=-10,upper=10> mu_alpha; 
  real<lower=-10,upper=10> mu_beta; 
  real<lower=-10,upper=10> mu_gamma; 
  real<lower=0,upper=10> mu_IT;
  
  array[N,T] real accept;
  
  array[N] real log_lik;
  
  real<lower=0, upper=1> prob;
  array[N,T] int BG_pred;
  
  mu_alpha = Phi_approx(mu_p[1])*20-10;
  mu_beta = Phi_approx(mu_p[2])*20-10;
  mu_gamma = Phi_approx(mu_p[3])*20-10;
  mu_IT = Phi_approx(mu_p[4])*10;  
  
  
  accept = rep_array(-9999.0, N, T);
  BG_pred = rep_array(-999, N, T);
  
  
  {
    for (i in 1:N){
      log_lik[i] = 0;
      for (t in 1:T){
        accept[i,t] = alpha[i]*benefitS[i,t] + beta[i]*benefitO[i,t] - gamma[i]*lost[i,t];
      
        log_lik[i] = log_lik[i] + bernoulli_logit_lpmf(Choice[i,t] | IT[i]*(accept[i,t]-0));
        
        prob = inv_logit(IT[i]*(accept[i,t]-0));
        
        BG_pred[i,t] = bernoulli_rng(prob);
      }
    }
  }
  
}

I’m a little confused about which code you are referring to at this point, as your latest post does not seem to include the modifications you mentioned.

For speed-up, my earlier suggestion should give you more; then, if you want to go further, reduce_sum may make more sense.

Also, the speed-up from reduce_sum depends on several factors, including the way you code it, the way you parameterize the slicing over cores (e.g. the grainsize), the number of cores, etc. While your machine has 16 cores, it will make a large difference whether those are physical or virtual cores.
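For example, you could pass the grainsize in as data so you can experiment with it without recompiling. This is only a sketch, assuming a partial sum function over participants like the one sketched earlier, but using the variable names from your full model:

data {
  // ... your existing data ...
  int<lower=1> grainsize;  // with reduce_sum, 1 leaves chunking to the scheduler; larger values force bigger chunks
}
model {
  // ... priors ...
  target += reduce_sum(partial_sum, Choice, grainsize,
                       benefitS, benefitO, lost,
                       alpha, beta, gamma, IT, T);
}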

Oh! Sorry, I forgot to post my modified code:

functions {
    real partial_sum(array[,] int Choice,
                     int start, int end,
                     array[,] real benefitS, array[,] real benefitO, array[,] real lost,
                     vector alpha, vector beta, vector gamma, vector IT, int T) {

            real partial_log_lik  = 0;
            
            for (n in 1:(end - start + 1)) {
                int i = start + n - 1;  // index into the full participant arrays
                real accept;
                real reject;
                for (t in 1:T) {
                    accept = alpha[i]*benefitS[i,t] + beta[i]*benefitO[i,t] - gamma[i]*lost[i,t];
                    reject = 0;
                    partial_log_lik += bernoulli_logit_lpmf(Choice[n,t] | IT[i] * (accept - reject));
                }
            }
            return partial_log_lik;           
    }
}

// [data and parameters blocks same as above, skipped]
model {
  mu_p ~ normal(0,1);
  sigma ~ cauchy(0,5);
  
  alpha_p ~ normal(0,1);
  beta_p ~ normal(0,1);
  gamma_p ~ normal(0,1);
  IT_p ~ normal(0,1);
  

  target += reduce_sum_static(partial_sum, Choice, 1,
                              benefitS, benefitO, lost,
                              alpha, beta, gamma, IT, T);
}

Apologies for my carelessness, and thank you for your patience!