# How to use reduce_sum in for loops?

Hi there. I'm a beginner with Stan, and I've run into a problem that I can't figure out.

I have a model that looks like this:

```stan
data {
  int<lower=0> N;                       // number of participants
  int<lower=0> T;                       // number of trials
  array[N, T] int<lower=0, upper=1> Z;  // participants' choices
  array[N, T] real<lower=0> S;          // conditions
  array[N, T] real<lower=0> O;          // conditions
  array[N, T] real<lower=0> P;          // conditions
}
model {
  // omit something unimportant

  for (i in 1:N) {
    real X;
    real Y;

    for (t in 1:T) {
      X = alpha[i] * S[i, t] + beta[i] * O[i, t] - gamma[i] * P[i, t];
      Y = 0;

      Z[i, t] ~ bernoulli_logit(IT[i] * (X - Y));
    }
  }
}
```

I want to rewrite this code using `reduce_sum`, but I have no idea how to handle the nested loops. Could anyone give me some advice? Thanks a lot~

Update: here is the code I rewrote for multithreading; unfortunately, the output is different from the single-threaded code.

Here is my code with reduce_sum:

```stan
functions {
  real partial_sum_lpmf(array[,] int Z_slice, int start, int end,
                        vector alpha, vector beta, vector gamma, vector IT,
                        array[,] real S, array[,] real O, array[,] real P) {
    real partial_log_lik = 0;
    int T = dims(S)[2];

    // reduce_sum slices Z along its first dimension, so start and end
    // are participant indices; Z_slice is local, the other arrays are full-size
    for (n in 1:(end - start + 1)) {
      int i = start + n - 1;  // index into the full arrays
      real X;
      real Y;

      for (t in 1:T) {
        X = alpha[i] * S[i, t] + beta[i] * O[i, t] - gamma[i] * P[i, t];
        Y = 0;
        partial_log_lik += bernoulli_logit_lupmf(Z_slice[n, t] | IT[i] * (X - Y));
      }
    }
    return partial_log_lik;
  }
}

model {
  // omit something unimportant

  target += reduce_sum(partial_sum_lupmf, Z, 1, alpha, beta, gamma, IT, S, O, P);
}
```

I used `summarise_draws` from the `posterior` package; here are the results:

```
variable        mean    median      sd    mad      q5    q95  rhat ess_bulk
<chr>          <num>     <num>   <num>  <num>   <num>  <num> <num>    <num>
1 lp__       -154.     -107.     131.    12.9   -524.   -89.0   1.39     17.9
2 mu_p[1]      -0.0226    0.0627   0.786  0.622   -1.31   1.27  1.42     43.2
3 mu_p[2]      -0.0594   -0.113    0.700  0.216   -1.25   1.28  1.51    224.
4 mu_p[3]       0.160    -0.0182   0.918  0.599   -1.24   1.83  1.60     23.2
5 mu_p[4]      -0.208    -0.135    1.27   1.16    -2.42   1.87  1.39     18.0
6 sigma[1]      1.59      0.0781  20.6    2.83   -11.9   13.3   1.30     58.8
7 sigma[2]      1.40     -0.862   22.9    0.687  -11.7   15.8   1.64    140.
8 sigma[3]     -0.454    -1.68    37.0    3.15   -16.2   15.5   1.55     64.6
9 sigma[4]     -2.67     -0.222   25.8    3.42   -18.1   14.0   1.39     73.6
10 alpha_p[1]   -0.0108   -0.0174   0.996  1.20    -1.69   1.39  1.32     19.7
```

The `rhat` values are larger than 1.01, which means the convergence check doesn't pass.

P.S. The sampling code I used with `cmdstanr`:

```r
fit_mcmc <- fit_output_reduce_sum$sample(
  data = dataList,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  init = 1,
  iter_sampling = 500,
  iter_warmup = 500
)
```

P.P.S. The results from Stan with `reduce_sum`:

```
# These are the results using reduce_sum
variable    mean  median    sd   mad      q5    q95 rhat ess_bulk ess_tail
lp__       -107.20 -106.79 10.56 10.26 -125.17 -90.32 1.01      620      708
mu_p[1]      -0.02    0.01  1.02  1.02   -1.70   1.63 1.00     3252     1423
mu_p[2]       0.01    0.02  0.98  0.97   -1.60   1.66 1.00     3641     1523
mu_p[3]      -0.01   -0.02  0.96  0.94   -1.60   1.61 1.00     3595     1667
mu_p[4]       0.01    0.03  1.00  0.94   -1.64   1.71 1.00     3943     1383
sigma[1]      3.18    0.08 28.96  6.32  -20.45  27.50 1.03      271       67
sigma[2]      3.68   -0.06 32.18  6.23  -22.85  41.77 1.01      886      291
sigma[3]      0.65   -0.31 52.27  7.58  -27.59  29.86 1.01      518      293
sigma[4]     -4.22   -0.25 36.33  7.83  -33.41  23.59 1.02      440      128
alpha_p[1]    0.00   -0.01  0.96  0.95   -1.54   1.57 1.00     3009     1214
```
```
# These are the results without reduce_sum
variable    mean  median     sd   mad      q5    q95 rhat ess_bulk ess_tail
lp__       -200.75 -106.65 172.62 16.79 -553.88 -88.16 1.70        6       10
mu_p[1]      -0.03    0.08   0.45  0.36   -0.85   0.50 3.28        4       14
mu_p[2]      -0.13   -0.12   0.09  0.08   -0.32   0.00 1.36        9       12
mu_p[3]       0.33   -0.02   0.83  0.22   -0.50   1.88 2.17        5       12
mu_p[4]      -0.43   -0.29   1.46  1.56   -2.48   2.17 1.90        5       15
sigma[1]      0.00    0.07   1.48  1.89   -2.23   1.83 3.24        4       14
sigma[2]     -0.88   -0.94   0.26  0.29   -1.26  -0.41 2.02        5       16
sigma[3]     -1.55   -1.84   1.56  1.49   -3.50   0.97 2.65        4       11
sigma[4]     -1.13   -0.21   2.41  1.25   -5.91   1.65 1.74        6       11
alpha_p[1]   -0.02   -0.03   1.03  1.44   -1.79   1.24 1.98        5       41
```

If you are looking for a speed-up, I would transform your `S`, `O`, and `P` data matrices into one large matrix of size (N * T) by 3 (flattening each of them into a vector and combining the three into a matrix), and change the `alpha`, `beta`, and `gamma` parameters (defined in your parameters block) into a matrix of N participants by 3.

Then you can use the more efficient `bernoulli_logit_glm`; see 15.3 Bernoulli-Logit Generalized Linear Model (Logistic Regression) in the Stan Functions Reference. (The `alpha` argument of that function is just a vector of zeros, since you didn't define your `alpha` as an intercept.)
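For illustration (this is a sketch of the flattening idea, not code from the thread; `Z_flat`, `subj`, and `Theta` are hypothetical names), a fully vectorized version of the likelihood could look roughly like this. Note that `bernoulli_logit_glm` itself takes a single shared coefficient vector, so with participant-specific coefficients one alternative is a `rows_dot_product` formulation:

```stan
data {
  int<lower=0> N;                              // participants
  int<lower=0> T;                              // trials
  array[N * T] int<lower=0, upper=1> Z_flat;   // choices, flattened
  matrix[N * T, 3] X;                          // columns: S, O, -P
  array[N * T] int<lower=1, upper=N> subj;     // participant of each row
}
parameters {
  matrix[N, 3] Theta;  // rows: (alpha, beta, gamma) per participant
  vector[N] IT;
}
model {
  // one vectorized statement replaces the nested loops:
  // eta[r] = IT[subj[r]] * dot(X[r], Theta[subj[r]])
  vector[N * T] eta = IT[subj] .* rows_dot_product(X, Theta[subj]);
  Z_flat ~ bernoulli_logit(eta);
}
```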

If that still isn't fast enough, then you can slice over the parameters by participant using `reduce_sum`.

Thanks for your advice. I want to use `reduce_sum` to handle it, splitting by participants so that I parallelize over N. Could you give me some advice? Thanks~

Sure. Ok, so based on your initial code structure (I haven't had time to read through your other updates), you basically move everything inside the outer `for (i in 1:N)` loop into a partial-sum function, and slice the index `i` over `1:N`.

Sort of like an apply in R.

You'll need to follow the syntax changes required by the partial-sum function in the docs.
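As a self-contained toy version of that pattern (hypothetical names `y` and `theta`, not from this thread): everything from the body of the `i in 1:N` loop goes into the partial-sum function, which receives a slice of `y` plus the start/end participant indices.

```stan
functions {
  // reduce_sum hands this function rows start..end of y as y_slice;
  // start + n - 1 recovers the position in the full, unsliced arrays
  real partial_sum_lpmf(array[,] int y_slice, int start, int end,
                        vector theta, int T) {
    real lp = 0;
    for (n in 1:(end - start + 1)) {
      int i = start + n - 1;
      for (t in 1:T)
        lp += bernoulli_logit_lupmf(y_slice[n, t] | theta[i]);
    }
    return lp;
  }
}
data {
  int<lower=0> N;
  int<lower=0> T;
  array[N, T] int<lower=0, upper=1> y;
}
parameters {
  vector[N] theta;
}
model {
  theta ~ normal(0, 1);
  target += reduce_sum(partial_sum_lupmf, y, 1, theta, T);
}
```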

Thanks for your time and your reply! I tried your suggestion and `reduce_sum` worked. However, the speed-up was small (perhaps 10%, on an AMD 7950 with 16 cores / 32 threads).

I posted my full code from before the modification. I'd appreciate any suggestions to speed this code up!

It's code for a UG (Ultimatum Game) task, modified from the hBayesDM code: hBayesDM/commons/stan_files/ug_delta.stan at develop · CCS-Lab/hBayesDM · GitHub

```stan
data {
  int<lower=0> N;  // number of participants
  int<lower=0> T;  // number of trials for each participant
  array[N, T] int<lower=0, upper=1> Choice;
  array[N, T] real<lower=0> benefitS;
  array[N, T] real<lower=0> benefitO;
  array[N, T] real<lower=0> lost;
}

parameters {
  vector[4] mu_p;
  vector[4] sigma;

  vector[N] alpha;
  vector[N] beta;
  vector[N] gamma;
  vector[N] IT;
}

model {
  mu_p ~ normal(0, 1);
  sigma ~ cauchy(0, 5);

  // raw parameters from the (omitted) non-centered parameterization
  alpha_p ~ normal(0, 1);
  beta_p ~ normal(0, 1);
  gamma_p ~ normal(0, 1);
  IT_p ~ normal(0, 1);

  for (i in 1:N) {
    real accept;
    real reject;

    for (t in 1:T) {
      accept = alpha[i] * benefitS[i, t] + beta[i] * benefitO[i, t] - gamma[i] * lost[i, t];
      reject = 0;

      Choice[i, t] ~ bernoulli_logit(IT[i] * (accept - reject));
    }
  }
}

generated quantities {
  real<lower=-10, upper=10> mu_alpha;
  real<lower=-10, upper=10> mu_beta;
  real<lower=-10, upper=10> mu_gamma;
  real<lower=0, upper=10> mu_IT;

  array[N, T] real accept;
  array[N] real log_lik;

  real<lower=0, upper=1> prob;
  array[N, T] int BG_pred;

  mu_alpha = Phi_approx(mu_p[1]) * 20 - 10;
  mu_beta = Phi_approx(mu_p[2]) * 20 - 10;
  mu_gamma = Phi_approx(mu_p[3]) * 20 - 10;
  mu_IT = Phi_approx(mu_p[4]) * 10;

  accept = rep_array(-9999.0, N, T);
  BG_pred = rep_array(-999, N, T);

  {
    for (i in 1:N) {
      log_lik[i] = 0;
      for (t in 1:T) {
        accept[i, t] = alpha[i] * benefitS[i, t] + beta[i] * benefitO[i, t] - gamma[i] * lost[i, t];

        log_lik[i] += bernoulli_logit_lpmf(Choice[i, t] | IT[i] * (accept[i, t] - 0));

        prob = inv_logit(IT[i] * (accept[i, t] - 0));

        BG_pred[i, t] = bernoulli_rng(prob);
      }
    }
  }
}
```

I'm a little confused about which code you are referring to at this point, as your latest post does not seem to include what you said you modified.

For speed-up, my earlier suggestion should give you more; if you wanted to go further, `reduce_sum` may then make more sense.

Also, the speed-up from `reduce_sum` depends on several factors, including how you code it, how you parameterize the slicing over cores, the number of cores, etc. While your machine reports 16 cores, it makes a large difference whether those are physical or virtual cores.
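On that last point, one concrete knob is the grainsize (the third argument of `reduce_sum_static`). A grainsize of 1 creates one unit of work per participant, which can bury the gains in scheduling overhead; a coarser grainsize yields fewer, larger chunks. A hedged sketch (the divisor 32 is an arbitrary illustration, not a recommendation from this thread):

```stan
model {
  // illustrative: aim for roughly a few chunks per thread rather than N chunks
  int grainsize = max(1, N %/% 32);
  target += reduce_sum_static(partial_sum, Choice, grainsize,
                              benefitS, benefitO, lost,
                              alpha, beta, gamma, IT, T);
}
```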

Oh! Sorry, I forgot to post my modified code:

```stan
functions {
  real partial_sum(array[,] int Choice,
                   int start, int end,
                   array[,] real benefitS, array[,] real benefitO, array[,] real lost,
                   vector alpha, vector beta, vector gamma, vector IT, int T) {
    real partial_log_lik = 0;

    for (n in 1:(end - start + 1)) {
      int i = start + n - 1;  // index into the full arrays
      real accept;
      real reject;

      for (t in 1:T) {
        accept = alpha[i] * benefitS[i, t] + beta[i] * benefitO[i, t] - gamma[i] * lost[i, t];
        reject = 0;
        partial_log_lik += bernoulli_logit_lpmf(Choice[n, t] | IT[i] * (accept - reject));
      }
    }
    return partial_log_lik;
  }
}

// [skipped the parts shared with the code above]
model {
  mu_p ~ normal(0, 1);
  sigma ~ cauchy(0, 5);

  alpha_p ~ normal(0, 1);
  beta_p ~ normal(0, 1);
  gamma_p ~ normal(0, 1);
  IT_p ~ normal(0, 1);

  target += reduce_sum_static(partial_sum, Choice, 1,
                              benefitS, benefitO, lost,
                              alpha, beta, gamma, IT, T);
}
```

Apologies for my carelessness, and thanks for your patience!