Model with reduce_sum takes too long

wds15 · August 9, 2020, 3:07pm

The loop over W should go into the subject specific log lik, yes.

You must be able to do that, because right now you also have a loop over W where you fire off a reduce_sum for each evaluation. So you do not carry forward things.

I see that you do carry forward over the Ws parameters in the transformed parameters block, but that‘s fine to do before calling the big reduce_sum which handles the subjects (and within that you loop over W).

nerpa · August 9, 2020, 3:22pm

Ok! I think it is clearer now :) thanks so so much! just to make sure two things:

I should pass all the shared parameters without slicing, right? In the regular way one would pass them using reduce_sum.
Is it correct to use reduce_sum_static with grainsize=1 if I want to parallelize over N? Such that each participant will be given their own partial_sum and run on a seaprate core.

Thanks again!

wds15 · August 9, 2020, 3:35pm

Start without reduce sum first and get your program in the right structure first as outlined above.

Things which are shared should stay like that, sure.

The static thing is only needed for deterministic results. Otherwise don’t bother with it.

Once you have it in the form discussed, then maybe share the program once more.

nerpa · August 9, 2020, 3:39pm

This would be super helpful! Many thanks.

nerpa · August 14, 2020, 9:07am

@wds15 is this what you had in mind? I wrote in with one layer of reduce_sum. I tested it and it seemed to work better for a subset of subjects, but maybe there is yet a better way to do it? here is the short version of the code:


functions {

    real partial_sum(real[,,] all_parameters_sliced, int start, int end, matrix Al, matrix Bl, real[,,] U, real Q, matrix C, 
                            vector mu_prior_xl, real sigma_vl, real sigma_rl, int W, int Xdim, int grain,

                            int[,] idx_rdm_obsl, real[,,] RTu_rdml, real[,,] RTl_rdml, real[,,] Cohu_rdml, real[,,] Cohl_rdml,
                            int[,] Nu_rdml, int[,] Nl_rdml,

                            int[,] idx_itc_obsl, int[,,] choice_itcl, real[,,] amount_laterl, real[,,] delay_laterl, real[,,] amount_soonerl,
                            int[,] Tr_itcl,

                            int[,,] choice_ncl, int[,,] deltaMl, real[,,] TotalSl, int[,] idx_nc_obsl, int[,] Tr_ncl) {

    real lt = 0;

    for (n in 1:grain) {

        // unpack parameters
        real alpha_rdm_prl[W] = all_parameters_sliced[n,,1];
        real alpha_rdml[W] = all_parameters_sliced[n,,2];
        real beta_rdm_prl[W] = all_parameters_sliced[n,,3];
        real beta_rdml[W] = all_parameters_sliced[n,,4];
        real delta_rdm_prl[W] = all_parameters_sliced[n,,5];
        real delta_rdml[W] = all_parameters_sliced[n,,6];
        real tau_rdm_prl[W] = all_parameters_sliced[n,,7];
        real tau_rdml[W] = all_parameters_sliced[n,,8];
        real itc_k_prl[W] = all_parameters_sliced[n,,9];
        real itc_kl[W] = all_parameters_sliced[n,,10];
        real itc_beta_prl[W] = all_parameters_sliced[n,,11];
        real itc_betal[W] = all_parameters_sliced[n,,12];
        real weber_nc_prl[W] = all_parameters_sliced[n,,13];
        real weber_ncl[W] = all_parameters_sliced[n,,14];
        real Xl[W, Xdim] = all_parameters_sliced[n,,15:(15+Xdim-1)];

        int s = start + (n - 1); 

	    for (w in 1:W) {

	            if (w == 1) {                                              							// first week                                      
	                lt += normal_lpdf(to_vector(Xl[w,])|mu_prior_xl,sigma_vl);                           
	            }              
	            else { 
	                lt += normal_lpdf(to_vector(Xl[w,])|(Al * to_vector(Xl[w,]) + Bl * to_vector(U[s, w-1,])), Q);                                                          
	            }  

	        lt += normal_lpdf(weber_nc_prl[w]|C[1,] * to_vector(Xl[w,]),sigma_rl);                  // nc
	        lt += normal_lpdf(itc_k_prl[w]|C[2,] * to_vector(Xl[w,]),sigma_rl);                     // itc
	        lt += normal_lpdf(itc_beta_prl[w]|C[3,] * to_vector(Xl[w,]),sigma_rl);                  // itc
	        lt += normal_lpdf(alpha_rdm_prl[w]|C[4,] * to_vector(Xl[w,]),sigma_rl);                 // rdm    
	        lt += normal_lpdf(beta_rdm_prl[w]|C[5,] * to_vector(Xl[w,]),sigma_rl);                  // rdm
	        lt += normal_lpdf(delta_rdm_prl[w]|C[6,] * to_vector(Xl[w,]),sigma_rl);                 // rdm   
	        lt += normal_lpdf(tau_rdm_prl[w]|C[7,] * to_vector(Xl[w,]),sigma_rl);                   // rdm  

	        if (idx_nc_obsl[s,w] != 0) {                                     // if week exists

	            for (t in 1:Tr_ncl[s,w]) {
	                lt += bernoulli_lpmf(choice_ncl[s,w,t] | normal_cdf(0, deltaMl[s,w, t], weber_ncl[w] * TotalSl[s,w, t]));    
	            }   
	        }

	        if (idx_itc_obsl[s,w] != 0) {                                    // if week exists

	            vector[Tr_itcl[s,w]] ev_laterl   = to_vector(amount_laterl[s,w,:Tr_itcl[s,w]])  ./ (1 + itc_kl[w] * to_vector(delay_laterl[s,w,:Tr_itcl[s,w]])/7);
	            vector[Tr_itcl[s,w]] ev_soonerl  = to_vector(amount_soonerl[s,w,:Tr_itcl[s,w]]);
	            lt += bernoulli_logit_lpmf(choice_itcl[s,w,:Tr_itcl[s,w]]|itc_betal[w] * (ev_laterl - ev_soonerl)); 

	        } 

	        if (idx_rdm_obsl[s,w] != 0) {                                    // if week exists

	            vector[Nu_rdml[s,w]] delta_cohul;
	            vector[Nl_rdml[s,w]] delta_cohll;

	            delta_cohul = delta_rdml[w]*to_vector(Cohu_rdml[s,w,:Nu_rdml[s,w]]);            
	            lt += wiener_lpdf(RTu_rdml[s,w, :Nu_rdml[s,w]] | alpha_rdml[w], tau_rdml[w], beta_rdml[w], delta_cohul);

	            delta_cohll = delta_rdml[w]*to_vector(Cohl_rdml[s,w,:Nl_rdml[s,w]]);            
	            lt += wiener_lpdf(RTl_rdml[s,w,:Nl_rdml[s,w]] | alpha_rdml[w], tau_rdml[w], 1-beta_rdml[w], -delta_cohll);              
	        }

	    }
    }

    return lt;

    }

}

.
.
.
.

model {
    sigma_x ~ cauchy(0, cauchy_alpha); // prior on R diagonal     
    sigma_v ~ cauchy(0, cauchy_alpha); // prior on R diagonal     
    sigma_r ~ cauchy(0, cauchy_alpha); // prior on R diagonal     
    sigma_a ~ cauchy(0, cauchy_alpha); // prior on A variance  
    sigma_b ~ cauchy(0, cauchy_alpha); // prior on B variance           
    sigma_c ~ cauchy(0, cauchy_alpha); // prior on C variance    

    mu_prior_x ~ normal(0,sigma_x); // prior on X mean   

    // put priors an A, B, C
    to_vector(A) ~ normal(0,sigma_a);
    to_vector(B) ~ normal(0,sigma_b);
    to_vector(C) ~ normal(0,sigma_c);          

     {
     // pack parameters for reduce sum
     real all_parameters[N, W, 15+Xdim]; 
     for (n in 1:N) {
        for (w in 1:W) {
            all_parameters[n, w, 1] = alpha_rdm_pr[n,w];
            all_parameters[n, w, 2] = alpha_rdm[n,w];
            all_parameters[n, w, 3] = beta_rdm_pr[n,w];
            all_parameters[n, w, 4] = beta_rdm[n,w];
            all_parameters[n, w, 5] = delta_rdm_pr[n,w];
            all_parameters[n, w, 6] = delta_rdm[n,w];
            all_parameters[n, w, 7] = tau_rdm_pr[n,w];
            all_parameters[n, w, 8] = tau_rdm[n,w];
            all_parameters[n, w, 9] = itc_k_pr[n,w];
            all_parameters[n, w, 10] = itc_k[n,w];
            all_parameters[n, w, 11] = itc_beta_pr[n,w];
            all_parameters[n, w, 12] = itc_beta[n,w];
            all_parameters[n, w, 13] = weber_nc_pr[n,w];
            all_parameters[n, w, 14] = weber_nc[n,w];
            all_parameters[n, w, 15:(15+Xdim-1)] = X[n,w,];
        }
    }                                      
        target += reduce_sum(partial_sum, all_parameters, grainsize, A, B, U, Q, C, mu_prior_x, sigma_v, sigma_r, W, Xdim, grain,

                            idx_rdm_obs, RTu_rdm, RTl_rdm, Cohu_rdm, Cohl_rdm, Nu_rdm, Nl_rdm,
                            idx_itc_obs, choice_itc, amount_later, delay_later, amount_sooner, Tr_itc,
                            choice_nc, deltaM, TotalS, idx_nc_obs, Tr_nc);
}

Is there a way to make it even more efficient?

wds15 · August 14, 2020, 9:39am

I don’t see anything odd here.

Getting more out of it will require quite some detailed dive into this.

It’s good to see that you now only have a single reduce_sum as it should be.

nerpa · August 14, 2020, 10:04am

Thank you!

George_Seah · December 28, 2020, 12:42am

Hi @nerpa, just curious do you see any performance improvement after packing all parameters and slice over them for each subject (for each n in 1:N) ?

Thanks @nerpa!

George

nerpa · December 28, 2020, 12:53am

Yes, it helped a lot to iterate over subjects within reduce_sum. As @wds15 suggested, there should be only one instance of reduce_sum.

Topic		Replies	Views
Trying within-chain parallelization with reduce_sum increases runtime a lot Modeling paralellization	6	952	August 2, 2021
Help with reduce_sum Modeling	32	1402	August 4, 2020
Reduce_sum parallelisation issue Modeling cmdstanr , multivariate-normal	12	1032	February 24, 2022
Can't get reduce_sum to help with model runtime Modeling	7	345	December 19, 2023
Reduce_sum performance Modeling performance , paralellization	15	1342	September 28, 2020

Model with reduce_sum takes too long

Related topics