Model with reduce_sum takes too long

The loop over W should go into the subject specific log lik, yes.

You must be able to do that, because right now you also have a loop over W where you fire off a reduce_sum for each evaluation. So you do not carry forward things.

I see that you do carry forward over the Ws parameters in the transformed parameters block, but that‘s fine to do before calling the big reduce_sum which handles the subjects (and within that you loop over W).

Ok! I think it is clearer now :) thanks so so much! just to make sure two things:

  • I should pass all the shared parameters without slicing, right? In the regular way one would pass them using reduce_sum.
  • Is it correct to use reduce_sum_static with grainsize=1 if I want to parallelize over N? Such that each participant will be given their own partial_sum and run on a seaprate core.

Thanks again!

Start without reduce sum first and get your program in the right structure first as outlined above.

Things which are shared should stay like that, sure.

The static thing is only needed for deterministic results. Otherwise don’t bother with it.

Once you have it in the form discussed, then maybe share the program once more.

1 Like

This would be super helpful! Many thanks.

@wds15 is this what you had in mind? I wrote in with one layer of reduce_sum. I tested it and it seemed to work better for a subset of subjects, but maybe there is yet a better way to do it? here is the short version of the code:


functions {

    real partial_sum(real[,,] all_parameters_sliced, int start, int end, matrix Al, matrix Bl, real[,,] U, real Q, matrix C, 
                            vector mu_prior_xl, real sigma_vl, real sigma_rl, int W, int Xdim, int grain,

                            int[,] idx_rdm_obsl, real[,,] RTu_rdml, real[,,] RTl_rdml, real[,,] Cohu_rdml, real[,,] Cohl_rdml,
                            int[,] Nu_rdml, int[,] Nl_rdml,

                            int[,] idx_itc_obsl, int[,,] choice_itcl, real[,,] amount_laterl, real[,,] delay_laterl, real[,,] amount_soonerl,
                            int[,] Tr_itcl,

                            int[,,] choice_ncl, int[,,] deltaMl, real[,,] TotalSl, int[,] idx_nc_obsl, int[,] Tr_ncl) {

    real lt = 0;

    for (n in 1:grain) {

        // unpack parameters
        real alpha_rdm_prl[W] = all_parameters_sliced[n,,1];
        real alpha_rdml[W] = all_parameters_sliced[n,,2];
        real beta_rdm_prl[W] = all_parameters_sliced[n,,3];
        real beta_rdml[W] = all_parameters_sliced[n,,4];
        real delta_rdm_prl[W] = all_parameters_sliced[n,,5];
        real delta_rdml[W] = all_parameters_sliced[n,,6];
        real tau_rdm_prl[W] = all_parameters_sliced[n,,7];
        real tau_rdml[W] = all_parameters_sliced[n,,8];
        real itc_k_prl[W] = all_parameters_sliced[n,,9];
        real itc_kl[W] = all_parameters_sliced[n,,10];
        real itc_beta_prl[W] = all_parameters_sliced[n,,11];
        real itc_betal[W] = all_parameters_sliced[n,,12];
        real weber_nc_prl[W] = all_parameters_sliced[n,,13];
        real weber_ncl[W] = all_parameters_sliced[n,,14];
        real Xl[W, Xdim] = all_parameters_sliced[n,,15:(15+Xdim-1)];

        int s = start + (n - 1); 

	    for (w in 1:W) {

	            if (w == 1) {                                              							// first week                                      
	                lt += normal_lpdf(to_vector(Xl[w,])|mu_prior_xl,sigma_vl);                           
	            }              
	            else { 
	                lt += normal_lpdf(to_vector(Xl[w,])|(Al * to_vector(Xl[w,]) + Bl * to_vector(U[s, w-1,])), Q);                                                          
	            }  

	        lt += normal_lpdf(weber_nc_prl[w]|C[1,] * to_vector(Xl[w,]),sigma_rl);                  // nc
	        lt += normal_lpdf(itc_k_prl[w]|C[2,] * to_vector(Xl[w,]),sigma_rl);                     // itc
	        lt += normal_lpdf(itc_beta_prl[w]|C[3,] * to_vector(Xl[w,]),sigma_rl);                  // itc
	        lt += normal_lpdf(alpha_rdm_prl[w]|C[4,] * to_vector(Xl[w,]),sigma_rl);                 // rdm    
	        lt += normal_lpdf(beta_rdm_prl[w]|C[5,] * to_vector(Xl[w,]),sigma_rl);                  // rdm
	        lt += normal_lpdf(delta_rdm_prl[w]|C[6,] * to_vector(Xl[w,]),sigma_rl);                 // rdm   
	        lt += normal_lpdf(tau_rdm_prl[w]|C[7,] * to_vector(Xl[w,]),sigma_rl);                   // rdm  

	        if (idx_nc_obsl[s,w] != 0) {                                     // if week exists

	            for (t in 1:Tr_ncl[s,w]) {
	                lt += bernoulli_lpmf(choice_ncl[s,w,t] | normal_cdf(0, deltaMl[s,w, t], weber_ncl[w] * TotalSl[s,w, t]));    
	            }   
	        }

	        if (idx_itc_obsl[s,w] != 0) {                                    // if week exists

	            vector[Tr_itcl[s,w]] ev_laterl   = to_vector(amount_laterl[s,w,:Tr_itcl[s,w]])  ./ (1 + itc_kl[w] * to_vector(delay_laterl[s,w,:Tr_itcl[s,w]])/7);
	            vector[Tr_itcl[s,w]] ev_soonerl  = to_vector(amount_soonerl[s,w,:Tr_itcl[s,w]]);
	            lt += bernoulli_logit_lpmf(choice_itcl[s,w,:Tr_itcl[s,w]]|itc_betal[w] * (ev_laterl - ev_soonerl)); 

	        } 

	        if (idx_rdm_obsl[s,w] != 0) {                                    // if week exists

	            vector[Nu_rdml[s,w]] delta_cohul;
	            vector[Nl_rdml[s,w]] delta_cohll;

	            delta_cohul = delta_rdml[w]*to_vector(Cohu_rdml[s,w,:Nu_rdml[s,w]]);            
	            lt += wiener_lpdf(RTu_rdml[s,w, :Nu_rdml[s,w]] | alpha_rdml[w], tau_rdml[w], beta_rdml[w], delta_cohul);

	            delta_cohll = delta_rdml[w]*to_vector(Cohl_rdml[s,w,:Nl_rdml[s,w]]);            
	            lt += wiener_lpdf(RTl_rdml[s,w,:Nl_rdml[s,w]] | alpha_rdml[w], tau_rdml[w], 1-beta_rdml[w], -delta_cohll);              
	        }

	    }
    }

    return lt;

    }

}

.
.
.
.

model {
    sigma_x ~ cauchy(0, cauchy_alpha); // prior on R diagonal     
    sigma_v ~ cauchy(0, cauchy_alpha); // prior on R diagonal     
    sigma_r ~ cauchy(0, cauchy_alpha); // prior on R diagonal     
    sigma_a ~ cauchy(0, cauchy_alpha); // prior on A variance  
    sigma_b ~ cauchy(0, cauchy_alpha); // prior on B variance           
    sigma_c ~ cauchy(0, cauchy_alpha); // prior on C variance    

    mu_prior_x ~ normal(0,sigma_x); // prior on X mean   

    // put priors an A, B, C
    to_vector(A) ~ normal(0,sigma_a);
    to_vector(B) ~ normal(0,sigma_b);
    to_vector(C) ~ normal(0,sigma_c);          

     {
     // pack parameters for reduce sum
     real all_parameters[N, W, 15+Xdim]; 
     for (n in 1:N) {
        for (w in 1:W) {
            all_parameters[n, w, 1] = alpha_rdm_pr[n,w];
            all_parameters[n, w, 2] = alpha_rdm[n,w];
            all_parameters[n, w, 3] = beta_rdm_pr[n,w];
            all_parameters[n, w, 4] = beta_rdm[n,w];
            all_parameters[n, w, 5] = delta_rdm_pr[n,w];
            all_parameters[n, w, 6] = delta_rdm[n,w];
            all_parameters[n, w, 7] = tau_rdm_pr[n,w];
            all_parameters[n, w, 8] = tau_rdm[n,w];
            all_parameters[n, w, 9] = itc_k_pr[n,w];
            all_parameters[n, w, 10] = itc_k[n,w];
            all_parameters[n, w, 11] = itc_beta_pr[n,w];
            all_parameters[n, w, 12] = itc_beta[n,w];
            all_parameters[n, w, 13] = weber_nc_pr[n,w];
            all_parameters[n, w, 14] = weber_nc[n,w];
            all_parameters[n, w, 15:(15+Xdim-1)] = X[n,w,];
        }
    }                                      
        target += reduce_sum(partial_sum, all_parameters, grainsize, A, B, U, Q, C, mu_prior_x, sigma_v, sigma_r, W, Xdim, grain,

                            idx_rdm_obs, RTu_rdm, RTl_rdm, Cohu_rdm, Cohl_rdm, Nu_rdm, Nl_rdm,
                            idx_itc_obs, choice_itc, amount_later, delay_later, amount_sooner, Tr_itc,
                            choice_nc, deltaM, TotalS, idx_nc_obs, Tr_nc);
}

Is there a way to make it even more efficient?

I don’t see anything odd here.

Getting more out of it will require quite some detailed dive into this.

It’s good to see that you now only have a single reduce_sum as it should be.

1 Like

Thank you!

Hi @nerpa, just curious do you see any performance improvement after packing all parameters and slice over them for each subject (for each n in 1:N) ?

Thanks @nerpa!

George

Yes, it helped a lot to iterate over subjects within reduce_sum. As @wds15 suggested, there should be only one instance of reduce_sum.