RStan parallelising using reduce_sum()

Hi there, I am a total newbie to Stan and I was wondering whether anybody had ideas on how to parallelise this code by David Phillippo to run over multiple cores. The issue is that the models run extremely slowly, and we often have to batch process 50+ of them, which soon becomes unmanageable. I wish I could share an attempt, but I'm not really sure where to begin, other than a vague sense that I should be using reduce_sum(). There is a link to the code below, and I've pasted the section that I think can be parallelised underneath. Any help gratefully received, even if it's just a hint to get started.

Link to code: multinma/binomial_2par.stan at master · dmphillippo/multinma · GitHub

Code section that I think can be parallelised:

 // -- IPD likelihood --
  if (link == 1) { // logit link
    // Could replace with bernoulli_logit_glm in Stan > 2.20
    ipd_r ~ bernoulli_logit(eta_ipd);
  } else {
    ipd_r ~ bernoulli(theta_ipd);
  }

  // -- AgD likelihood (arm-based) --
  // We have to hand code the log likelihood contribution for the adjusted
  // binomial here, as N is not necessarily an integer (which Stan doesn't
  // like). The following is exactly equivalent to:
  // ag_r ~ binomial(nprime, pprime);
  for (i in 1:ni_agd_arm)
    target += lchoose(nprime[i], agd_arm_r[i]) +
              lmultiply(agd_arm_r[i], pprime[i]) +
              (nprime[i] - agd_arm_r[i]) * log1m(pprime[i]);
}
generated quantities {
  vector[ni_agd_arm * n_int_thin] theta2_bar_cum;
#include /include/generated_quantities_theta_fitted.stan
#include /include/generated_quantities_common.stan
#include /include/generated_quantities_theta.stan

  // IPD log likelihood and residual deviance
  for (i in 1:ni_ipd) {
    log_lik[i] = bernoulli_lpmf(ipd_r[i] | theta_ipd[i]);
    resdev[i] = -2 * log_lik[i];
    fitted_ipd[i] = theta_ipd[i];
  }

A couple of quick things:

  1. If batch processing 50+ of these is what's unmanageable, it sounds like you're already saturating the cores you have available. Within-chain parallelization won't add throughput in that case; the extra threads just compete with your other model runs, and the threading overhead will actually slow things down.
  2. You can check whether the loop in the model block or the loop in the generated quantities block is the bottleneck using Stan's built-in profiling; see the first sketch after this list.
  3. The for-loop over i in the model block looks like a good candidate for parallelization with reduce_sum(). The most efficient way to set it up depends on which of the arguments are parameters and which are data; see the second sketch after this list.
  4. If the generated quantities calculation is the bottleneck and you do have spare cores available, you might consider fitting the model without the generated quantities block, and then computing (and parallelizing) those quantities outside of Stan from the saved draws.
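
On point 2, here's a minimal sketch of profiling, assuming Stan >= 2.26: wrap each suspect section in a profile() block with an arbitrary label, and the per-label timings are reported after fitting (with cmdstanr you can read them via fit$profiles()). The same wrapping should work for the generated quantities loop too.

model {
  // ... priors and IPD likelihood unchanged ...

  // Time just the hand-coded AgD likelihood loop
  profile("agd_arm_likelihood") {
    for (i in 1:ni_agd_arm)
      target += lchoose(nprime[i], agd_arm_r[i]) +
                lmultiply(agd_arm_r[i], pprime[i]) +
                (nprime[i] - agd_arm_r[i]) * log1m(pprime[i]);
  }
}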
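
On point 3, here's a rough sketch of reduce_sum() over that loop, assuming agd_arm_r is integer data (so it can be the sliced argument) and nprime and pprime are parameter-dependent vectors passed as shared arguments. The function name and grainsize are my own choices, and the array[] syntax needs Stan >= 2.26 (reduce_sum itself needs >= 2.23); you'd also have to build with threading enabled (STAN_THREADS) and set threads_per_chain in your interface.

functions {
  // Adjusted-binomial log likelihood for one slice of the AgD arms.
  // reduce_sum() hands this function a slice of the sliced argument
  // plus its global start/end indices; shared arguments (nprime,
  // pprime) arrive whole, so index them with the global position i.
  real agd_arm_partial_sum(array[] int r_slice, int start, int end,
                           vector nprime, vector pprime) {
    real lp = 0;
    for (i in start:end) {
      int r = r_slice[i - start + 1]; // the slice is locally 1-indexed
      lp += lchoose(nprime[i], r)
            + lmultiply(r, pprime[i])
            + (nprime[i] - r) * log1m(pprime[i]);
    }
    return lp;
  }
}

model {
  // Replaces the hand-coded loop; grainsize = 1 lets the scheduler
  // choose how to partition 1:ni_agd_arm across threads.
  target += reduce_sum(agd_arm_partial_sum, agd_arm_r, 1,
                       nprime, pprime);
}

Whether this actually helps depends on how expensive each term is relative to the threading overhead; with cheap per-observation terms like these you may need a larger grainsize to see any speedup.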