Within-Chain Parallelization Super Slow

stanuser · February 14, 2019, 7:14pm

Operating System: MacOS
Interface Version: 2.18.1

I’m trying to enjoy the benefits of the within-chain parallelization and have tried the example map_rect function in the user guide:

functions {
    vector lr(vector beta, vector theta, real[] x, int[] y) {
        real lp = bernoulli_logit_lpmf(y | beta[1] + to_vector(x) * beta[2]);
        return [lp]';
    }
}
data {
    int y[12];
    real x[12];
}
transformed data {
    // K = 3 shards
    int ys[3, 4] = { y[1:4], y[5:8], y[9:12] };
    real xs[3, 4] = { x[1:4], x[5:8], x[9:12] };
    vector[0] theta[3];
}
parameters {
    vector[2] beta;
}
model {
    beta ~ std_normal();
    target += sum(map_rect(lr, beta, theta, xs, ys));
}

With number of samples = 10,0000 and number of warmup = 2000, these are the reported sampling times:

STAN_NUM_THREADS = 1
1.21 seconds

STAN_NUM_THREADS = 4
8.51 seconds

With 1 thread, the Mac’s activity monitor will show 100% CPU utilization. With 4 threads, it shows ~160% utilization.

I’ve tried this in both CmdStan and RStan with similar results. Clearly I’m doing something wrong, but I’m not sure what.

jjramsey · February 14, 2019, 7:26pm

I don’t think the problem is at your end: Map_rect tutorial (in my system) uses 4x cores with loss in performances

stanuser · February 14, 2019, 8:51pm

Thanks, @jjramsey. Same issue with 2.18.0.

Edit: Reason for slowdown given in the link provided.
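
For anyone who wants to rerun the timing on a larger problem, here is a minimal sketch of the same model with the shard layout taken from the data instead of hard-coded. It assumes (this is my assumption, not a summary of the linked thread) that 12 observations give each shard too little work to outweigh the threading overhead, so timings may look different with more observations per shard. The names K and J are hypothetical and not part of the user guide example; the data are assumed to split evenly into K shards of J observations each.

```stan
functions {
    // Same shard log-likelihood as in the original post.
    vector lr(vector beta, vector theta, real[] x, int[] y) {
        real lp = bernoulli_logit_lpmf(y | beta[1] + to_vector(x) * beta[2]);
        return [lp]';
    }
}
data {
    int<lower=1> K;                     // number of shards (hypothetical name)
    int<lower=1> J;                     // observations per shard (hypothetical name)
    int<lower=0, upper=1> y[K * J];
    real x[K * J];
}
transformed data {
    int ys[K, J];
    real xs[K, J];
    vector[0] theta[K];                 // no shard-specific parameters
    for (k in 1:K) {
        // Shard k holds observations (k - 1) * J + 1 through k * J.
        ys[k] = y[((k - 1) * J + 1):(k * J)];
        xs[k] = x[((k - 1) * J + 1):(k * J)];
    }
}
parameters {
    vector[2] beta;
}
model {
    beta ~ std_normal();
    target += sum(map_rect(lr, beta, theta, xs, ys));
}
```

With K = 3 and J = 4 this reduces to the model above, so the 1-thread and 4-thread timings can be compared on the same code while J is increased.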
