Parallelization of large vectorized expressions

wds15 · July 29, 2018, 2:52pm

Yup. I think we should implement a global thread pool to gain some more control over this.

I think this is what I am suggesting here, no? I am suggesting that the same number of threads gives the same result. That already gives us a lot of freedom, and people would get exactly the same numbers when running with exactly the same number of threads. That would be fine for me.
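To make the "same number of threads = same result" idea concrete, here is a minimal Python sketch. It is not the stan-math code, and the names `chunk_bounds` and `parallel_sum` are made up for illustration. It assumes the chunk boundaries depend only on the input size and the thread count, and that the partial sums are always combined in chunk order, never in completion order.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, num_threads):
    """Split range(n) into num_threads contiguous slices.

    The split depends only on n and num_threads, so it is identical
    on every run with the same thread count.
    """
    base, extra = divmod(n, num_threads)
    bounds, start = [], 0
    for i in range(num_threads):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def parallel_sum(values, num_threads):
    """Sum values in chunks, combining the partial sums in a fixed order."""
    bounds = chunk_bounds(len(values), num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Executor.map yields results in submission order, not in the
        # order in which the worker threads happen to finish.
        partials = list(pool.map(lambda b: sum(values[b[0]:b[1]]), bounds))
    total = 0.0
    for partial in partials:  # fixed left-to-right reduction
        total += partial
    return total
```

A different thread count changes the chunking and can therefore change the last few bits of the floating-point result, which is exactly the freedom described above.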

I have completed a small POC for this:

10^7 terms
Poisson lpmf
lambda parameter is a var

What I am doing is computing the lpmf and its gradient, nothing more.
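For readers who want to see the shape of the computation, here is a rough Python sketch of a chunked Poisson lpmf with its gradient. It is not the code on the branch. It assumes a single scalar lambda shared by all terms, and the function names `poisson_chunk` and `poisson_lpmf_and_grad` are hypothetical. Each term contributes `n * log(lambda) - lambda - log(n!)` to the log probability and `n / lambda - 1` to the gradient with respect to lambda.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def poisson_chunk(counts, lam):
    """Return (log probability, d/d lambda) summed over one chunk of counts."""
    log_lam = math.log(lam)
    lp = 0.0
    grad = 0.0
    for n in counts:
        lp += n * log_lam - lam - math.lgamma(n + 1)  # lgamma(n + 1) = log(n!)
        grad += n / lam - 1.0
    return lp, grad


def poisson_lpmf_and_grad(counts, lam, num_threads):
    """Evaluate the chunks on a thread pool and add up the partial results."""
    size = max(1, -(-len(counts) // num_threads))  # ceiling division
    chunks = [counts[i:i + size] for i in range(0, len(counts), size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Results come back in chunk order, so the reduction order is fixed.
        parts = list(pool.map(lambda c: poisson_chunk(c, lam), chunks))
    lp = sum(p[0] for p in parts)
    grad = sum(p[1] for p in parts)
    return lp, grad
```

This only shows the structure. In standard CPython the global interpreter lock keeps pure-Python threads from running this loop in parallel, so the sketch will not reproduce the timings in the attachment.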

Note that the 8-core run uses hyperthreading (my MacBook has 4 physical cores). See the attached results.

Is that convincing enough to continue? Thoughts?

The code is on the stan-math branch parallel-lpdf in case you want to look (this is really only a POC, not more).

Hopefully I will find the time to apply this to a real problem and see how it performs there.

Best,
Sebastian

lpmf-multicore.pdf (5.6 KB)
