Reduce_sum() using only one thread

Peter_Adelman · June 5, 2020, 4:28pm

I have implemented a model using no within-change parallelism, map_rect(), and now reduce_sum() to profile the speedup vs. resource consumption tradeoff of each approach.

All three models run. When map_rect() runs with one chain, it uses 8 cores (800% in top). When reduce_sum() runs with one chain, it uses only 1 core (100% in top).

My make/local looks like this and I have done make clean-all and make build since making it.

CXXFLAGS += -DSTAN_THREADS
CXXFLAGS += -pthread
STAN_THREADS=true

The reduce_sum() demo (redCards) works and results with top reporting CPU% > 100.

Is this just my grainsize=1 determining that I don’t need to within-chain parallelize it with my data scale? I increased my data scale 10-fold and it still doesn’t broadcast it.

I have also noticed that reduce_sum() seems to only slice its first argument. I have 3 arguments that would benefit from slicing, one of which is a massive feature matrix. Currently it is being passed as one of the s arguments to reduce_sum() instead of as the sliced argument, because if I pass it as matrix[,] then matrix multiplication with it fails (“Ill-typed arguments for *. … matrix[,], vector”).

wds15 · June 10, 2020, 7:16am

if you get our examples to work and scale over CPUs, then things should work… maybe post your model.

you can only slice the first argument. In case you need more things to slice, then you have to pack that into the first argument accordingly. However, note that arguments which are data do not really need to be sliced as data is passed around as reference, so no copying.

Peter_Adelman · June 10, 2020, 7:33pm

Nevermind, 100% my fault. I was passing a different Int instead of grainsize.

Topic		Replies	Views
Understanding reduce_sum efficiency Modeling	10	845	March 22, 2021
Reduce_sum performance Modeling performance , paralellization	15	1342	September 28, 2020
Extending reduce_sum to use MPI Developers	5	486	June 1, 2021
Help with reduce_sum Modeling	32	1446	August 4, 2020
Reduce_sum performance Modeling	5	865	May 22, 2020

Reduce_sum() using only one thread

Related topics