Multiple calls to map_rect

I have a general question that might be a bit dumb. I am trying to improve the efficiency of a relatively complicated model that is best described as a MIMIC model jointly estimated with a multinomial logistic model.

I am trying to speed up estimation by using map_rect, but moving a lot of the model into the map_rect function requires a lot of manipulation of the datasets (packing them and then unpacking them). I am curious whether it is feasible/wise to call map_rect multiple times within a model, or whether that is a bad idea.

Thanks for any help. I can provide some models if necessary.

That’s fine.

I would like to expand on this question (@wds15).

What about the performance of doing

map_rect({
do A;
do B;
})

versus

map_rect({
do A;
})

map_rect({
do B;
})

Are the two equivalent, or is option (1) much better than option (2)? Or does it depend on some condition?

Thanks.

I am assuming that A and B each have many sub-tasks… then it's best to run a single map_rect, but with a random permutation of all jobs from A and B.

In all honesty, I am only giving you judgements on all of this based on my knowledge of having implemented it. So here is what map_rect does:

  1. split your N jobs into B blocks of equal size N/B whenever you request to use B CPUs
  2. run all B blocks in parallel as a chunk

That’s about the simplest queueing you can do. It works well if you can assume roughly equal work per block. A random order of the jobs is essentially assumed.
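To make the queueing above concrete, here is a minimal Python sketch. It is not Stan's actual implementation, just a toy model of "split N jobs into B equal blocks and run the blocks in parallel". The job costs are made up, the helper names `split_into_blocks` and `wall_time` are hypothetical, and handing any leftover jobs to the first blocks is my own assumption for the case where N is not divisible by B.

```python
import random

def split_into_blocks(jobs, n_blocks):
    """Split jobs into n_blocks contiguous blocks of (nearly) equal size.

    Assumption: when len(jobs) is not divisible by n_blocks, the first
    few blocks each take one extra job.
    """
    base, extra = divmod(len(jobs), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        size = base + (1 if b < extra else 0)
        blocks.append(jobs[start:start + size])
        start += size
    return blocks

def wall_time(job_costs, n_blocks):
    """All blocks run in parallel, so the slowest block sets the wall time."""
    return max(sum(block) for block in split_into_blocks(job_costs, n_blocks))

# Hypothetical costs: 8 cheap "A" jobs (1 unit each) followed by
# 8 expensive "B" jobs (5 units each), run on 4 CPUs.
costs = [1] * 8 + [5] * 8

# Jobs kept in their original order: two blocks get only A jobs,
# two blocks get only B jobs, so the B blocks dominate.
ordered = wall_time(costs, 4)

# Same jobs in a random order: the expensive jobs are spread out,
# so the slowest block is typically lighter.
rng = random.Random(1)
shuffled_costs = costs[:]
rng.shuffle(shuffled_costs)
shuffled = wall_time(shuffled_costs, 4)

print("ordered:", ordered, "shuffled:", shuffled)
```

With the jobs in their original order, the two blocks that hold only B jobs cost 20 units each while the other two CPUs finish after 4 units and sit idle. A perfectly balanced split would cost 48 / 4 = 12 units per block, and a random permutation usually lands much closer to that, which is why mixing the A and B jobs in one call is the safer choice.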

The other rule of thumb is that parallelization is costly, so you should keep the number of map_rect calls small, since each call adds overhead.

However… you can likely save your efforts here, as a much improved version of this should be landing in Stan at some point; though it’s a question of when, and here I am hesitant to make a prediction.

Please! Variable packing and unpacking is a nightmare for complex models and very error-prone!