I’m playing with using map_rect for a sum marginalization where there is no real data in the arguments. This is effectively a sum over a bunch of latent parameters that I can easily break into shards, adding the computed density onto the target at the end.
The results of the model are the same, but the model implementing map_rect with multiple threads is much slower. I can post an example, but I’m curious whether this behavior is expected when the operation is only on parameters. I’ve run all the toy models and seen the impressive speedup, so I’m guessing that this is just a model-specific issue.
No, this behavior is not expected as such. It really depends on how expensive each task is, and it’s hard to say anything without seeing more detail. One thing you should always make sure of is to return only a single log-likelihood contribution from each job, to cut down on the communication.
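To make the "single contribution per job" point concrete, here is a minimal sketch of what that could look like. It is not the original poster's model: the function name `shard_lp`, the sizes `S` and `J`, and the normal density are all placeholders I made up for illustration, and it uses the newer `array[...]` syntax (older Stan versions write `real[,]` and `int[,]` instead). The only point is that each shard sums its terms internally and returns a length-1 vector, rather than one entry per latent parameter.

```stan
functions {
  // Hypothetical shard function: reduces everything inside the shard
  // and returns ONE number, so only a single value (plus its gradient)
  // has to be communicated back per job.
  vector shard_lp(vector phi, vector theta,
                  data array[] real x_r, data array[] int x_i) {
    // Placeholder density; a real model would put its per-shard
    // marginalization here. normal_lpdf already sums over theta.
    real lp = normal_lpdf(theta | phi[1], phi[2]);
    return [lp]';
  }
}
data {
  int<lower=1> S;  // number of shards (assumed name)
  int<lower=1> J;  // latent parameters per shard (assumed name)
}
transformed data {
  // No data is passed to the shards, so these are zero-width arrays.
  array[S, 0] real x_r;
  array[S, 0] int x_i;
}
parameters {
  real mu;
  real<lower=0> sigma;
  array[S] vector[J] theta;  // shard-specific latent parameters
}
model {
  vector[2] phi = [mu, sigma]';  // shared parameters
  mu ~ normal(0, 1);
  sigma ~ normal(0, 1);
  // map_rect returns a vector of length S here (one entry per shard).
  target += sum(map_rect(shard_lp, phi, theta, x_r, x_i));
}
```

If the shard function instead returned a vector of length `J`, the results would be the same, but each job would send back `J` values with their gradients, which is the extra communication mentioned above. Whether that explains the slowdown in this case depends on how much work each shard actually does.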