Parallel autodiff v4

It’s surprising to see any differences between map_rect and reduce_sum. The model is written so that, for map_rect, each function call just returns a vector of length 1 containing the log-likelihood for that call.
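To make the setup concrete, here is a minimal sketch of what such a model could look like. It is not the actual model from this thread: the normal likelihood, the data layout, the `use_map_rect` switch, and the function names `shard_lp` and `partial_lp` are all assumptions made for illustration. It uses the `array[]` syntax of recent Stan releases; older releases write `real[] x_r` instead.

```stan
functions {
  // map_rect worker (hypothetical name): returns the log-likelihood
  // of one shard as a vector of length 1.
  vector shard_lp(vector phi, vector theta,
                  data array[] real x_r, data array[] int x_i) {
    // phi[1] = mu, phi[2] = sigma; theta and x_i are unused here
    return [normal_lpdf(x_r | phi[1], phi[2])]';
  }

  // reduce_sum partial sum (hypothetical name): log-likelihood
  // of one slice of the same observations.
  real partial_lp(array[] real y_slice, int start, int end,
                  real mu, real sigma) {
    return normal_lpdf(y_slice | mu, sigma);
  }
}
data {
  int<lower=1> S;                      // number of shards
  int<lower=1> N;                      // observations per shard
  array[S, N] real y;                  // one row of observations per shard
  int<lower=0, upper=1> use_map_rect;  // 1 = map_rect, 0 = reduce_sum
}
transformed data {
  array[S, 0] int x_i;                 // no integer data per shard
  array[S] vector[0] theta;            // no shard-specific parameters
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 5);
  sigma ~ normal(0, 2);
  if (use_map_rect) {
    // one length-1 vector per shard, concatenated and then summed
    target += sum(map_rect(shard_lp, [mu, sigma]', theta, y, x_i));
  } else {
    // grainsize N asks for slices of roughly one shard each
    target += reduce_sum(partial_lp, to_array_1d(y), N, mu, sigma);
  }
}
```

In a sketch like this, both branches add the same total log-likelihood to `target`, so the two variants differ only in how the work is partitioned and accumulated.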

Still, there are some minor differences if I plot it like this: