Parallel autodiff v3

OK, I'll run it after I change the nesting code as explained in the PR.

EDIT: The results look a bit odd for 2 cores, but other than that it's cool:

[Figure: speedup, grainsize = 0]

[Figure: walltime, grainsize = 0]

[Figure: speedup, larger grainsize]

[Figure: walltime, larger grainsize]

Cool stuff!

I hope this is heading toward a design document in our design-docs repo. Once there’s a concrete, self-contained proposal, then we can comment on the PR.