That won’t work with Stan—we don’t have a ragged vector type yet.
That’ll cover our current functions. It won’t allow lambdas, though, which produce what is essentially a functor.
Great. I must have misunderstood what I was reading.
That sounds good to me.
It’s great you coded it that way.
OK, I think that answers my question above, which I’ll pull down here just for confirmation.
By “works exactly the same way”, do you mean that it depends on MPI and just runs multiple processes on one core? Or do you mean that it shrinks the autodiff tree by doing nested autodiff, producing smaller expression graphs overall? If it’s the latter, that’s fantastic. Even if it’s the former, that’s good to know, and it would be useful to see how it scales. 70% faster code is fantastic and it’d be nice to exploit this trick elsewhere.