I measured the speed of the old and new versions and found them to be the same, for whatever that’s worth.
I actually started with a tuple-based design when I first attempted this. I think it’s ultimately the way to go, but I ran into a lot of issues, and others had a lot of difficulty understanding tuples and a design built on them. That said, now that we’ve hammered out the non-tuple design more thoroughly, it might be a much shorter jump to tuples, which would save those bytes.
As far as prioritization and return on investment go, I suspect this is fairly marginal compared to other things that need work in Stan. But it might be worth measuring how much space we could save in the best case for some larger models (i.e., assume a tuple-based design saves 4 bytes per instantiation of operands_and_partials, as an upper bound on the memory saved) and seeing whether that buys us anything tangible in memory-land, since using the extra memory in the first refactor didn’t slow anything down. Tuples also take longer to compile, for what it’s worth.
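
The best-case estimate is just a multiplication, so here is a rough C++ sketch of it. The function name and the instantiation count are made up for illustration, and the 4 bytes is the assumed saving from above, not something I measured.

```cpp
#include <cstddef>

// Hypothetical helper (not part of Stan): upper bound on the bytes saved
// if every operands_and_partials instantiation shrinks by a fixed amount.
// The default of 4 bytes is the assumed best-case figure, not a measurement.
constexpr std::size_t upper_bound_bytes_saved(std::size_t n_instantiations,
                                              std::size_t bytes_saved_each = 4) {
  return n_instantiations * bytes_saved_each;
}

// Illustrative count only: one million instantiations would save
// at most 4,000,000 bytes under the assumption above.
static_assert(upper_bound_bytes_saved(1000000) == 4000000,
              "4 bytes saved per instantiation, one million instantiations");
```

The real number to plug in would be the count of operands_and_partials instantiations in an actual large model, which would need to be measured.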