Potential slowness in operands_and_partials

Thanks! I will take a look at this and see whether my branch changes things at all. I’d also be curious about benchmarks across a variety of data sizes, since these 4 x 3 matrices are probably not indicative of the kinds of data where we have performance issues. [edit: oops, I see there are some with 20000 x 1000 as well]