Expression templates?

Oh boy! I am getting close to a 2x speedup by avoiding redundant lgamma calculations, which always give the same result anyway. This is for the example in Parallelization of large vectorized expressions … So this type of optimization will buy us a lot whenever we currently do redundant computations because some arguments are scalars and we ignore that.

So we go from 1650ms in that example down to only 920ms execution time. That’s a big gain, and I am sure this type of optimization can be applied in many places. This POC is also in the stan-math branch parallel-lpdf.
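To make the idea concrete, here is a minimal sketch of hoisting a scalar-only term out of the element loop. It is not the code from the parallel-lpdf branch: the function names `gamma_lpdf_naive` and `gamma_lpdf_hoisted` are made up for illustration, and the gamma log density with scalar shape and rate is just an assumed stand-in for any density where lgamma only depends on a scalar argument.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration, not Stan Math code.
// Sum of gamma log densities over y, with scalar shape alpha and scalar rate beta:
//   log p(y_i) = alpha * log(beta) - lgamma(alpha) + (alpha - 1) * log(y_i) - beta * y_i

// Naive version: recomputes lgamma(alpha) and log(beta) for every element,
// even though both depend only on scalars.
double gamma_lpdf_naive(const std::vector<double>& y, double alpha, double beta) {
  double lp = 0.0;
  for (double yi : y) {
    lp += alpha * std::log(beta) - std::lgamma(alpha)
          + (alpha - 1.0) * std::log(yi) - beta * yi;
  }
  return lp;
}

// Hoisted version: the scalar-only part is computed once and scaled by the
// number of elements; only the terms involving y_i remain inside the loop.
double gamma_lpdf_hoisted(const std::vector<double>& y, double alpha, double beta) {
  const double scalar_part = alpha * std::log(beta) - std::lgamma(alpha);
  double lp = static_cast<double>(y.size()) * scalar_part;
  for (double yi : y) {
    lp += (alpha - 1.0) * std::log(yi) - beta * yi;
  }
  return lp;
}
```

Both functions return the same value up to floating-point rounding; the second one calls lgamma once instead of once per element. How much this saves in practice will depend on how expensive the remaining per-element terms are.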

What do others think?

Best,
Sebastian
