Inverse Speedups on GPU

I was going to answer one of the other threads, but I went out with coworkers for drinks tonight, so instead I decided to post a little thing about a solid win that I think went under the radar.

The inverse calculation on the GPU is wildly fast!

@rok_cesnovar posted this on the inverse PR, but I thought it would be nice to repost here.

These are the speedups measured using separate test scripts for the GPU version -> link and the CPU version -> link.

For the CPU, I ran the tests on an i7-4790 CPU @ 3.60GHz.
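The linked scripts are the real reference for how the numbers were produced. As a rough sketch of what a speedup figure like the ones here means, assuming plain wall-clock timing, it is the CPU time divided by the GPU time for the same input. `time_seconds` and `speedup` are hypothetical helper names for illustration and do not come from the scripts.

```cpp
#include <chrono>
#include <stdexcept>
#include <utility>

// Wall-clock seconds for a single call of f.
// Hypothetical helper for illustration; not taken from the linked scripts.
template <typename F>
double time_seconds(F&& f) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<F>(f)();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}

// Speedup of the GPU run relative to the CPU run: values above 1 mean
// the GPU was faster.
double speedup(double cpu_seconds, double gpu_seconds) {
  if (gpu_seconds <= 0.0) {
    throw std::invalid_argument("gpu_seconds must be positive");
  }
  return cpu_seconds / gpu_seconds;
}
```

When timing GPU code this way, the timed callable has to wait for the device to finish before returning, since kernel launches are usually asynchronous. Whether host-to-device copies are counted also changes the result.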

Speedup for the Titan XP

So for a Titan XP, a desktop GPU, we top out at about 45x relative to the CPU version. Pretty nice!

For the V100, a more hardcore / scientific GPU, we get the results below:

So even with matrices of size 10K, it looks like the V100 still had a lot of power left to churn!


Will those speedups also hold for matrix division or log determinants? We tend not to use a lot of pure inverse calculations.

These speedups look great, by the way, which is why I’m excited about them generalizing!


Yes, these speedups also hold for mdivide_left_tri, mdivide_right_tri, etc.
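For context, a triangular matrix division such as mdivide_left_tri computes A \ b for a triangular A, so it solves the system directly instead of forming the inverse. Below is a minimal CPU sketch of the lower-triangular case in plain C++ using forward substitution. `Matrix` and `solve_lower_tri` are made-up names for illustration, and this is not the Stan Math or GPU implementation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Dense row-major matrix; hypothetical type alias for this sketch.
using Matrix = std::vector<std::vector<double>>;

// Solve L * x = b for lower-triangular L by forward substitution.
// Entries above the diagonal are never read.
// Hypothetical helper for illustration; not part of Stan Math.
std::vector<double> solve_lower_tri(const Matrix& L,
                                    const std::vector<double>& b) {
  const std::size_t n = b.size();
  if (L.size() != n) {
    throw std::invalid_argument("L and b have mismatched sizes");
  }
  std::vector<double> x(n);
  for (std::size_t i = 0; i < n; ++i) {
    if (L[i].size() != n) {
      throw std::invalid_argument("L must be square");
    }
    if (L[i][i] == 0.0) {
      throw std::domain_error("zero on the diagonal: L is singular");
    }
    // Subtract the contribution of the already-solved unknowns x[0..i-1].
    double sum = b[i];
    for (std::size_t j = 0; j < i; ++j) {
      sum -= L[i][j] * x[j];
    }
    x[i] = sum / L[i][i];
  }
  return x;
}
```

For a single right-hand side this takes on the order of n^2 operations, compared with the order of n^3 needed to form a full inverse first, which is part of why solves are usually preferred over explicit inverses.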


That’s fantastic.