Potential slowness in operands_and_partials

@seantalts The check_finite on X in the GLMs. Maybe we could just move that check to https://github.com/stan-dev/math/blob/develop/stan/math/prim/mat/prob/normal_id_glm_lpdf.hpp#L103 or wherever, and then the checks are roughly in the same place as with the normal(X * beta, sigma) syntax.

I see, that makes sense.

I did not realize that we're doing checks on the data every leapfrog step… Seems hugely expensive to check X every time if it's always the same! Is it possible to put if (!constant_struct<X>) guards around it and maybe check it somewhere else? Or somehow note which matrices have already been checked…?

This checking stuff has been discussed a few times in the past and the conclusion (as I recall) was that

  • the price is not huge (at least @syclik commented that it never affected performance for him, as I understood it).
  • for big checks that we always expect to be true or false, we can use the likely and unlikely macros, which help branch prediction (not sure whether that still holds with the Spectre and Meltdown bugs around).
  • but the biggest thing is: we have to do these checks … and by the design of Stan (a dynamic AD tree) we likely have to do them again and again.

But if you can show that performance is really degraded in some examples, we should probably think about how to do these checks a single time where that is adequate.


In the benchmark by @tadej above, the checks take about 70% (60% listed, but the method under test itself only took 85%, so 60/85) of the total time of a call to normal_id_glm_lpdf with large-ish matrices:
(profiler screenshot omitted)

So I think they could be quite expensive. The most natural mechanisms for managing this would be resolved at compile time (e.g., associating functions with their required checks and having the compiler generate them just once when they operate on data that doesn't change). Checks for parameters probably do need to happen every time (except perhaps check_positive on something we are guaranteeing to be positive).

If we limit ourselves to what we can achieve with just the math library, I think we might need to modify the data being passed in so that each collection is marked with metadata, such as whether it has been modified since it was last checked to be, e.g., finite.

Started a new thread on Math library input checks like this: Math library input checks

IIRC, Dan's comments were based on manually removing checks and doing comparisons that way, which gave a different answer than profiling.

Yes, that's right. For simple checks like check_finite, I haven't found any performance difference when completely knocking it out of the code. I realize the profiler says it's high, but I've never seen it actually show up under normal conditions.

I believe there are some checks that will be expensive and that we should rethink.

Mind doing the timing without debug?

I'm responding in the new thread about Math library input checks (link above).

Are you sure that's real? When we've done real tests (end to end without profilers) and just turned off the checks, there hasn't been a large overhead for validation tests.

Now the copying is another matter: that's a real overhead in memory and in iterating over lots of data. Hopefully it's at least done memory-locally (iterating over the size in one loop, not over columns then rows or, even worse, rows then columns).

That should be fine as it corresponds to our intended usage.

Something like that, but it's not technically a move, as that involves straight-up assignment and the rules for &&.

Yep, it checks out for large matrices (e.g. with a 5000x5000 data matrix, check_finite is at least half of the execution time). Here is another thread where we showed that in the way you described: Math library input checks - #41 by seantalts

As of now, I haven't seen operands_and_partials showing up in profiling anymore (after the bug fix). Does anyone have a new example?

Nice! Thanks for fixing. This one's in the inner loop of almost all of our programs, so we really need to get 2.19 rolled out when GPUs are ready for their public debut.


It only shows up with large data matrices as far as I can tell, but yeah :)