Potential slowness in operands_and_partials

@seantalts The check_finite on X in the GLMs. Maybe we could just move that check to https://github.com/stan-dev/math/blob/develop/stan/math/prim/mat/prob/normal_id_glm_lpdf.hpp#L103 or wherever, and then the checks are roughly in the same place as with the normal(X * beta, sigma) syntax.

I see, that makes sense.

I did not realize that we're doing checks on the data every leapfrog step… Seems hugely expensive to check X every time if it's always the same! Is it possible to put if (!constant_struct<X>) guards around it and maybe check it somewhere else? Or somehow note which matrices have already been checked…?

This checking stuff has been discussed a few times in the past and the conclusion (as I recall) was that

  • the price is not huge (at least @syclik commented that it never affected performance for him, as I understood it).
  • for big checks that we always expect to be true or false, we can use the likely and unlikely macros, which help branch prediction (not sure whether that still holds with the Spectre and Meltdown bugs around).
  • but the biggest thing is: we have to do these checks … and by the design of Stan (a dynamic AD tree) we likely have to do them again and again.

But if you can show that performance is really degraded in some examples, we should probably think about how to do these checks a single time where that is adequate.


In the benchmark by @tadej above, the checks take about 70% (60% listed, but the method under test itself only took 85%, so 60/85) of the total time of a call to normal_id_glm_lpdf with large-ish matrices:
(profiler screenshot omitted)

So I think they could be quite expensive. The most natural mechanisms for managing this would be resolved at compile time (e.g., associating functions with their required checks and having the compiler generate them just once when they operate on data that doesn't change). Checks for parameters probably do need to happen every time (except perhaps check_positive on something we are guaranteeing to be positive).

If we limit ourselves to what we can achieve with just the math library, I think we might need to modify the data being passed in so that each collection is marked with metadata, such as whether it has been modified since it was last checked to be, e.g., finite.

Started a new thread on Math library input checks like this: Math library input checks

IIRC, Dan's comments were based on manually removing checks and doing comparisons that way, which gave a different answer than profiling.

Yes, that's right. For simple checks like check_finite, I haven't found any performance difference when completely knocking it out of the code. I realize the profiler says it's high, but I've never seen it actually show up under normal conditions.

I believe there are some checks that will be expensive and that we should rethink.

Mind doing the timing without debug?

I'm responding in the new thread about Math library input checks (link above).

Are you sure that's real? When we've done real tests (end to end without profilers) and just turned off the checks, there hasn't been a large overhead for validation tests.

Now the copying is another matter: that's a real overhead in memory and in iterating over lots of data. Hopefully it's at least done memory-locally (iterating over the size in one loop, not over columns then rows or, even worse, rows then columns).

That should be fine as it corresponds to our intended usage.

Something like that, but it's not technically a move, as that involves straight-up assignment and the rules for &&.

Yep, it checks out for large matrices (e.g. with a 5000x5000 data matrix, check_finite is at least half of the execution time). Here is another thread where we showed that in the way you described: Math library input checks - #41 by seantalts

As of now, I haven't seen operands_and_partials showing up in profiling anymore (after the bug fix). Does anyone have a new example?

Nice! Thanks for fixing. This one's in the inner loop of almost all of our programs, so we really need to get 2.19 rolled out when GPUs are ready for their public debut.


It only shows up with large data matrices as far as I can tell, but yeah :)