@seantalts The check_finite on X in the GLMs. Maybe we could just move that check to https://github.com/stan-dev/math/blob/develop/stan/math/prim/mat/prob/normal_id_glm_lpdf.hpp#L103 or whatever, and then the checks are kinda in the same place as with the normal(X * beta, sigma) syntax.
I see, that makes sense.
I did not realize that we're doing checks on the data every leapfrog step… Seems hugely expensive to check X every time if it's always the same! Is it possible to put if(!constant_struct<X>) guards around it and maybe check it somewhere else? Or somehow note which matrices have already been checked…?
This checking stuff has been discussed a few times in the past and the conclusion (as I recall) was that
- the price is not huge (at least @syclik commented that it never affected performance for him, as I understood);
- for big checks which we always expect to be true or false we can use the likely and unlikely macros, which help branch prediction (not sure if that is still the case with the Spectre & Meltdown bugs around; see the sketch below);
- but the biggest thing is: we have to do these checks… and by design of Stan (dynamic AD tree) we likely have to do these checks again and again.

But if you can show that the performance is really degraded in some examples, we should probably think about how to do these checks a single time where that is adequate.
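A minimal sketch of what those branch-prediction hints typically look like, assuming GCC/Clang's __builtin_expect (the macro and function names here are illustrative, not Stan Math's actual ones):

```cpp
#include <stdexcept>
#include <string>

// Illustrative macros wrapping __builtin_expect (GCC/Clang); C++20 offers
// the [[likely]]/[[unlikely]] attributes instead. Names are hypothetical.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_LIKELY(x) __builtin_expect(!!(x), 1)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SKETCH_LIKELY(x) (x)
#define SKETCH_UNLIKELY(x) (x)
#endif

// A check we almost always expect to pass: hinting the compiler lets it
// lay out the throwing path off the hot path.
inline void check_positive_sketch(const char* name, double x) {
  if (SKETCH_UNLIKELY(!(x > 0))) {
    throw std::domain_error(std::string(name) + " must be positive");
  }
}
```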
In the benchmark by @tadej above, they take about 70% (60% listed, but the method under test itself only took 85%, so 60/85 ≈ 71%) of the total time of a call to normal_id_glm_lpdf with large-ish matrices.
So I think they could be quite expensive. The most natural mechanisms for managing this would be resolved at compile time (e.g. associating functions with the required checks and having the compiler generate them just once if they're operating on data that doesn't change). Checks for parameters probably do need to happen every time (except perhaps check_positive on something we are guaranteeing to be positive).
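A hedged sketch of the compile-time idea, using type-trait dispatch to skip the scan for plain data types. This is illustrative only; Stan Math's actual is_constant machinery and check signatures may differ, and the .val() accessor on the autodiff type is an assumption:

```cpp
#include <cmath>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <vector>

// Illustrative: re-check autodiff arguments (whose values change every
// leapfrog step) but skip plain-double data, on the theory that the data
// was validated once when it was read in. Not Stan Math's real API.
template <typename T>
void check_finite_sketch(const char* name, const std::vector<T>& xs) {
  if constexpr (std::is_same<T, double>::value) {
    return;  // data: assume it was checked once up front
  } else {
    for (const auto& x : xs) {
      if (!std::isfinite(x.val()))  // assumes an autodiff type with .val()
        throw std::domain_error(std::string(name) + " is not finite");
    }
  }
}
```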
If we limit ourselves to what we can achieve with just the math library, I think we might need to modify the data being passed in to mark each collection with metadata such as whether it has been modified since it was last checked to be e.g. finite.
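For concreteness, a minimal sketch of what such metadata might look like, assuming an Eigen matrix and a simple dirty flag (all names here are hypothetical, not an actual Math library design):

```cpp
#include <Eigen/Dense>
#include <stdexcept>

// Hypothetical wrapper: pair the data with a flag recording whether it
// has passed a finiteness check since its last modification.
struct checked_matrix {
  Eigen::MatrixXd value;
  bool finite_checked = false;

  void set(const Eigen::MatrixXd& m) {
    value = m;
    finite_checked = false;  // mutation invalidates the earlier check
  }

  void require_finite() {
    if (finite_checked) return;  // skip the O(rows * cols) scan
    if (!value.allFinite())      // Eigen: true iff no NaN/Inf entries
      throw std::domain_error("matrix contains non-finite values");
    finite_checked = true;
  }
};
```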
Started a new thread on Math library input checks like this: Math library input checks
IIRC, Dan's comments were based on manually removing checks and doing comparisons that way, which gave a different answer than profiling.
Yes, that's right. For simple checks like check_finite, I haven't found any performance difference when completely knocking it out of the code. I realize the profiler says it's high, but I've never seen it actually show up under normal conditions.
I believe there are some checks that will be expensive that we should rethink.
Mind doing the timing without debug?
I'm responding in the new thread about Math library input checks (link above).
Are you sure that's real? When we've done real tests (end to end, without profilers) and just turned off the checks, there hasn't been a large overhead for validation tests.
Now the copying is another matter; that's a real overhead in memory and iterating over lots of data. Hopefully it's at least done memory-locally (iterating over size in one loop, not over columns then rows or, even worse, rows then columns).
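A small sketch of the memory-local version of such a scan, assuming an Eigen matrix; the function name is illustrative:

```cpp
#include <Eigen/Dense>
#include <cmath>

// One flat pass over the underlying storage: sequential memory access
// regardless of whether the matrix is column-major (Eigen's default) or
// row-major, unlike a nested row-then-column loop that strides.
inline bool all_finite_flat(const Eigen::MatrixXd& m) {
  const double* d = m.data();
  for (Eigen::Index i = 0; i < m.size(); ++i) {
    if (!std::isfinite(d[i])) return false;
  }
  return true;
}
```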
That should be fine as it corresponds to our intended usage.
Something like that, but it's not technically a move, as that involves straight-up assignment and the rules for &&.
Yep, it checks out for large matrices (e.g. with a 5000x5000 data matrix, check_finite is at least half of the execution time). Here is another thread where we showed that in the way you described: Math library input checks - #41 by seantalts
As of now, I haven't seen operands_and_partials showing up in profiling anymore (after the bug fix). Does anyone have a new example?
Nice! Thanks for fixing. This one's in the inner loop of almost all of our programs, so we really need to get 2.19 rolled out when GPUs are ready for their public debut.
It only shows up with large data matrices as far as I can tell, but yeah :)