Math library input checks

I think this question might be mostly for @Bob_Carpenter and @mitzimorris, but it seemed like a good thing for public record.

I know that we can’t reassign to things that are declared data in the Stan language. Is that information propagated down to Math in any form? It would be extremely useful for performance optimizations, such as not checking that a 200MB matrix is finite every leapfrog step (check_finite on data matrices takes about 70% of execution time in normal_id_glm_lpdf for example).

I had in mind a specialization for finite (agreed, the name should be changed at some point) that would cache which data matrices it had already checked. This code assumes that all Eigen::MatrixXds are immutable - are there places where Stan generates code for which this isn’t true? If not, then this might be our first case of needing to write code that makes the Math library not really suitable for general-purpose use. The alternatives would be substantially rewriting how we do checks, or overriding functions like check_finite in the Stan language and hiding the Math versions.

Specialization:


// Partial specialization of the finite check for plain double matrices.
// Assumes every Eigen::MatrixXd passed in is immutable data, so each
// matrix only ever needs to be scanned once, keyed by its data pointer.
template <int R, int C>
struct finite<Eigen::Matrix<double, R, C>, true> {
  static void check(const char* function, const char* name,
                    const Eigen::Matrix<double, R, C>& y) {
    // TODO: check whether this is thread-safe, or make it so (a bare
    // static std::set is not safe under concurrent inserts)
    static std::set<const double*> cache;
    if (cache.find(y.data()) == cache.end()) {
      // fast vectorized scan first; only do the slow elementwise pass
      // to locate the offending index if something is non-finite
      if (!value_of(y).allFinite()) {
        for (int n = 0; n < y.size(); ++n) {
          if (!(boost::math::isfinite)(y(n)))
            domain_error_vec(function, name, y, n, "is ",
                             ", but must be finite!");
        }
      }
      // only mark as checked once the matrix has passed
      cache.insert(y.data());
    }
  }
};

This reduces the normal_id_glm_lpdf benchmark (see the inspirational thread) to about 25% of the time it takes without the specialization, a 4x speedup. Currently, the Math library tests for is_finite create a single matrix, change its contents, and pass it in repeatedly, so this specialization fails those tests. But I think those tests may not reflect the code Stan actually generates.


Right now in the Stan language, the data qualifier appears only on function argument declarations (e.g. real foo(data matrix x)).
The compiler checks that the expression passed in to the function call doesn’t contain any parameter variables. Since you can’t assign to function arguments in a function, you can’t assign to data-qualified function arguments, nor can you assign to variables declared in data blocks.

It should be possible to put more smarts into the compiler to flag constant variables.

@syclik My benchmark numbers are a bit off - debug was off, but somehow I wasn’t getting the default O=3 flag. With that flag on, a 20000x5000 matrix, and 10 repetitions[0][1], I’m now seeing a 27% speedup (down from 38%) with the current PR, and a further 33% speedup for the specialization above compared with the PR.

[0] In a real model execution there would be tens of thousands of iterations, and the savings from doing the check only once scale very well with iteration count.
[1] Here is the benchmark code; it might also deserve inspection, as it’s handwritten.

FYI, O=3 behaves really differently with debug flags compiled in than without them.

But if those are the numbers you’re seeing, that’s great! Mind double-checking with the debug symbols not compiled in? (I would do it, but I don’t know where the test is exactly.)

To clarify, debug was off.

Sweet!

Talking with @syclik offline, he mentioned that while it is true that all MatrixXd's hold data (non-parameter) values (!), you can also have some that are computed inside a Stan model rather than declared in the data block. If they are saved to a local variable, they become a Matrix of vars; but if they are used directly as an expression, it seems they will be materialized as a temporary MatrixXd (see e.g. multiply, which is what vector[] * 4 compiles to).

So my take on the caching specialization above is that we can’t rely on it to check expressions like that correctly, because we might not get the same memory address for them each time. Boo.
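
To make the failure mode concrete, here is a minimal sketch in plain Eigen (no Stan Math) of why pointer identity breaks for expression results:

#include <Eigen/Dense>
#include <cassert>

int main() {
  Eigen::MatrixXd x = Eigen::MatrixXd::Ones(3, 3);
  // Each evaluation of the expression materializes a fresh temporary,
  // so the cache key (the data pointer) differs on every leapfrog step.
  Eigen::MatrixXd a = x * 4;
  Eigen::MatrixXd b = x * 4;
  assert(a.data() != b.data());  // same values, different addresses
}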

A hacky alternative might be asking stanc to register the pointers of all data matrices with a global static data structure in the Math library.
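
Something along these lines (all names here are hypothetical, not an existing Stan Math API; stanc would emit the register_data_matrix call in the generated model constructor):

#include <set>

// Hypothetical process-wide registry of data-matrix storage pointers.
inline std::set<const double*>& data_matrix_registry() {
  static std::set<const double*> registry;
  return registry;
}

// Generated model constructors would call this once per data matrix.
inline void register_data_matrix(const double* data_ptr) {
  data_matrix_registry().insert(data_ptr);
}

// Checks would consult this before deciding whether caching is safe.
inline bool is_registered_data(const double* data_ptr) {
  return data_matrix_registry().count(data_ptr) > 0;
}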

If we wanted to have heavier objects, we could carry around flags that say that it’s been checked. Or for something like a cholesky factor, we could create a different type.
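
(A sketch of the different-type idea - a hypothetical wrapper that validates once at construction, so downstream code can skip re-checks; check_cholesky_factor is the existing Stan Math check:)

#include <stan/math.hpp>

// Hypothetical tag type: holding one of these certifies that the wrapped
// matrix already passed the Cholesky-factor check.
template <typename M>
class cholesky_factor_t {
 public:
  explicit cholesky_factor_t(const M& m) : m_(m) {
    stan::math::check_cholesky_factor("cholesky_factor_t", "m", m_);
  }
  const M& value() const { return m_; }  // read-only access keeps it valid
 private:
  M m_;
};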

That would be nicer (especially for multi-threading) and less hacky - pointers are not typically a valid identifier, though they work here in this very limited context because the data matrices are allocated once in the model constructor. The bad thing there is that we’d need to rewrite almost every matrix function in the Math library to deal with these types, no? Unless maybe we made another type satisfying a numeric concept (i.e. providing operator* et al.), made matrices of that, and specialized on it when appropriate.

I was really hoping that we could just specialize on Matrix<const double, -1, -1>, which would also have been a reasonable expression of our intention in the type system, but Eigen’s template magic makes that not work, for reasons I don’t totally understand.

Absolutely agree.

Yup. Doing it that way would require that. Although… there’s a super-hacky way to do it without it. (super duper hacky)

There’s a way to arbitrarily extend Eigen. See the Eigen doc. One way I could see us doing this:

  • add a bunch of bool flags that start out false, e.g. is_positive_definite, is_square. False would mean it hasn’t been checked, not that it’s actually not that thing. If a check failed, we’d assume we kick out of whatever function we’re in.
  • each of the check_* functions that operate on Eigen::Matrix would set the appropriate flag to true if the check passes. If the relevant flag is already set, no checking necessary!
  • if a check_* function fails, we’d leave the flag alone and do what we already do.
  • we would need to change all of our functions that accept Eigen::Matrix so the argument is no longer const. All of those functions would gain the ability to mutate the input matrix.

That’s one way to do it. I don’t know what the performance benefit/hit would be. Naively, I think it would destroy our future ability to use threading, but maybe not. I also don’t know whether compilers use the const qualifier to optimize in ways I can’t imagine, or whether carrying around a heavier Eigen::Matrix object tips past some memory boundary and destroys performance everywhere.
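
For reference, the extension mechanism is Eigen’s plugin hook: you define a macro naming a header file, and Eigen includes that header inside the class body. A minimal sketch of what the flags could look like (all names below are hypothetical, not actual Stan Math code):

// stan_matrix_plugin.h
// Injected into the body of Eigen::Matrix by defining, before any Eigen
// header is included:
//   #define EIGEN_MATRIX_PLUGIN "stan_matrix_plugin.h"
// Flag semantics: false means "not checked yet", never "check failed".
bool is_finite_flag = false;
bool is_positive_definite_flag = false;
bool is_square_flag = false;

// A check would then look roughly like this (note the non-const
// argument, which is the downside discussed above):
//
// template <int R, int C>
// void check_finite(const char* function, const char* name,
//                   Eigen::Matrix<double, R, C>& y) {
//   if (y.is_finite_flag) return;   // already verified once
//   ... run the existing elementwise scan, throwing on failure ...
//   y.is_finite_flag = true;        // only reached if the check passed
// }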

I don’t understand your terminology… I don’t think you mean template specialization?

But… just having an Eigen::Matrix<const double, -1, -1> (where the scalar type is const double as opposed to double) still doesn’t buy us much. We’d still need to inspect the matrix to know whether there are NaNs or infs, whether it’s positive definite, etc.

This is pretty dope. I think what you’re saying would work, though I’m sad to lose the const and wonder if we could find a way to avoid losing it. I also think we’d still need to do this only for specializations on Matrix<double, R, C>, because it wouldn’t work on parameters without changes to the Stan compiler.

Without losing the const, we could instead use the plugin system you linked to to add a field or method to the Eigen base that identifies a Matrix as immutable data or not. Then, for matrices we know are data, we could use the same specialization above to cache the checked flag in a static local variable. This parallel-data-structure approach is slightly attractive mostly because it lets us keep the const notion on the Eigen types. I’m not really sure how I feel about these options, but I wanted to lay out the idea here.

We talked about this IRL, but just to clarify: if we had stanc create all data matrices as Matrix<const double, -1, -1>, then we could treat that as a marker that applies only to immutable data, provide specializations for it that differ from the ones for MatrixXd, and thus use the trick above of caching the pointer address.

One way to avoid losing const is by using mutable.
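
A standalone sketch of the mutable idea (hypothetical type, not Stan Math code): a mutable member can be written through a const reference, so a check can keep its const signature while still caching its result.

#include <Eigen/Dense>
#include <stdexcept>

struct checked_matrix {
  Eigen::MatrixXd m;
  mutable bool is_finite_flag = false;  // "checked and passed" marker

  void check_finite() const {
    if (is_finite_flag) return;         // cached: skip the O(n) scan
    if (!m.allFinite())
      throw std::domain_error("matrix must be finite");
    is_finite_flag = true;              // legal on const: member is mutable
  }
};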


Will have to look into mutable

Relevant small side question: would we rather our matrix library took in MatrixBase instead of Matrix pretty much everywhere? https://eigen.tuxfamily.org/dox/classEigen_1_1MatrixBase.html
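
(For context, that would mean generic signatures like the following sketch, which accept any Eigen expression rather than only concrete matrices:)

#include <Eigen/Dense>

// Accepts plain matrices, blocks, maps, and unevaluated expressions alike.
template <typename Derived>
double sum_of_squares(const Eigen::MatrixBase<Derived>& m) {
  return m.squaredNorm();
}

// Usable as sum_of_squares(x), sum_of_squares(x * 4), sum_of_squares(x.col(0)).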

Unfortunately, we can’t. There were a couple of threads on the old dev list, and possibly here too, with concrete code and simple examples where, by using MatrixBase, some functions don’t compute the correct thing. It doesn’t compute the right thing!!!

Meaning it’d compile and run, but the output from a function call would just be wrong. We didn’t know under what conditions it was wrong, but we had found a couple of cases, and that was enough for us not to rewrite the whole library. (We could still do it; we’d just need to overload both Matrix and MatrixBase every time it doesn’t work.)

That is extremely surprising, given that it seems to be what Eigen recommends… Should probably send an upstream bug report.

Here’s the thread / example: https://groups.google.com/forum/#!topic/stan-dev/ZKYCQ3Y7eY0

My recollection might have been off - maybe it runs but crashes with asserts? It was years ago.


I think this might be the best way forward - we can just add our custom Eigen base plugin and declare the new fields. Here’s the beginning of it on a branch, with 7 fields (I think the first 8 bools come for the price of one, since alignment padding means the struct grows by a whole word anyway, right?) and check_finite using them. I changed one test and added another - this changes semantics fairly significantly for users and developers of the Math library, and in a way that it wouldn’t if we also annotated matrices with an immutable flag added by stanc and checked for that too. idk.

WIP PR: https://github.com/stan-dev/math/pull/1014

Also maybe worth noting - we could solve this entirely at the compiler level with zero runtime overhead, but I think that would probably have to wait for a new stanc.

Hi guys (@seantalts, @tadej, @Erik_Strumbelj, in particular),
I was really confused by the error checking as well - it showed up in my profiling as the bottleneck. I’ve now done some end-to-end benchmarks on GLMs with and without error checking, and basically there is no speed difference at all. See the attached (ugly) figures.
It really seems like the error checking showing up in the profiling is a red herring after all. What do you think?

Rplots - 50 data.pdf (8.0 KB)
Rplots - 50 param.pdf (8.1 KB)
Rplots - 5000 data.pdf (8.2 KB)
Rplots - 50000 data.pdf (8.1 KB)
Rplots - 500 data.pdf (8.2 KB)

What is the model you are checking? I’m using @tadej’s script here: https://gist.github.com/seantalts/e8abe1c002a65dc6f6c5f029759ba35e

Compiling it like this:

clang++ -Wall -I . -isystem lib/eigen_3.3.3 -isystem lib/boost_1.66.0 -isystem lib/sundials_3.1.0/include -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -O3 -Wno-unknown-warning-option -Wno-tautological-compare -Wsign-compare -isystem lib/opencl_1.2.8 -DSTAN_OPENCL -DOPENCL_DEVICE_ID=0 -DOPENCL_PLATFORM_ID=0 -DNO_FPRINTF_OUTPUT -pipe -framework OpenCL  perftest.cpp   -o perftest