Speeding up testing

Hi!

Maybe this has been proposed already: would it be an option to run most or all of our unit tests without optimizations? I mean setting O=0 (or at most O=1) for the compilers. gcc in particular is supposed to build faster that way.

I haven’t measured the benefit myself yet, and I agree that it would be good to do at least some testing with O=3, which is what we recommend to users. But maybe the O=3 testing can be deferred to the more extensive testing phase that runs when a PR is merged to develop, rather than running on every PR?

Another possibility for speeding things up is to pre-build Sundials and Boost MPI (sure, the Jenkins config burden may prevent that, and I am not volunteering for this one).

Best,
Sebastian

I don’t know that we need to test at -O3. It’s just not clear where to stop. If we test at -O3 now, why not at -O0, too, given that we’re not saying Stan only runs with -O3?

I always run tests locally at -O0. I don’t know about the servers. For a lot of tests, it’s a tradeoff of compile time vs. run time, as the -O0 versions are more than ten times slower to run.
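To make the tradeoff concrete, here is a minimal sketch of the break-even arithmetic. The function name and all timings are hypothetical, chosen only for illustration; the one number taken from this thread is the roughly tenfold run-time slowdown at -O0.

```python
def o0_is_faster(compile_o0, compile_o3, run_o3, slowdown=10.0):
    """Return True if build + run at -O0 takes less wall time than at -O3.

    All times are in seconds. `slowdown` is how many times slower the
    test runs at -O0 than at -O3 (assumed, not measured).
    """
    total_o0 = compile_o0 + run_o3 * slowdown
    total_o3 = compile_o3 + run_o3
    return total_o0 < total_o3


# Hypothetical timings: a quick test is dominated by compile time,
# so the cheaper -O0 build wins despite the slower run.
quick_test = o0_is_faster(compile_o0=20.0, compile_o3=60.0, run_o3=1.0)

# Hypothetical timings: a long-running test loses more at run time
# than it saves at compile time.
slow_test = o0_is_faster(compile_o0=20.0, compile_o3=60.0, run_o3=10.0)

print(quick_test, slow_test)
```

Under these assumptions, -O0 only pays off when the compile-time saving exceeds the extra run time, which is why the answer can differ from test to test.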

The fastest thing would be to cache results. Now that the makefiles are fixed, the Math library only recompiles a test if something it depends on has changed. To really take advantage of that, we need to go back to including what we use for headers in the tests; I think there’s a way to do that sanely, but it’s quite a bit of work at this point. Even without doing that, we should be able to reduce the testing for most PRs just by leaving the built tests around.

Most of the distribution tests wouldn’t have to be recompiled in most PRs. Or if some do, it’ll just be the ones that matter. I think that’d be the quickest improvement we can make to the testing time (at the cost of disk space).

I think we’d get a lot of speedup bang for the buck by switching to two “universal build” headers (as I think you called them), at rev/mat and mix/mat, and collating tests to be at one of those two levels. Switching to IWYU (include what you use) at this point would not just be a pain in the butt; more importantly, it would also fail to test our supported API.

Unfortunately I think we probably do need to test at O3 since that’s what we expect users to use. I also would expect collating tests would buy us more than switching optimization level.

If you’re interested, they’re called “unity builds” and they’re really about sticking everything into a single translation unit. Here’s one reference: A guide to unity builds - Not just gamedev
There are a lot of threads on Stack Overflow about it. There are some build systems that try to automate it.
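As a rough illustration of the idea, a unity build can be produced by generating one source file that `#include`s every test file, so the compiler sees a single translation unit. The sketch below only generates that file's text; the function name and the test file paths are hypothetical and not taken from the Math library.

```python
def make_unity_source(test_sources):
    """Return the text of one C++ translation unit that includes every test file.

    Sources are sorted so the generated file is stable between runs.
    """
    lines = ["// Generated unity translation unit (illustrative sketch)."]
    for src in sorted(test_sources):
        lines.append(f'#include "{src}"')
    return "\n".join(lines) + "\n"


# Hypothetical test files, for illustration only.
unity_text = make_unity_source(["test/unit/b_test.cpp", "test/unit/a_test.cpp"])
print(unity_text)
```

Compiling the generated file once replaces compiling each test file separately, so shared headers are parsed a single time. The cost is that any change to any included file rebuilds the whole unit.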

I think we’ll see enough of an improvement once we squash the scal / arr / mat folders. At this point, I wouldn’t suggest this, but let’s evaluate after that’s done.

It actually wouldn’t be that much of a pain! I could probably take care of it in a weekend. I think @roulades might be able to just script it. Essentially, all we’d need to do is include the traits, then the one header under test.

Re: not testing the supported API. I think it’s OK for unit tests. We should have tests that use the main header, and we do. But for unit tests, there’s a trade-off between good practice plus developer time and full confidence that everything works under the supported API. Here, if we use more selective includes, we could narrow unit testing down to rebuilding only the tests that actually include the header that changed. For something like a new function, only the new test would be built. For something like a trait, everything would be rebuilt. I’m good with that for unit tests.

Agreed. I don’t really want to hunt down optimization-level bugs. We’ve run into them before. They’re hard to track down.

Ah! Yeah I’m familiar with these. I thought there was some other name for just having an “includes the world” header - unity builds would maybe imply that we include all of our tests in one file as well and compile them all in one go. Is that what you had in mind? That would make getting code coverage reports easier as well.

Famous last words :P

If we went to full IWYU, which tests would then use the main header?

Right now, we have one quick integration test that has code from the paper. I’m sure we have a few more in the math library and we could add some more.

Everything in Stan that uses the Math library only uses the main header. Things like the performance tests are end-to-end tests.

Yeah, so currently there are almost none of these tests for the main headers :P There was that logistic regression performance test, and we’re adding some benchmarks to Math PRs, but nothing with anything close to reasonable code coverage of the Math library. Just something to think about. All that said, I would probably agree with you here: Stan Math and Stan are so tightly integrated that you really do want to spend the testing effort first on developing end-to-end tests at the Stan or CmdStan level, and make end-to-end tests at lower levels a much lower priority or forgo them entirely.

Honestly I think that world would be better: fast, incremental unit tests for daily life and a suite of slow end-to-end tests with less code coverage run less frequently. I was a little reluctant about that idea, but I think that’s because the Math header system is so complicated that it feels like you need to test it super carefully. If we go to only 1 or 2 end-of-the-world headers, I would feel a lot more comfortable.