Integration tests for HMC and stan math functions

Hi!

Do we have example tests available that plug any of our Stan Math functions directly into HMC?

If not, any advice on this would be great, i.e. where should such tests go? The Stan repo? Somewhere else?

My interest is in performance profiling (ODE stuff), and I would guess that plugging the function into HMC directly is the best way to do that. However, I suppose this approach could be useful for other purposes too.

Sebastian

If you’re trying to profile, this is the wrong level. You should just be profiling the function from within Math.

I don’t think there’s a framework in place to do performance testing there, so we can create one.

Maybe ODE stuff is special here, but the point of putting the ODE code under the control of HMC is that I want to stress test the ODE solver by forcing it to compute solutions for whatever HMC wants to try out. An alternative way I can think of is to draw random parameter sets from log-normal distributions and throw those at the solver.

For the performance tests I intend to have something which allows comparisons between timings among branches such that I can track if I am doing something good or bad in terms of performance.

One approach could be to use the Google Test framework. It does report timings; what would be left to be done is to collect that output and compare between branches optimally.

Do we want performance tests to be part of Stan? It may make sense to introduce them for other purposes as well.

You’re not going to be able to profile ODEs correctly if you throw it under
HMC. That’s just not how you performance test something. You need it to be
repeatable and find out edge cases. In order to find particularly bad
instances, you can run HMC and figure that out, but the performance tests
should be completely deterministic and independent of any of the algorithms.
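
A minimal sketch of what a deterministic timing harness could look like, written in Python purely for illustration. The workload function and the list of fixed cases are hypothetical stand-ins, not anything that exists in Math or Stan; the point is only that the inputs are fixed up front and no sampler is involved.

```python
import time


def time_call(fn, *args, repeats=5):
    """Return the fastest wall-clock time (seconds) over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def toy_workload(n):
    # Hypothetical stand-in for the function under test (e.g. one ODE solve).
    total = 0.0
    for i in range(1, n + 1):
        total += 1.0 / (i * i)
    return total


# Fixed inputs: every run of the benchmark exercises exactly the same cases.
FIXED_CASES = [1_000, 10_000, 100_000]


def run_benchmark():
    """Time the workload on each fixed case and return {case: seconds}."""
    return {n: time_call(toy_workload, n) for n in FIXED_CASES}
```

Taking the minimum over a few repeats is one common way to reduce noise from the machine; a median would be another reasonable choice.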

In Stan, there is an overall “performance” test that’s triggered. It’s not
really that – it’s just a way to keep track of overall performance. Here’s
a link to one of the later graphs:
http://d1m1s1b1.stat.columbia.edu:8080/job/Stan%20-%20Performance/lastSuccessfulBuild/artifact/test/performance/performance.png

Yes, I think we should put performance testing in Stan. I just don’t know
how to do it well – especially across different versions.

I rearranged to put the important stuff up front!

For the performance tests I intend to have something which allows comparisons between timings among branches such that I can track if I am doing something good or bad in terms of performance.

Do we want performance tests to be part of Stan? It may make sense to introduce them for other purposes as well.

That would be great. We’ve wanted to set something like this
up for years. It’s challenging given the nature of MCMC.

Maybe ODE stuff is special here, but the point of putting the ODE code under the control of HMC
is that I want to stress test the ODE solver by forcing it to compute solutions for whatever HMC wants to try out. An alternative way I can think of is to draw random parameter sets from log-normal distributions and throw those at the solver.

I think the latter would be better as it’s easier to
control what you’re testing. Of course, HMC is the
end-to-end test you want, so it’d be nice to have some
benchmark models you care about to test, too. But those
aren’t unit tests for ODE solvers so much. If they bring
up problems with some parameter values, those can go into
the unit tests.
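
A rough sketch of the random-parameter idea, in Python for brevity. Everything here is an assumption made for illustration: the fixed-step RK4 solver for dy/dt = -k * y is a toy stand-in for the real ODE solver, and the seed, the log-normal parameters and the tolerance are arbitrary. It draws rates with a fixed seed so the run is repeatable, and collects the values the toy solver handles badly, which are the kind of values that could then be copied into unit tests.

```python
import math
import random


def rk4_decay(k, y0=1.0, t_end=1.0, steps=100):
    """Fixed-step RK4 for dy/dt = -k * y (toy stand-in for the real solver)."""
    h = t_end / steps
    y = y0
    for _ in range(steps):
        k1 = -k * y
        k2 = -k * (y + 0.5 * h * k1)
        k3 = -k * (y + 0.5 * h * k2)
        k4 = -k * (y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return y


def draw_rates(n, seed=1234, mu=0.0, sigma=3.0):
    """Draw n log-normal rates; the fixed seed makes the draws repeatable."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def find_problem_rates(rates, tol=1e-6):
    """Return the rates where the toy solver misses the exact solution."""
    bad = []
    for k in rates:
        exact = math.exp(-k)  # exact solution at t_end = 1 with y0 = 1
        approx = rk4_decay(k)
        if not math.isfinite(approx) or abs(approx - exact) > tol:
            bad.append(k)
    return bad


if __name__ == "__main__":
    problems = find_problem_rates(draw_rates(1000))
    print(f"{len(problems)} of 1000 draws exceeded the tolerance")
```

With a fixed step size, large rates make the explicit scheme unstable, so a wide log-normal tends to surface such cases quickly.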

Also, the ODE solver is in the math lib, so the tests
shouldn’t depend on HMC, which is in the Stan lib. If you
do want those tests, they should be tests of HMC in the
Stan lib.

One approach could be to use the Google Test framework. It does report timings; what would be left to be done is to collect that output and compare between branches optimally.

Python scripts are probably easier. That’s what we’re doing
now to run unit tests. Make is very limited and hard to control.

Sorry, I said pretty much the same thing before reading
your response. Glad to see we’re in sync here.

  • Bob

Yes. But I think the point about using Python is a good one. That might be a good way to compare timing across different git hashes. Google Test isn’t really designed for that.
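
As a sketch of how the two ideas could be combined: Google Test can write an XML report (via `--gtest_output=xml:<path>`) in which each `testcase` element carries a `time` attribute in seconds, and a small Python script can compare two such reports, one per branch or git hash. The suite and test names, the embedded sample reports and the 10% threshold below are made up for illustration.

```python
import xml.etree.ElementTree as ET


def load_timings(xml_text):
    """Map 'Suite.test' -> seconds from a Google Test XML report."""
    root = ET.fromstring(xml_text)
    timings = {}
    for case in root.iter("testcase"):
        name = f"{case.get('classname')}.{case.get('name')}"
        timings[name] = float(case.get("time", "0"))
    return timings


def compare(baseline, candidate, threshold=1.10):
    """Return tests whose time grew by more than `threshold` x baseline."""
    slower = {}
    for name, base in baseline.items():
        new = candidate.get(name)
        if new is not None and base > 0 and new / base > threshold:
            slower[name] = (base, new)
    return slower


# Hypothetical reports; in practice these would be read from the XML files
# produced by running the same test binary on each branch.
BASELINE_XML = """<testsuites>
  <testsuite name="OdeSuite">
    <testcase name="harmonic" classname="OdeSuite" time="0.200"/>
    <testcase name="decay" classname="OdeSuite" time="0.100"/>
  </testsuite>
</testsuites>"""

CANDIDATE_XML = """<testsuites>
  <testsuite name="OdeSuite">
    <testcase name="harmonic" classname="OdeSuite" time="0.300"/>
    <testcase name="decay" classname="OdeSuite" time="0.101"/>
  </testsuite>
</testsuites>"""

if __name__ == "__main__":
    regressions = compare(load_timings(BASELINE_XML), load_timings(CANDIDATE_XML))
    for name, (base, new) in regressions.items():
        print(f"{name}: {base:.3f}s -> {new:.3f}s")
```

Timings from a single run are noisy, so a real comparison would probably want several runs per hash on the same machine before flagging anything.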

Hmm… I thought of running Google Test because it can dump its output in an XML format, which can then be parsed with Java into web pages. The tool needed was Ant, as far as I recall.

I will look into this.

The link to the performance tests apparently does not work; there is no such PNG file at the location the link points to. Has it been moved?