Old end-to-end model tests

syclik · March 22, 2018, 3:06pm

@seantalts, this is for you.

A long time ago, we ran all the BUGS examples to test for a few things:

test for compilation (we don’t need this now)
check that we didn’t break things between commits (we found a lot of things by checking end-to-end that we missed with unit tests)
get an overall sense of speed (we ran this on Jenkins and just tracked the overall runtime; little slips in gradients that passed tests would often get caught here)
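
A runtime check of this kind can be sketched roughly as follows. This is only a minimal Python illustration, not the original Jenkins setup: `time_call`, `is_slowdown`, and the 10% threshold are hypothetical names and values picked for the example.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def is_slowdown(baseline_seconds, current_seconds, tolerance=0.10):
    """Return True if the current run exceeds the baseline by more than `tolerance`.

    The 10% default is an arbitrary illustrative threshold, not a value
    taken from the original test setup.
    """
    return current_seconds > baseline_seconds * (1.0 + tolerance)
```

In practice `fn` would be whatever builds and runs the models, and the baseline would come from an earlier run on the same machine, since timings from different machines are not comparable.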

We removed them because they produced too many false positives, and @betanalpha had a framework for testing correctness in a different way. They still provided some value, though, since we don’t really have end-to-end tests now.

Here’s an example test:

github.com

stan-dev/stan/blob/v2.1.0/src/test/models/bugs_examples/vol1/blocker/blocker_test.cpp

#include <gtest/gtest.h>
#include <test/models/model_test_fixture.hpp>

class Models_BugsExamples_Vol1_Blocker : 
  public Model_Test_Fixture<Models_BugsExamples_Vol1_Blocker> {
protected:
  virtual void SetUp() {
  }
public:
  static std::vector<std::string> get_model_path() {
    std::vector<std::string> model_path;
    model_path.push_back("models");
    model_path.push_back("bugs_examples");
    model_path.push_back("vol1");
    model_path.push_back("blocker");
    model_path.push_back("blocker");
    return model_path;
  }

  static bool has_data() {

This file has been truncated.

Maybe we should start testing something like this and tracking performance?

seantalts · March 22, 2018, 3:25pm

Cool! I have my Python script building and running these models on their data and recording summarized output in a diffable format.

My branch with runPerformanceTests.py is here: https://github.com/stan-dev/cmdstan/tree/perf

In this branch, I added the example-models repo as a submodule. The script does some slight trickery to find model / data file pairs, then compiles and runs them, recording the output to tests/golds (which I was hoping to check in) and writing timings to times.csv (not checked in, but always tested by Jenkins on a specific machine).
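
The pairing step could look something like the sketch below. This is not the code from runPerformanceTests.py: the function name `find_model_data_pairs` is hypothetical, and it assumes that a model `foo.stan` sits next to its data file `foo.data.R` in the same directory.

```python
import os


def find_model_data_pairs(root):
    """Walk `root` and return a sorted list of (model_path, data_path) tuples.

    Assumption for this sketch: a model `foo.stan` is paired with
    `foo.data.R` in the same directory. Models without a matching
    data file are skipped.
    """
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        names = set(filenames)
        for name in filenames:
            stem, ext = os.path.splitext(name)
            if ext != ".stan":
                continue
            data_name = stem + ".data.R"
            if data_name in names:
                pairs.append((os.path.join(dirpath, name),
                              os.path.join(dirpath, data_name)))
    return sorted(pairs)
```

Sorting the pairs keeps the run order stable between invocations, which helps when the recorded output is meant to be diffed.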

I’m not sure how useful the golds will be if we can’t figure out a way to get reproducibility, at least for some reasonable compiler / OS pair like clang + OS X.

seantalts · March 22, 2018, 3:26pm

PS What’s the other framework @betanalpha had? Curious about how to do any kind of portable regression testing here…

syclik · March 22, 2018, 3:40pm

https://github.com/stan-dev/stat_comp_benchmarks
