Testing MPI code


As proposed, I am opening a discussion on setting up the Google Test framework for the MPI code. The goal is to be able to test the MPI code in action. My current way of testing requires a dedicated main function which sets up and tears down the MPI environment as I need it. The steps are:

  1. Setting up the MPI cluster (root + workers)
  2. Deactivating output collection from the workers
  3. Sending workers into listen mode such that they can receive and execute work (rank != 0 nodes)
  4. Starting the RUN_ALL_TESTS macro only on the rank = 0 root node
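To make the control flow of such a main concrete, here is a minimal sketch of the rank dispatch only. It deliberately avoids MPI and googletest so it compiles on its own: MpiTestHooks, run_mpi_test_process and trace_for_rank are hypothetical names, and the hooks stand in for whatever the real main would call (for instance the worker listen loop and RUN_ALL_TESTS()). Step 1, starting the root and the workers, is assumed to have happened before this function is reached.

```cpp
#include <functional>
#include <string>

// Hypothetical hooks standing in for the real pieces of an MPI test main.
struct MpiTestHooks {
  std::function<void()> disable_worker_output;  // step 2
  std::function<void()> listen_for_work;        // step 3
  std::function<int()> run_all_tests;           // step 4, e.g. RUN_ALL_TESTS()
};

// Dispatches on the MPI rank and returns the process exit code.
// Workers never run tests; only the root (rank 0) does.
inline int run_mpi_test_process(int rank, const MpiTestHooks& hooks) {
  if (rank != 0) {
    hooks.disable_worker_output();
    hooks.listen_for_work();  // assumed to return once the root is done
    return 0;
  }
  return hooks.run_all_tests();
}

// Illustration only: records which hooks fire for a given rank.
inline std::string trace_for_rank(int rank) {
  std::string trace;
  MpiTestHooks hooks{
      [&trace] { trace += "silence;"; },
      [&trace] { trace += "listen;"; },
      [&trace] { trace += "tests;"; return 0; }};
  run_mpi_test_process(rank, hooks);
  return trace;
}
```

With these stand-ins, trace_for_rank(0) only runs the tests hook, while any other rank silences its output and then listens for work.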

I don’t think this can be achieved with the fixture concept.

A good example is the test here:

To get this test running, check out that branch and follow the notes in MPI_NOTES, which is at the top of the branch. You also need a make/local file, in which I have

LDFLAGS=-L$(HOME)/local/lib -lboost_serialization -lboost_mpi

which simply makes sure that the MPI libraries are used and linked in. Then one can use make as usual to build the test. To execute the test, one has to start it with mpirun. For example,

mpirun -np 4 ..path...to...test

will start 4 processes.

So unless we figure out a better way, my suggestion is to do the following:

  1. introduce an additional MPI-specific Google Test file containing the special main to be used for MPI tests
  2. MPI tests would be named ..._mpitest.cpp, which the make system recognizes and then does the right thing (special main + start tests with mpirun).



I’ll take a look in a week. I know this is for the mechanics of including MPI code into our test framework, but mind letting me know exactly what you’d be testing with this framework?

Well, I would like to see that the mpi stuff works as I want. Example tests in that branch are:

  • test/unit/math/prim/mat/functor/map_rect_mpi_test.cpp
  • test/unit/math/rev/mat/functor/map_rect_mpi_test.cpp

A number of things are special when we are in MPI mode:

  • static data is cached
  • exceptions are handled with much greater care
  • after the first successful call, all function output sizes are assumed to stay the same for every future call => I want to test that an error is raised if this does not hold
  • above all, I want to see that data is transferred correctly
  • special flags must be transferred correctly
  • prevent nested calls to map_rect
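As an illustration of the output-size point, here is a minimal sketch of the kind of check such a test would exercise. This is not the map_rect implementation: OutputSizeCache and size_change_is_rejected are hypothetical names, and the choice of std::domain_error is an assumption made only for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in: remembers the output sizes of the first successful
// call and rejects any later call whose sizes differ.
class OutputSizeCache {
 public:
  void check(const std::vector<std::size_t>& sizes) {
    if (!cached_) {
      sizes_ = sizes;  // first call: store the sizes
      cached_ = true;
      return;
    }
    if (sizes != sizes_) {
      throw std::domain_error("output sizes differ from the first call");
    }
  }

 private:
  bool cached_ = false;
  std::vector<std::size_t> sizes_;
};

// The shape of the test: same sizes pass, changed sizes must raise.
inline bool size_change_is_rejected() {
  OutputSizeCache cache;
  cache.check({2, 3});  // first call, sizes get cached
  cache.check({2, 3});  // same sizes, accepted
  try {
    cache.check({2, 4});  // changed size, must throw
  } catch (const std::domain_error&) {
    return true;
  }
  return false;
}
```

A real test would call map_rect twice with functors returning differently sized outputs and expect the second call to throw.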

So a number of things make MPI tests certainly necessary. Not all of the above is tested yet, but this is what is on my mind.

Some progress here. I found a way to live with the current test system, provided we are willing to accept a few hacks. I introduced the file

which sets up a number of globals that make things work. Note that the Google Test documentation recommends creating a dedicated main over defining globals. This is compromise 1. The second pill to swallow is that it is not possible to prevent tests from being executed on the non-root processes. Hence, I have to start every test with the line

if(rank != 0) return;

which disables the test on non-root processes (the non-root processes are just used by the root process, and tests are executed in the context of the root).

What remains to solve is:

  1. compiling the MPI tests still requires additional dependencies: linking against Boost MPI, Boost Serialization and the MPI base libraries, plus defining the STAN_HAS_MPI macro.
  2. calling the test binaries will happen without the mpirun wrapper, so only a single process will be started. This will execute the test, but no actual transfers will take place (at least the code will be run, but not in a true multi-process mode).

A solution to the above is to use the STAN_HAS_MPI macro to disable all MPI tests when the macro is not defined. To run the MPI tests, one would have to configure the compiler and libraries accordingly and then just rerun the tests.

It comes down to the question: when, how often, and where should MPI tests be run?

Some progress: I figured out how we can have tests of map_rect_mpi in our test system in a smooth way. I am using compile-time switches (based on whether STAN_HAS_MPI is defined) which set up the tests slightly differently, see here.

For tests with MPI, the mpi_test_env.hpp referred to above sets up a global test environment which initializes the MPI system. Moreover, it defines an MPI_TEST_F macro which is just an alias for TEST_F, such that tests execute normally.

On the other hand, the header behaves differently when STAN_HAS_MPI is not defined. In that case, no global test environment is created and MPI_TEST_F is defined such that the test name is prefixed with DISABLED_. Doing so tells googletest not to run the given test.
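To show the name-prefixing trick in isolation, here is a small sketch that compiles without googletest. The commented-out MPI_TEST_F definitions show roughly what such a header could contain; they are my guess, not a copy of mpi_test_env.hpp. MPI_TEST_NAME, MPI_TEST_STR and example_registered_name are hypothetical helpers that only make the resulting test name visible.

```cpp
#include <string>

// With googletest included, the alias could look roughly like this
// (an assumption, not the actual header):
//   #ifdef STAN_HAS_MPI
//   #define MPI_TEST_F(fixture, name) TEST_F(fixture, name)
//   #else
//   #define MPI_TEST_F(fixture, name) TEST_F(fixture, DISABLED_##name)
//   #endif

// Same switch, reduced to the test name so it can be inspected directly.
#ifdef STAN_HAS_MPI
#define MPI_TEST_NAME(name) name
#else
#define MPI_TEST_NAME(name) DISABLED_##name  // token pasting adds the prefix
#endif

// Two-level stringification so the argument is expanded before quoting.
#define MPI_TEST_STR_IMPL(x) #x
#define MPI_TEST_STR(x) MPI_TEST_STR_IMPL(x)

// Returns the name googletest would see for a test called checks_output_size.
inline std::string example_registered_name() {
  return MPI_TEST_STR(MPI_TEST_NAME(checks_output_size));
}
```

Compiled without -DSTAN_HAS_MPI, the name comes out with the DISABLED_ prefix, which googletest skips by default; with the macro defined, the name is left untouched.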

I think that should work OK. With all these tricks there are no special requirements on the current test system. Should MPI tests need to be run, one simply has to configure the build system accordingly.

I’ve been out of touch on Discourse for a couple weeks and I missed that there were a bunch of other MPI threads. Great to come back and find problems followed by solutions.