This is an initial investigation of using PETSc + MPI instead of Eigen as the backend of Stan Math. The repo with a toy example can be found in
The example is a vector dot_product. A PETSc version, dot_product_petsc, is compared with the Eigen version in an 8-core run:
mpiexec -n 8 test/unit/math/prim/mat/fun/dot_product_petsc_test -vec_type mpi
The runtime of Eigen's dot method vs PETSc's VecDot on my OSX machine:
elapsed time(dot_product_eigen): 0.033287
elapsed time(dot_product_petsc): 0.005095
Note this is sequential Eigen vs parallel PETSc, so naturally there is a speedup.
As to setup, one needs to install MPI and PETSc separately, and set PETSC_ARCH in the shell environment. The make/petsc rules in the repo are tested on OSX, so that the above test problem can be built with
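PETSC_ARCH (together with PETSC_DIR) is picked up from the environment; for example, in bash (both the install path and the arch string here are placeholders; the arch name depends on how PETSc was configured):

```shell
# adjust to your PETSc install location and configured arch;
# "arch-darwin-c-opt" is only an example arch string
export PETSC_DIR="$HOME/petsc"
export PETSC_ARCH="arch-darwin-c-opt"
```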
Awesome! Want to give us some context on PETSc? (links, references, what motivated this experiment?)
Some additional questions:
- Is there enough breadth in PETSc to do enough of what we have going on?
- Can we have some things in PETSc and some things in Eigen?
- What happens with one core?
PETSc "is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations". Essentially it provides HPC infrastructure for numerical solvers for linear systems, differential equations, and optimization. The motivation is to use it for some critical, large-scale numerical operations involving MPI/CUDA; one example of such an operation is the Cholesky factorization of a large matrix. One can also take advantage of the interfaces it provides to third-party numerical solvers. When it comes to MPI, it relieves developers from manually implementing basic operations such as scattering/gathering a distributed matrix.
Yes. That’s my plan.
Sequential results (vector size = 800000)
elapsed time(dot_product_eigen): 0.005929
elapsed time(dot_product_petsc): 0.005934