PETSc in Stan Math

This is an initial investigation of using PETSc + MPI instead of Eigen as the backend of Stan Math. The repo with a toy example can be found in

The example is the vector dot_product. A PETSc version, dot_product_petsc, is compared against the Eigen version on an 8-core run:

mpiexec -n 8 test/unit/math/prim/mat/fun/dot_product_petsc_test -vec_type mpi

The runtime of Eigen's dot method vs PETSc's VecDot on my OSX machine:

elapsed time(dot_product_eigen): 0.033287
elapsed time(dot_product_petsc): 0.005095

Note this is sequential Eigen vs PETSc on 8 processes, so naturally there is a speedup.

As to setup, one needs to install MPI and PETSc on one's own, as well as set PETSC_DIR and PETSC_ARCH in the shell environment. The makefiles make/tests and make/petsc in the repo are tested on OSX, so that the above test problem can be built with

make test/unit/math/prim/mat/fun/dot_product_petsc_test
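For example (the PETSc install path and arch name below are illustrative; substitute your own):

```shell
# point the build at a local PETSc install (paths are examples only)
export PETSC_DIR=$HOME/petsc
export PETSC_ARCH=arch-darwin-c-debug

make test/unit/math/prim/mat/fun/dot_product_petsc_test
```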

Awesome! Want to give us some context on PETSc? (links, references, what motivated this experiment?)

Some additional questions:

  • Is there enough breadth in PETSc to do enough of what we have going on?
  • Can we have some things in PETSc and some things in Eigen?
  • What happens with one core?

PETSc “is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations” (https://petsc.org). Essentially it provides an HPC infrastructure for numerical solvers: linear systems, differential equations, and optimization. The motivation is to use it for some critical, large-scale numerical operations involving MPI/CUDA; one example of such an operation is the Cholesky factorization of a large matrix. One can also take advantage of the interfaces it provides to third-party numerical solvers. When it comes to MPI, PETSc relieves developers from manually implementing basic operations such as scattering/gathering a distributed matrix.


Yes. That’s my plan.

Sequential (single-core) results, vector size = 800000:
elapsed time(dot_product_eigen): 0.005929
elapsed time(dot_product_petsc): 0.005934