Logo icon

Development of algorithms for non-blocking and rarefied collective MPI operations

Developed for: Intel Inc., Nizhny Novgorod, Russia

Purpose: the development of algorithms for non-blocking and rarefied collective MPI-operations

The purpose of the project was the development of new algorithms for non-blocking and rarefied collective MPI operations.

Non-blocking and rarefied collective operations were added to the 3rd version of the MPI standard. Currently, MPI implementations with an open source code (MPICH*, Open MPI*) contain a small number of algorithms for these types of collective operations.

It was necessary to solve the following problems:

1. To research the existing set of algorithms for non-blocking and rarefied collective operations

2. To develop new algorithms and assess their performance

The algorithms have been developed for the following functions: MPI_allreduce, MPI_Bcast, MPI_reduce, MPI_Igather, MPI_scatter, MPI_Ineighbour_alltoall.

It is preferable to use the developed knomial algorithms for collective operations when the number of processes is multiple of knomial factor.

Test methods for the optimization of local exchanges for k-tree topology have been developed and investigated. Testing has shown that this modification of exchanges structure reduces the runtime of tests on the small-sized messages, but on the large-sized messages the algorithm begins to lose to the standard one.

Optimization methods for the Cartesian topology have been investigated. Optimization speeds up MPICH algorithm on the message of medium size 512-1024 bytes, the acceleration increases with the number of processes.


Client: Intel, Nizhny Novgorod, Russia
Area of use: The algorithms were implemented and tested in MPICH library as a study for subsequent internal development of Intel company (Intel MPI)
Type (platform): Linux
Technologies and algorithms in use: Source codes of MPICH, MVAPICH and OpenMPI libraries were a basis of development. Knomial algorithms from articles of the scientist Torsten and Patarasyuk's algorithm for the non-blocking IAllReduce function were used for implementation. IMB-NBC benchmark was used for performance testing.

optimization of the migration algorithm (seismic data processing)

the analysis of WRF, MILC and HPC Repast packages to run and work efficiently on the computing cluster Petastream consisting of computing nodes of Xeon Phi coprocessors