Type of Document: Dissertation
Author: Faraj, Ahmad
Author's Email Address: email@example.com
URN: etd-07072006-162046
Title: Automatic Empirical Techniques for Developing Efficient MPI Collective Communication Routines
Degree: Doctor of Philosophy
Department: Computer Science, Department of
Advisory Committee (Advisor Name, Title)
- Xin Yuan, Committee Chair
- Ashok Srinivasan, Committee Member
- David Whalley, Committee Member
- Guosheng Liu, Committee Member
- Kyle Gallivan, Committee Member
Keywords
- High Performance
- Empirical Techniques
- Collective Routines
Date of Defense: 2006-07-07
Availability: unrestricted
Abstract
Due to the wide use of collective operations in Message Passing Interface (MPI) applications, developing efficient collective communication routines is essential. Despite numerous research efforts for optimizing MPI collective operations, it is still not clear how to obtain MPI collective
routines that can achieve high performance across platforms and applications. In particular, while it may not be extremely difficult to develop an efficient communication algorithm for a given platform and a given application, including such an algorithm in an MPI library poses a significant challenge: the communication library is general-purpose and must provide efficient routines for different platforms and applications.
In this research, a new library implementation paradigm called delayed finalization of MPI collective communication routines (DF) is proposed for realizing efficient MPI collective routines across platforms and applications. The idea is to postpone the decision of which algorithm to use for a collective operation until the platform and/or application is known. Using the DF approach, the MPI library can maintain, for each communication operation, an extensive set of algorithms and use an automatic algorithm selection mechanism to choose the appropriate algorithm for a given platform and a given application. Hence, a DF-based library can adapt to platforms and applications.
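The delayed-finalization idea can be illustrated with a minimal sketch. It is not the mechanism used by STAGE-MPI or STAR-MPI, which this record does not describe in detail. The class `DelayedFinalizationLibrary`, its methods, the platform parameters, and the three broadcast cost models are all hypothetical names and simplified textbook-style estimates introduced only for illustration; a real empirical system would measure candidate algorithms on the target platform rather than evaluate formulas.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a platform is a dict of parameters, and each
# candidate algorithm is stood in for by a cost estimate (seconds), not MPI code.
Platform = Dict[str, float]
CostModel = Callable[[Platform, int], float]


class DelayedFinalizationLibrary:
    """Keeps several candidate algorithms per collective operation and
    picks one only after the platform and message size are known."""

    def __init__(self) -> None:
        self._candidates: Dict[str, List[Tuple[str, CostModel]]] = {}

    def register(self, operation: str, name: str, cost: CostModel) -> None:
        # Add one more candidate algorithm for the given operation.
        self._candidates.setdefault(operation, []).append((name, cost))

    def finalize(self, operation: str, platform: Platform, msg_bytes: int) -> str:
        # Evaluate every candidate and return the name of the cheapest one.
        candidates = self._candidates[operation]
        best_name, _ = min(candidates, key=lambda c: c[1](platform, msg_bytes))
        return best_name


def _transfer(platform: Platform, msg_bytes: float) -> float:
    # Time to push msg_bytes over one link, excluding start-up latency.
    return msg_bytes / platform["bandwidth"]


def flat_tree(platform: Platform, msg_bytes: int) -> float:
    # Root sends the whole message to each of the other p - 1 processes in turn.
    p = int(platform["processes"])
    return (p - 1) * (platform["latency"] + _transfer(platform, msg_bytes))


def binomial_tree(platform: Platform, msg_bytes: int) -> float:
    # ceil(log2 p) rounds, each forwarding the whole message.
    p = int(platform["processes"])
    rounds = math.ceil(math.log2(p))
    return rounds * (platform["latency"] + _transfer(platform, msg_bytes))


def scatter_allgather(platform: Platform, msg_bytes: int) -> float:
    # Scatter the message, then ring all-gather: more start-ups, less data per link.
    p = int(platform["processes"])
    startups = math.ceil(math.log2(p)) + p - 1
    return startups * platform["latency"] + 2 * (p - 1) / p * _transfer(platform, msg_bytes)


lib = DelayedFinalizationLibrary()
lib.register("broadcast", "flat_tree", flat_tree)
lib.register("broadcast", "binomial_tree", binomial_tree)
lib.register("broadcast", "scatter_allgather", scatter_allgather)

# Illustrative platform values only (16 processes, 50 microsecond latency, 100 MB/s).
demo_platform: Platform = {"processes": 16, "latency": 50e-6, "bandwidth": 1e8}

small_choice = lib.finalize("broadcast", demo_platform, 64)
large_choice = lib.finalize("broadcast", demo_platform, 1_000_000)
print(small_choice, large_choice)
```

With these illustrative numbers the sketch selects `binomial_tree` for a 64-byte broadcast and `scatter_allgather` for a 1 MB broadcast, which shows why deferring the choice until the platform and message size are known can matter.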
To verify that the DF approach is effective and practical, Ethernet switched clusters are selected as the experimental platform, and two DF-based MPI libraries, STAGE-MPI and STAR-MPI, are developed and evaluated. In the development of the DF-based libraries, topology-specific algorithms for all-to-all, all-gather, and broadcast operations are designed for Ethernet switched clusters. The experimental results indicate that both STAGE-MPI and STAR-MPI significantly outperform traditional MPI libraries, including LAM/MPI and MPICH, in many cases. This demonstrates that the performance of MPI collective library routines can be significantly improved by (1) incorporating platform- and application-specific communication algorithms in the MPI library, and (2) making the library adaptable to platforms and applications.
Filename: faraj_dist.pdf
Size: 1.41 Mb
Approximate Download Time (Hours:Minutes:Seconds)
- 28.8 Modem: 00:06:31
- 56K Modem: 00:03:21
- ISDN (64 Kb): 00:02:56
- ISDN (128 Kb): 00:01:28
- Higher-speed Access: 00:00:07