MetaMPICH
MetaMPICH - Flexible Coupling of Heterogenous MPI Systems

Introduction


Concept
Results so far
Work in progress
Partners & contact
The coupling of parallel systems is an evolving technique for solving more complex, and thus larger, problems in High Performance Computing and for using the available computing resources efficiently. The resulting system, which consists of multiple independent and heterogeneous systems (with respect to operating system, architecture, CPU type and internal data representation), is usually called a Metacomputer, although a generally accepted definition of this term does not yet exist. The definition of Metacomputer that we use is illustrated in figure 1.
Figure 1: Basic principle of a Metacomputer

Using any parallel system, such a Metacomputer included, usually means solving a scientific problem that is described in mathematical formulas. To make this easier, a programming model and a corresponding interface are required. A widely used programming model for parallel systems is SPMD (Single Program, Multiple Data: all CPUs run the same code on different parts of the data that make up the problem), with Message Passing (exchanging data between processes by explicitly sending and receiving specified portions of data) as the means of communication. The MPI (Message Passing Interface) programming interface is based upon this model and is available on virtually every existing computing system. Besides the vendor-supplied implementations of MPI, MPICH is the most popular implementation; it is freely available for a wide variety of UNIX-like systems.
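
The programming model described above can be illustrated with a minimal sketch. It is not MPI code: threads and queues from the Python standard library stand in for processes and for the message-passing layer, and the names `parallel_sum`, `send` and `recv` are chosen for this illustration only. In a real MPI program the rank would come from `MPI_Comm_rank` and the exchange would use `MPI_Send` and `MPI_Recv`.

```python
import queue
import threading


def parallel_sum(data, nprocs):
    """Every rank runs the same worker on its own slice of `data`."""
    # One mailbox per rank stands in for the message-passing layer (assumption).
    mailboxes = [queue.Queue() for _ in range(nprocs)]
    result = {}

    def send(dest, payload):
        mailboxes[dest].put(payload)

    def recv(rank):
        return mailboxes[rank].get()

    def worker(rank):
        # Same code on every rank; only the part of the data differs.
        partial = sum(data[rank::nprocs])
        if rank == 0:
            total = partial
            for _ in range(nprocs - 1):
                total += recv(0)  # explicit receive of one partial result
            result["total"] = total
        else:
            send(0, partial)  # explicit send to rank 0

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]
```

Calling `parallel_sum(list(range(100)), 4)` splits the hundred numbers over four ranks and collects the partial sums on rank 0.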

Concept

The environment for the Metacomputer we wanted to create leads to the following requirements:
  • the Metacomputer should offer pure MPI as its programming interface, so that existing code does not need to be rewritten
  • the wheel should not be reinvented for this project, which means that we did not want to create an MPI implementation from scratch
  • the connection topology between the hosts in the Metacomputer should be flexible and adaptable, to take the existing infrastructure into account (for example, multiple ATM lines between two hosts)
  • the primary target platforms of the implementation are the Cray T3E and IBM SP2 families of massively parallel supercomputers, which makes direct process-to-process communication across hosts in the Metacomputer infeasible
MPICH has for quite a while included the P4 communication device, which offers some degree of heterogeneous message passing, and the current release of MPICH (1.1.1) offers even more Metacomputing functionality via the GLOBUS device. However, not all of the requirements above can be met with these solutions, which led us to design and create MetaMPICH.
Figure 2: Design of the MetaMPICH extension to MPICH

Figure 2 shows the software design of MetaMPICH, derived from the first two requirements, combined with the architecture of MPICH: dedicated router processes are used for the inter-host communication, while the intra-host communication is done by the native communication means on each host. If a message has to be routed to another host, it is sent to the appropriate router process via the Gateway device, which in turn uses the native communication means of the host. When the message arrives at the target host, it is sent to the target process through the tunnel device. With this design, MetaMPICH is largely independent of the native communication layer and can be used on every system that offers an MPICH channel device.
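
The path a message takes in this design can be sketched as follows. This is only a model of the description above, not MetaMPICH code: the `Proc` type, the `route` function and the hop labels are hypothetical names, and a single router process per host is assumed for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    host: str  # name of the host (e.g. one MPP system)
    rank: int  # rank of the process on that host


def route(src, dst):
    """Return the hops of a message as (sender, receiver, mechanism) tuples."""
    s = f"{src.host}:rank{src.rank}"
    d = f"{dst.host}:rank{dst.rank}"
    if src.host == dst.host:
        # Intra-host traffic stays on the native communication layer.
        return [(s, d, "native")]
    src_router = f"{src.host}:router"  # assumption: one router per host
    dst_router = f"{dst.host}:router"
    return [
        (s, src_router, "gateway"),              # to the local router via the Gateway device
        (src_router, dst_router, "inter-host"),  # router-to-router link
        (dst_router, d, "tunnel"),               # to the target process via the tunnel device
    ]
```

For example, `route(Proc("t3e", 0), Proc("sp2", 5))` yields three hops, while two processes on the same host communicate in a single native hop.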

An example of a possible configuration that can be realised with this design is shown in figure 3. (The last two requirements in the list above imply just this kind of setup.)

Figure 3: Example of a possible MetaMPICH configuration

Results so far

The development takes place in multiple phases; the first phases, which will build the initial version of the planned Metacomputer, are described below.
Prototype implementation
The first phase of the development consisted of the prototype implementation based upon the Sun Solaris operating system (running on Sparc or x86 platforms). This prototype was finished in summer 1998 and demonstrated the functionality of the concept and its implementation. We used multiple dual-processor workstations, connected via multiple standard and Fast Ethernet connections, to develop and test the MetaMPICH library. However, it does not make much sense to evaluate the performance of this Metacomputer on this platform, because dedicated router processes are needed on each host.
Port to Cray T3E and SP2
The prototype code had to be ported to the primary target platforms Cray T3E and IBM SP2, which normally would have been no big issue had it not been for some unexpected malfunctions of these systems. Nevertheless, the T3E port has been completed and was tested between two separate T3E systems and between a T3E and a Sun Enterprise Server. The SP2 port is on the way; the first meaningful performance numbers will be available when it is finished. The work of this phase is done by PALLAS.
Incorporation of MagPIe
To improve the performance of collective communication operations in a Metacomputer that involve more than one host, the MPICH extension MagPIe was adapted to MetaMPICH. The associated performance gain can be evaluated once the SP2 port is available.
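
Why collective operations benefit from knowing the host boundaries can be shown with a small sketch. It illustrates the general idea of a host-aware broadcast and is not MagPIe's actual algorithm; the baseline is a deliberately simple flat broadcast, and all function names are hypothetical. Processes are written as `(host, local_rank)` tuples.

```python
def flat_broadcast(procs, root):
    """Baseline: the root sends the message to every other process directly."""
    return [(root, p) for p in procs if p != root]


def hierarchical_broadcast(procs, root):
    """The root sends once per remote host; a local leader redistributes."""
    by_host = {}
    for p in procs:
        by_host.setdefault(p[0], []).append(p)
    messages = []
    for host, members in by_host.items():
        # The root leads its own host; elsewhere the first listed process does.
        leader = root if host == root[0] else members[0]
        if leader != root:
            messages.append((root, leader))  # the only inter-host message for this host
        messages.extend((leader, p) for p in members if p != leader)
    return messages


def inter_host_messages(messages):
    """Count the messages whose sender and receiver are on different hosts."""
    return sum(1 for src, dst in messages if src[0] != dst[0])
```

With four processes on one host and three on another, the flat variant sends three messages across the inter-host link, whereas the hierarchical variant sends one; both deliver the data to every process.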

Work in progress

MetaPerf
MetaPerf is a performance monitor for the communication between the hosts of a Metacomputer. The MetaMPICH router processes can be configured to act as servers which deliver relevant performance data about the messages sent and received to the MetaPerf client which analyzes and displays this data.
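
The kind of analysis such a client might perform can be sketched as below. The record layout `(src_host, dst_host, nbytes, seconds)` and the function name `summarize` are assumptions made for this illustration; the data format that the router processes actually deliver is not described here.

```python
from collections import defaultdict


def summarize(records):
    """Aggregate per-link statistics from (src_host, dst_host, nbytes, seconds) records."""
    totals = defaultdict(lambda: {"messages": 0, "bytes": 0, "seconds": 0.0})
    for src, dst, nbytes, seconds in records:
        link = totals[(src, dst)]
        link["messages"] += 1
        link["bytes"] += nbytes
        link["seconds"] += seconds

    summary = {}
    for key, t in totals.items():
        # Guard against a zero transfer time to avoid division by zero.
        rate = t["bytes"] / t["seconds"] if t["seconds"] > 0 else 0.0
        summary[key] = {
            "messages": t["messages"],
            "bytes": t["bytes"],
            "bytes_per_second": rate,
        }
    return summary
```

Each direction of a link is kept as a separate entry, so asymmetric traffic between two hosts remains visible in the summary.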
MetaEdit
The topology of a Metacomputer is specified in a configuration file, which can become quite complex and hard to maintain manually. MetaEdit is a Java application that simplifies the creation and maintenance of these configuration files by supplying a graphical user interface and an assistant which leads the user through the entire process.
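
One consistency check that becomes tedious by hand is whether every host can be reached at all. The sketch below performs that check on a hypothetical in-memory description (a list of host names and a list of host pairs); it does not reflect the MetaMPICH configuration file format, and `unreachable_hosts` is a name invented for this example.

```python
def unreachable_hosts(hosts, links, start):
    """Return the hosts that cannot be reached from `start` over the given links."""
    neighbours = {h: set() for h in hosts}
    for a, b in links:
        if a not in neighbours or b not in neighbours:
            raise ValueError(f"link refers to an unknown host: {a!r} - {b!r}")
        # Links are treated as bidirectional; parallel links collapse into one edge.
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Depth-first search over the host graph.
    seen = {start}
    stack = [start]
    while stack:
        for n in neighbours[stack.pop()]:
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return sorted(set(hosts) - seen)
```

With hosts `["t3e", "sp2", "sun"]` and two parallel links between `t3e` and `sp2`, the check reports `["sun"]` as unreachable until a link to `sun` is added.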

Besides these two sub-projects, optimizing the routing itself and extending the possible inter-host topologies are ongoing tasks.

As there is now an MPICH version for SCI-connected clusters (SCI-MPICH), we are considering SCI support in MetaMPICH in the future.

Other efforts concern the inter-host connections. We plan to use ATM adapters for the router-router connections.

Partners & contact

The development of MetaMPICH is part of a larger Metacomputing project called GTBW (Gigabit Testbed West).

For further information on our part of this project, as described above, please contact:

    RWTH Aachen
    Lehrstuhl für Betriebssysteme
    Univ.-Prof. Dr. habil. Thomas Bemmerl
    Kopernikusstr. 16
    D-52056 Aachen, Germany

    Phone: +(49)-241-807634
    Fax : +(49)-241-8888339
    eMail:  contact@lfbs.rwth-aachen.de

Last Modification: 05. August 1999 by Martin