Introduction
The coupling of parallel systems is
an evolving technique for solving more complex, and thus larger, problems
in High Performance Computing and for using available computing
resources efficiently. The resulting system, which consists of multiple independent
and heterogeneous systems (with respect to operating system, architecture, CPU
type and internal data representation), is usually called a Metacomputer,
although a generally accepted definition of this term does not yet exist.
The definition we use for Metacomputer is illustrated in figure 1.
Figure 1: Basic principle of a Metacomputer

To ease the utilization of any parallel system (such as a Metacomputer), which in many cases means finding the solution to a scientific problem described in mathematical formulas, a programming model and a corresponding interface are required. A widely used programming model for parallel systems is SPMD (Single Program, Multiple Data: all CPUs run the same code on different parts of the data that make up the problem to solve), with Message Passing (exchanging data between processes by explicitly sending and receiving specified portions of data) as the means of communication. The MPI (Message Passing Interface) programming interface is based upon this model and is available on virtually every existing computing system. Besides the vendor-supplied implementations of MPI, MPICH is the most popular implementation, and it is freely available for a wide variety of UNIX-like systems.
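The programming model described above, in which every process runs the same code on its own part of the data and exchanges results by explicit messages, can be illustrated with a small simulation. This is only a sketch: it uses Python threads and queues in place of real MPI processes, and the names `spmd_sum`, `send` and `recv` are hypothetical helpers, not MPI or MPICH calls.

```python
import queue
import threading


def spmd_sum(data, num_ranks):
    """Sum `data` with `num_ranks` simulated processes that all run the same code."""
    # One mailbox per rank stands in for the message passing layer (assumption).
    mailboxes = [queue.Queue() for _ in range(num_ranks)]
    result = {}

    def send(dest, message):
        # Explicit send: place the message in the destination's mailbox.
        mailboxes[dest].put(message)

    def recv(rank):
        # Explicit receive: block until a message for this rank arrives.
        return mailboxes[rank].get()

    def worker(rank):
        # Every rank executes this same function on its own slice of the data.
        partial = sum(data[rank::num_ranks])
        if rank == 0:
            total = partial
            for _ in range(num_ranks - 1):
                total += recv(0)
            result["total"] = total
        else:
            send(0, partial)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]
```

Calling `spmd_sum(list(range(100)), 4)` splits the numbers over four simulated ranks; rank 0 collects the three partial sums it receives and adds its own.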
MetaMPICH - Flexible Coupling of Heterogenous MPI Systems
Concept
The environment for the metacomputer
we wanted to create leads to the following requirements:
MPICH has long included the P4 communication
device, which offers a form of heterogeneous message passing,
and the current release of MPICH (1.1.1) offers even more
Metacomputing functionality via the GLOBUS device. However, not all of
the requirements above can be met with these solutions, which led us to
design and create MetaMPICH.
Figure 2: Design of the MetaMPICH extension to MPICH

Figure 2 shows the software design of MetaMPICH, derived from the first two requirements and combined with the architecture of MPICH: dedicated router processes are used for the inter-host communication, while the intra-host communication is done by the native communication means on each host. If a message has to be routed to another host, it is sent to the appropriate router process via the Gateway device, which in turn uses the native communication means of the host. When the message arrives at the target host, it is sent to the target process through the tunnel device. With this design, MetaMPICH is largely independent of the native communication layer and can be used on every system which offers an MPICH channel device. An example of a possible configuration which can be realised through this design is shown in figure 3 (the last two requirements in the list above imply just this kind of setup).
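The message path described above can be sketched as a small routing function. This is an illustration only, not MetaMPICH code: the function `route`, the `host_of` mapping and the hop labels are hypothetical, and the sketch assumes one router process per host.

```python
def route(host_of, src, dest):
    """Return the list of hops a message takes from rank `src` to rank `dest`.

    `host_of` maps each global rank to the name of the host it runs on
    (a hypothetical layout chosen for this example).
    """
    src_host = host_of[src]
    dest_host = host_of[dest]
    if src_host == dest_host:
        # Intra-host traffic uses only the host's native communication means.
        return [f"rank {src}", f"native({src_host})", f"rank {dest}"]
    return [
        f"rank {src}",
        f"gateway device({src_host})",   # hands the message to the local router
        f"router({src_host})",           # inter-host link starts here
        f"router({dest_host})",
        f"tunnel device({dest_host})",   # delivers to the target process
        f"rank {dest}",
    ]


# Example layout: two ranks on each of two hosts.
host_of = {0: "t3e", 1: "t3e", 2: "sp2", 3: "sp2"}
```

With this layout, `route(host_of, 0, 1)` stays on the native layer of one host, while `route(host_of, 0, 3)` passes through both router processes.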
Results so far
The development takes place in multiple phases; the first phases, which
build the initial version of the planned Metacomputer, are described
below.
Prototype implementation

The first phase of the development consisted of the prototype implementation based upon the Sun Solaris operating system (running on Sparc or x86 platforms). This prototype was finished in Summer 1998 and demonstrated the functionality of the concept and its implementation. We used multiple dual-processor workstations connected via standard, fast and multiple Ethernet connections to develop and test the MetaMPICH library. However, it does not make much sense to evaluate the performance of the Metacomputer on this platform, because dedicated router processes are needed on each host.

Port to Cray T3E and SP2

The prototype code had to be ported to the primary target platforms Cray T3E and IBM SP2, which would normally have been no big issue had it not been for some unexpected malfunctions of these systems. Nevertheless, the T3E port has been completed and was tested between two separate T3E systems and between a T3E and a Sun Enterprise Server. The SP2 port is under way; the first meaningful performance numbers will be available when it is finished. The work in this phase is done by PALLAS.

Incorporation of MagPIe

To improve the performance of collective communications that involve more than one host of a Metacomputer, the MPICH extension MagPIe was adapted to MetaMPICH. The associated performance gain can be evaluated once the SP2 port is available.
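To see why collective operations that span several hosts are worth optimizing, the following sketch counts the messages of a broadcast under two strategies. It is not MagPIe's actual algorithm, which is not described here; it only assumes the general idea of sending data across each inter-host link once and forwarding it locally. The function names and the `host_of` layout are hypothetical.

```python
def flat_broadcast(host_of, root):
    """Root sends directly to every other rank.

    Returns (inter_host_messages, intra_host_messages).
    """
    inter = intra = 0
    for rank, host in host_of.items():
        if rank == root:
            continue
        if host == host_of[root]:
            intra += 1
        else:
            inter += 1
    return inter, intra


def host_aware_broadcast(host_of, root):
    """Root sends one message per remote host; one rank there forwards it locally.

    Returns (inter_host_messages, intra_host_messages).
    """
    ranks_per_host = {}
    for rank, host in host_of.items():
        ranks_per_host.setdefault(host, []).append(rank)
    # One inter-host message for every host other than the root's host.
    inter = len(ranks_per_host) - 1
    # On every host, one rank already holds the data and forwards it to the rest.
    intra = sum(len(ranks) - 1 for ranks in ranks_per_host.values())
    return inter, intra
```

For three ranks on each of two hosts, the flat variant crosses the inter-host link three times and the host-aware variant once, while the total number of messages stays at five in both cases.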
Work in progress
MetaPerf

MetaPerf is a performance monitor for the communication between the hosts of a Metacomputer. The MetaMPICH router processes can be configured to act as servers which deliver relevant performance data about the messages sent and received to the MetaPerf client, which analyzes and displays this data.

MetaEdit

The topology of a Metacomputer is specified in a configuration file which can become quite complex and hard to maintain manually. MetaEdit is a Java application that simplifies the creation and maintenance of these configuration files by supplying a graphical user interface and an assistant which leads the user through the entire process.

Besides these two sub-projects, the optimization of the routing itself and the extension of the possible inter-host topologies are ongoing work. As there is now an MPICH version for SCI-connected clusters (SCI-MPICH), we are considering SCI support in MetaMPICH in the future. Other efforts concern the inter-host connections: we plan to use ATM adapters for the router-router connections.
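As a rough illustration of the kind of data a MetaPerf-style monitor might aggregate, the sketch below keeps per-link message statistics. Neither the data delivered by the router processes nor the MetaPerf interface is specified here, so the class `LinkStats`, its methods and the reported fields are all hypothetical.

```python
from collections import defaultdict


class LinkStats:
    """Accumulates message counts and byte totals per (source host, target host) link."""

    def __init__(self):
        self._messages = defaultdict(int)
        self._bytes = defaultdict(int)

    def record(self, src_host, dest_host, nbytes):
        # Would be called once per routed message (hypothetical hook in a router).
        link = (src_host, dest_host)
        self._messages[link] += 1
        self._bytes[link] += nbytes

    def report(self):
        # Summary a monitoring client could analyze and display.
        return {
            link: {
                "messages": self._messages[link],
                "bytes": self._bytes[link],
                "mean_bytes": self._bytes[link] / self._messages[link],
            }
            for link in self._messages
        }
```

A router would call `record` for every message it forwards, and a client would poll `report` to obtain one summary entry per directed link.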
Partners & contact
The development of MetaMPICH is part of a larger Metacomputing project
called GTBW (Gigabit Testbed West).
For further information on our part of this project, as described above, please contact:
Lehrstuhl für Betriebssysteme
Univ.-Prof. Dr. habil. Thomas Bemmerl
Kopernikusstr. 16
D-52056 Aachen, Germany
Phone: +(49)-241-807634

Last Modification: 05. August 1999 by Martin