
MetaMPICH - Flexible Coupling of Heterogeneous MPI Systems

Concepts of MetaMPICH

The Meta Configuration File

One core concept of MetaMPICH is to have a single, central configuration entity for the whole meta computer, the so-called meta configuration file.
The configuration as stated in this file differentiates between hosts and network interfaces and can easily handle multiple NICs of the same type in the same machine.
Configuring the system this way gives users maximum control and allows the configuration to be matched precisely to the underlying hardware.
To meet the demands of heterogeneous network architectures, MetaMPICH also provides two different methods for coupling the distributed systems: a router-based architecture and a multidevice architecture.
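
As a rough illustration of the distinction between hosts and network interfaces, the following Python sketch models a small meta computer in memory. It does not reproduce the actual syntax of the meta configuration file; all class names, field names, host names and addresses are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Nic:
    """One network interface of a node (hypothetical model)."""
    kind: str      # interface type, e.g. "ethernet" or "sci"
    address: str   # address of this particular interface


@dataclass
class Node:
    """One machine; it may hold several NICs of the same type."""
    name: str
    nics: list[Nic] = field(default_factory=list)

    def nics_of_kind(self, kind: str) -> list[Nic]:
        """Return all interfaces of the given type on this node."""
        return [nic for nic in self.nics if nic.kind == kind]


@dataclass
class MetaHost:
    """One coupled system (e.g. a cluster) inside the meta computer."""
    name: str
    nodes: list[Node] = field(default_factory=list)


# A single, central description of the whole meta computer.
# All names and addresses below are made up for illustration.
meta_computer = [
    MetaHost("cluster_a", [
        Node("a-io0", [Nic("ethernet", "192.0.2.10"),
                       Nic("ethernet", "192.0.2.11"),
                       Nic("sci", "4")]),
        Node("a-n1", [Nic("sci", "8")]),
    ]),
    MetaHost("cluster_b", [
        Node("b-io0", [Nic("ethernet", "192.0.2.20")]),
    ]),
]

io_node = meta_computer[0].nodes[0]
print(len(io_node.nics_of_kind("ethernet")))  # two Ethernet NICs in one machine
```

Keeping interfaces as separate entries under each node is what makes it possible to describe two NICs of the same type in one machine without ambiguity.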

The Router Based Architecture

The first version of MetaMPICH implemented a router architecture for coupling the meta hosts, because the target platforms were MPP systems with dedicated I/O-nodes for external communication. These nodes were used to run the router processes.
When MetaMPICH was extended to support clusters of PCs later on, the router architecture still proved to be suitable.
These clusters had high-performance interconnects and were part of a Fast Ethernet LAN.
As the router connection over this LAN was a communication bottleneck when coupling clusters, the emerging Gigabit Ethernet technology was used to couple dedicated I/O-nodes of the clusters via router processes.
Routers can be used to build distributed meta computers with meta hosts hidden behind a firewall, because the concept eliminates the need to give all nodes of a cluster system access to an external network.
The MetaMPICH implementation of the router concept provides bundling of network interfaces to increase bandwidth between meta hosts, as well as static load balancing among multiple router connections between two meta hosts to offload the I/O-nodes.
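
The text above does not say how the static load balancing is carried out. One simple possibility, shown here purely as an assumption, is to fix the router connection for each pair of communicating processes with a modulo rule, so that the choice never changes at run time. The function name and the rank-based rule are hypothetical and not taken from MetaMPICH.

```python
def pick_router_connection(src_rank: int, dst_rank: int, num_connections: int) -> int:
    """Statically map a (sender, receiver) pair to one router connection.

    Assumption: a modulo rule over the two ranks; MetaMPICH may use a
    different static scheme.
    """
    if num_connections < 1:
        raise ValueError("at least one router connection is required")
    return (src_rank + dst_rank) % num_connections


# Four senders on one meta host, one receiver (rank 8) on the other,
# two router connections between the two meta hosts.
assignment = {src: pick_router_connection(src, 8, 2) for src in range(4)}
print(assignment)  # {0: 0, 1: 1, 2: 0, 3: 1}
```

Because the mapping depends only on the ranks, every process can compute it locally without coordination, which is the defining property of a static scheme.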

The Multidevice Architecture

Recently, we complemented the router-based architecture of MetaMPICH with a multidevice approach, which gives users of MetaMPICH a new and unique way of coupling compute resources.
The approach implements two coexisting, independent MPICH communication devices on each meta host. The primary device is cluster-specific and lets each meta host benefit from its internal high-speed network, while the secondary device couples the meta hosts.
For example, if two SCI clusters are coupled via Ethernet, the ch_smi device for SCI provides intra-cluster connectivity, whereas the ch_usock device (communication via sockets in MP-MPICH) handles communication between processes running on different clusters.
By making use of a secondary device, it is possible to build routerless meta computers, i.e. configurations in which all meta hosts communicate via the secondary device.
Additionally, to give the users as much flexibility as possible, mixed configurations can be set up, in which some meta hosts are coupled via router processes and some via a secondary device.
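
The choice between the three communication paths described above can be sketched as a small decision function. This is a simplified model under stated assumptions, not MetaMPICH code: the function name, the meta host names and the representation of secondary-device links as a set of host pairs are all hypothetical.

```python
def select_path(src_host: str, dst_host: str,
                secondary_links: set[frozenset[str]]) -> str:
    """Return which mechanism carries a message between two processes.

    src_host / dst_host are the names of the meta hosts the processes run on.
    secondary_links holds the pairs of meta hosts that are coupled directly
    via a secondary device; every other pair is assumed to use routers.
    """
    if src_host == dst_host:
        # Same meta host: use the cluster-specific primary device.
        return "primary device"
    if frozenset((src_host, dst_host)) in secondary_links:
        # Different meta hosts with a direct secondary-device coupling.
        return "secondary device"
    # Different meta hosts without such a coupling: go through routers.
    return "router"


# Mixed configuration: A and B are coupled by a secondary device,
# C is reached through router processes.
links = {frozenset(("A", "B"))}
print(select_path("A", "A", links))  # primary device
print(select_path("B", "A", links))  # secondary device
print(select_path("A", "C", links))  # router
```

A routerless meta computer corresponds to the case in which every pair of distinct meta hosts appears in `secondary_links`, so the last branch is never reached.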
To be able to use the ch_usock device as a secondary device for meta hosts which also use it internally, this device has been made instantiable, i.e. multiple instances of it can coexist and be used concurrently.