NT-MPICH
Performance of NT-MPICH
Measurements
This page compares our NT-MPICH with the original MPICH.NT, WMPI 1.5, and MPI/Pro. WMPI 1.5 is a newly developed MPI implementation, in contrast to version 1.3, which was based on a port of the MPICH ch_p4 device. MPI/Pro and WMPI are commercial products, while NT-MPICH and MPICH.NT are freely available.
MPICH.NT was developed by the MPICH team at ANL. It uses a new device that was created for Windows and supports both SMPs and network-connected clusters. To gain maximum performance it was configured with MPICH_USE_POLLING=1 and MPICH_SINGLETHREAD=1.
We used dual-processor PII machines running at 450 MHz to gather the results shown below. All measurements were made with two processes.
The tested versions of the different implementations are:
NT-MPICH:  1.2
MPICH.NT:  1.2.1 (Feb. 16, 2001)
MPI/Pro:  1.6
WMPI:  1.53

For more details about the benchmark, see the MP-MPICH performance page.

You might also want to take a look at the results of the Pallas MPI Benchmark.

Network performance
[Charts: latency over Fast Ethernet; bandwidth over Fast Ethernet]
The charts above show latency (RTT/2) and bandwidth for two dual-processor nodes connected via Fast Ethernet.
SMP performance
[Charts: latency on an SMP; bandwidth on an SMP]
The charts above show latency (RTT/2) and bandwidth for two processes running on one SMP (Dual PII) node.
MPI vs. sockets
[Charts: latency of MPI vs. sockets; bandwidth of MPI vs. sockets]
The charts above show latency (RTT/2) and bandwidth for two processes running on two dual PII nodes. This time we used Mark Baker's benchmark to compare the MPI implementations with a native socket implementation.
Non-blocking performance
To demonstrate the advantages of real non-blocking communication we created a benchmark that uses MPI_Sendrecv instead of distinct MPI_Send/MPI_Recv calls. MPI_Sendrecv is just a shortcut for the sequence MPI_Irecv(); MPI_Isend(); MPI_Waitall();. It is often used when processes have to exchange data. Since ch_wsock2 uses overlapped I/O on sockets and Fast Ethernet works in full-duplex mode, messages can really be sent and received simultaneously, especially on a dual-processor node. As the charts below show, NT-MPICH achieves a bandwidth of 20 MB/s on Fast Ethernet and halves the latency compared with the standard ping-pong test shown above.
[Charts: latency over Fast Ethernet; bandwidth over Fast Ethernet]
[Charts: latency on an SMP; bandwidth on an SMP]
 
Author: Karsten Scholtyssik