
Pallas MPI Benchmarks Results

The results below were produced using the Pallas MPI Benchmarks - PMB from Pallas GmbH.
The description is based on the information in "Pallas MPI Benchmarks - PMB, Part MPI-1, Revision 2.2".

Meaning of the labels:
Bw: Bandwidth
Lt: Latency
2: 2 nodes
4: 4 nodes
and so forth.

Results are extracted from the files ntmpich, mpichnt and wmpi2:

ntmpich: NT-MPICH, the Windows part of MP-MPICH,
mpichnt: MPICH.NT 1.2.5, the Windows version of MPICH by ANL/MSU,
wmpi2: WMPI II 2.4.0 by Critical Software.

The benchmarks were run using Shared Memory.


Point-to-Point Performance

Point-to-point performance is measured between two processes within the same node (memory performance) or between two nodes (network performance). Node-internal performance is measured in MBytes/s per process (send+recv), in units of 2^20 octets per second. Network performance is measured in MBytes/s I/O bandwidth per node (send+recv), in units of 2^20 octets per second.

PMB PingPong

PingPong is the classical pattern for measuring startup and throughput of a single message sent between two processes. The communication sequence is an MPI_Recv() followed by an MPI_Send() in a loop.

PingPong Network performance 2 nodes
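
For illustration, here is a minimal C sketch of the PingPong pattern described above. It is not the PMB source code; the message size SIZE, the repetition count REPS and the use of MPI_BYTE are arbitrary choices for this sketch.

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    #define SIZE 1024   /* message size in bytes (arbitrary) */
    #define REPS 1000   /* number of round trips (arbitrary) */

    int main(int argc, char **argv)
    {
        char buf[SIZE];
        int rank, i;
        double t0, t1;
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        memset(buf, 0, SIZE);

        t0 = MPI_Wtime();
        for (i = 0; i < REPS; i++) {
            if (rank == 0) {        /* rank 0 sends first and waits for the echo */
                MPI_Send(buf, SIZE, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, SIZE, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &st);
            } else if (rank == 1) { /* rank 1 receives, then sends back */
                MPI_Recv(buf, SIZE, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &st);
                MPI_Send(buf, SIZE, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)  /* half of the average round-trip time approximates the one-way latency */
            printf("latency: %g us\n", (t1 - t0) / (2.0 * REPS) * 1e6);

        MPI_Finalize();
        return 0;
    }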

PMB PingPing

PingPing is a concurrent two-way communication test. Like PingPong, it measures the startup and throughput of a single message sent between two processes, with the crucial difference that the messages are obstructed by oncoming messages. For this, the two processes communicate with each other (MPI_Isend/MPI_Recv/MPI_Wait), with the MPI_Isend calls issued simultaneously.

PingPing Network performance 2 nodes
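
A minimal sketch of one PingPing iteration, assuming exactly two communicating processes and an arbitrary message size SIZE (again, not the PMB code itself):

    #include <mpi.h>
    #include <string.h>

    #define SIZE 1024   /* message size in bytes (arbitrary) */

    int main(int argc, char **argv)
    {
        char sbuf[SIZE], rbuf[SIZE];
        int rank;
        MPI_Request req;
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        memset(sbuf, 0, SIZE);

        if (rank < 2) {
            int peer = 1 - rank;  /* the other one of the two communicating processes */

            /* both sides post their send first, so the two messages are in flight at the same time */
            MPI_Isend(sbuf, SIZE, MPI_BYTE, peer, 0, MPI_COMM_WORLD, &req);
            MPI_Recv(rbuf, SIZE, MPI_BYTE, peer, 0, MPI_COMM_WORLD, &st);
            MPI_Wait(&req, &st);
        }

        MPI_Finalize();
        return 0;
    }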

PMB Sendrecv

Based on MPI_Sendrecv(), the processes form a periodic communication chain. Each process sends to its right neighbor and receives from its left neighbor in the chain.

Sendrecv Network performance 2 nodes
Sendrecv Network performance 4 nodes
Sendrecv Network performance 8 nodes
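
The communication chain can be sketched as follows; the buffer size SIZE is an arbitrary choice, and the wrap-around neighbor computation makes the chain periodic:

    #include <mpi.h>
    #include <string.h>

    #define SIZE 1024   /* message size in bytes (arbitrary) */

    int main(int argc, char **argv)
    {
        char sbuf[SIZE], rbuf[SIZE];
        int rank, np, right, left;
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);
        right = (rank + 1) % np;        /* periodic chain: the neighbors wrap around */
        left  = (rank + np - 1) % np;
        memset(sbuf, 0, SIZE);

        /* send to the right neighbor and receive from the left one in a single call */
        MPI_Sendrecv(sbuf, SIZE, MPI_BYTE, right, 0,
                     rbuf, SIZE, MPI_BYTE, left,  0,
                     MPI_COMM_WORLD, &st);

        MPI_Finalize();
        return 0;
    }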

PMB Exchange

Exchange is a communication pattern that often occurs in grid splitting algorithms (boundary exchanges). The group of processes is seen as a periodic chain, and each process exchanges data with both its left and right neighbor in the chain.

Exchange Network performance 2 nodes
Exchange Network performance 4 nodes
Exchange Network performance 8 nodes
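
A plausible sketch of one Exchange step (not the PMB implementation; SIZE is arbitrary, and the exact send/receive ordering may differ from PMB):

    #include <mpi.h>
    #include <string.h>

    #define SIZE 1024   /* message size in bytes (arbitrary) */

    int main(int argc, char **argv)
    {
        char sbuf[SIZE], rleft[SIZE], rright[SIZE];
        int rank, np, right, left;
        MPI_Request req[2];
        MPI_Status st[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);
        right = (rank + 1) % np;
        left  = (rank + np - 1) % np;
        memset(sbuf, 0, SIZE);

        /* post sends to both neighbors, then receive from both */
        MPI_Isend(sbuf, SIZE, MPI_BYTE, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(sbuf, SIZE, MPI_BYTE, right, 0, MPI_COMM_WORLD, &req[1]);
        MPI_Recv(rleft,  SIZE, MPI_BYTE, left,  0, MPI_COMM_WORLD, &st[0]);
        MPI_Recv(rright, SIZE, MPI_BYTE, right, 0, MPI_COMM_WORLD, &st[1]);
        MPI_Waitall(2, req, st);

        MPI_Finalize();
        return 0;
    }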

Collective Benchmarks

Collective or system-wide interconnect performance is measured between all or a subset of the nodes in the system. All collective benchmarks are measured in MBytes/s accumulated system throughput, in units of 10^6 octets per second.

PMB Allreduce

Benchmark of the MPI_Allreduce function. Reduces vectors of length L = X/sizeof(float) float items from every process to a single vector and distributes it to all processes. The MPI datatype is MPI_FLOAT, the MPI operation is MPI_SUM.

Allreduce Network performance 2 nodes
Allreduce Network performance 4 nodes
Allreduce Network performance 8 nodes
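
A minimal sketch of the measured operation, where X (the message size in bytes) is an arbitrary example value:

    #include <mpi.h>
    #include <stdlib.h>

    #define X 4096   /* message size in bytes (arbitrary example value) */

    int main(int argc, char **argv)
    {
        int i, L = X / sizeof(float);              /* vector length as defined above */
        float *in  = malloc(L * sizeof(float));
        float *out = malloc(L * sizeof(float));

        MPI_Init(&argc, &argv);
        for (i = 0; i < L; i++)
            in[i] = 1.0f;

        /* every process contributes one vector; all processes receive the element-wise sum */
        MPI_Allreduce(in, out, L, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

        MPI_Finalize();
        free(in);
        free(out);
        return 0;
    }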

PMB Reduce

Benchmark of the MPI_Reduce function. Reduces vectors of length L = X/sizeof(float) float items from every process to a single vector in the root process. The MPI datatype is MPI_FLOAT, the MPI operation is MPI_SUM. The root of the operation is changed cyclically.

Reduce Network performance 2 nodes
Reduce Network performance 4 nodes
Reduce Network performance 8 nodes
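
A minimal sketch, with arbitrary values for X and the repetition count REPS, showing how the root can be cycled:

    #include <mpi.h>
    #include <stdlib.h>

    #define X    4096   /* message size in bytes (arbitrary) */
    #define REPS 100    /* repetitions (arbitrary) */

    int main(int argc, char **argv)
    {
        int i, iter, np, L = X / sizeof(float);
        float *in  = malloc(L * sizeof(float));
        float *out = malloc(L * sizeof(float));

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &np);
        for (i = 0; i < L; i++)
            in[i] = 1.0f;

        /* the root rank is cycled over the repetitions ("changed cyclically") */
        for (iter = 0; iter < REPS; iter++)
            MPI_Reduce(in, out, L, MPI_FLOAT, MPI_SUM, iter % np, MPI_COMM_WORLD);

        MPI_Finalize();
        free(in);
        free(out);
        return 0;
    }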

PMB Reduce_scatter

Benchmark of the MPI_Reduce_scatter function. Reduces vectors of length L = X/sizeof(float) float items from every process to a single vector. The MPI datatype is MPI_FLOAT, the MPI operation is MPI_SUM. In the scatter phase, the L items are split as evenly as possible between all processes. More precisely, with np = #processes and L = r*np + s (s = L mod np), the process with rank i receives r+1 items when i < s, and r items when i >= s.

Reduce_scatter Network performance 2 nodes
Reduce_scatter Network performance 4 nodes
Reduce_scatter Network performance 8 nodes
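
The even split described above can be sketched as follows (X is an arbitrary example size; the recvcounts array encodes the r+1/r distribution):

    #include <mpi.h>
    #include <stdlib.h>

    #define X 4096   /* message size in bytes (arbitrary) */

    int main(int argc, char **argv)
    {
        int i, rank, np, r, s, L = X / sizeof(float);
        int *recvcounts;
        float *in, *out;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        /* split the L items as evenly as possible:
           the first s ranks get r+1 items, all others get r items */
        r = L / np;
        s = L % np;
        recvcounts = malloc(np * sizeof(int));
        for (i = 0; i < np; i++)
            recvcounts[i] = (i < s) ? r + 1 : r;

        in  = malloc(L * sizeof(float));
        out = malloc((recvcounts[rank] > 0 ? recvcounts[rank] : 1) * sizeof(float));
        for (i = 0; i < L; i++)
            in[i] = 1.0f;

        /* reduce the full vectors with MPI_SUM, then scatter the result according to recvcounts */
        MPI_Reduce_scatter(in, out, recvcounts, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

        MPI_Finalize();
        free(in);
        free(out);
        free(recvcounts);
        return 0;
    }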

PMB Allgather

Benchmark of the MPI_Allgather function. Every process sends X bytes and receives the gathered X*(#processes) bytes.

Allgather Network performance 2 nodes
Allgather Network performance 4 nodes
Allgather Network performance 8 nodes
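
A minimal sketch of the measured call, with an arbitrary per-process message size X:

    #include <mpi.h>
    #include <stdlib.h>

    #define X 1024   /* bytes sent by each process (arbitrary) */

    int main(int argc, char **argv)
    {
        int np;
        char *sbuf, *rbuf;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &np);
        sbuf = calloc(X, 1);
        rbuf = malloc((size_t)X * np);   /* holds X bytes from every process, own data included */

        MPI_Allgather(sbuf, X, MPI_BYTE, rbuf, X, MPI_BYTE, MPI_COMM_WORLD);

        MPI_Finalize();
        free(sbuf);
        free(rbuf);
        return 0;
    }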

PMB Allgatherv

Functionally this is the same as the Allgather benchmark, but it uses the MPI_Allgatherv() function. It shows whether MPI produces overhead due to the more complicated situation (per-process receive counts and displacements) as compared to MPI_Allgather().

Allgatherv Network performance 2 nodes
Allgatherv Network performance 4 nodes
Allgatherv Network performance 8 nodes
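
A minimal sketch with equal counts and contiguous displacements for every process, so that the transferred data matches the Allgather case (X is an arbitrary example size):

    #include <mpi.h>
    #include <stdlib.h>

    #define X 1024   /* bytes sent by each process (arbitrary) */

    int main(int argc, char **argv)
    {
        int i, np;
        int *counts, *displs;
        char *sbuf, *rbuf;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        /* equal counts and contiguous displacements, so the data volume matches MPI_Allgather */
        counts = malloc(np * sizeof(int));
        displs = malloc(np * sizeof(int));
        for (i = 0; i < np; i++) {
            counts[i] = X;
            displs[i] = i * X;
        }
        sbuf = calloc(X, 1);
        rbuf = malloc((size_t)X * np);

        MPI_Allgatherv(sbuf, X, MPI_BYTE, rbuf, counts, displs, MPI_BYTE, MPI_COMM_WORLD);

        MPI_Finalize();
        free(sbuf);
        free(rbuf);
        free(counts);
        free(displs);
        return 0;
    }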

PMB Alltoall

Benchmark of the MPI_Alltoall() function. Every process sends and receives X*(#processes) bytes (X for each process).

Alltoall Network performance 2 nodes
Alltoall Network performance 4 nodes
Alltoall Network performance 8 nodes
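
A minimal sketch of the measured call; X, the amount of data exchanged between each pair of processes, is an arbitrary example value:

    #include <mpi.h>
    #include <stdlib.h>

    #define X 1024   /* bytes exchanged per process pair (arbitrary) */

    int main(int argc, char **argv)
    {
        int np;
        char *sbuf, *rbuf;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &np);
        sbuf = calloc((size_t)X * np, 1);   /* X bytes destined for each process */
        rbuf = malloc((size_t)X * np);      /* X bytes received from each process */

        MPI_Alltoall(sbuf, X, MPI_BYTE, rbuf, X, MPI_BYTE, MPI_COMM_WORLD);

        MPI_Finalize();
        free(sbuf);
        free(rbuf);
        return 0;
    }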

PMB Broadcast

Benchmark of MPI_Bcast. A root process broadcasts X bytes to all other processes.

Bcast Network performance 2 nodes
Bcast Network performance 4 nodes
Bcast Network performance 8 nodes
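
A minimal sketch with rank 0 as root and an arbitrary message size X; the root is kept fixed here for brevity:

    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    #define X 1024   /* message size in bytes (arbitrary) */

    int main(int argc, char **argv)
    {
        int rank;
        char *buf = malloc(X);

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0)
            memset(buf, 'x', X);   /* only the root's buffer content matters before the call */

        /* rank 0 acts as root; all other processes receive the X bytes into buf */
        MPI_Bcast(buf, X, MPI_BYTE, 0, MPI_COMM_WORLD);

        MPI_Finalize();
        free(buf);
        return 0;
    }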

PMB Barrier

This is a benchmark of the MPI_Barrier() function. No data is exchanged.

Barrier
Test           2        4        8
ntmpich     1.96    13.75    83.29
mpichnt    21.21    80.81   244.94
wmpi2      10.44    39.87   191.78
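
For reference, a simple way to time MPI_Barrier() calls is sketched below (not the PMB code; the repetition count REPS is arbitrary):

    #include <mpi.h>
    #include <stdio.h>

    #define REPS 1000   /* repetitions (arbitrary) */

    int main(int argc, char **argv)
    {
        int i, rank;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);        /* synchronize before starting the clock */
        t0 = MPI_Wtime();
        for (i = 0; i < REPS; i++)
            MPI_Barrier(MPI_COMM_WORLD);    /* no payload: only the synchronization cost is timed */
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("time per barrier: %g us\n", (t1 - t0) / REPS * 1e6);

        MPI_Finalize();
        return 0;
    }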


References

[1] Pallas GmbH: Pallas MPI Benchmarks - PMB, Part MPI-1, Revision 2.2.1.

Acknowledgements

The script to transform the PMB results into HTML pages was originally written by Lars Paul Huse from Scali AS, Oslo. Thank you for supplying it to us.

 
