The results below were produced using the Pallas MPI Benchmarks - PMB from
Pallas GmbH.
The description is based on the information in "Pallas MPI Benchmarks - PMB, Part MPI-1, Revision 2.2".
You can click on each graph to get a higher resolution (1240x800) image.
Meaning of the labels:
Bw: Bandwidth
Lt: Latency
2: 2 nodes
4: 4 nodes
and so forth.
Results are extracted from the files ntmpich, mpichnt, mpipro, and wmpi2:
ntmpich: NT-MPICH, the Windows part of MP-MPICH
mpichnt: MPICH.NT 1.2.5, the Windows version of MPICH by ANL/MSU
mpipro: MPI/Pro 1.6.4.1 for Windows by Verari Systems Software (formerly MPI Software Technology)
wmpi2: WMPI II 2.4.0 by Critical Software
The benchmarks were run using Fast Ethernet (100 MBit/s) network adapters.
Point to Point Performance
Point-to-point performance is measured between two processes within
the same node (memory performance) or between two nodes (network
performance). Node-internal performance is measured in MBytes/s
per process (send+recv), network performance in MBytes/s I/O bandwidth
per node (send+recv); in both cases 1 MByte/s means 2^20 octets per second.
PMB PingPong
PingPong is the classical pattern for measuring the startup and throughput
of a single message sent between two processes. The communication
sequence is an MPI_Recv() followed by an MPI_Send() in a loop.
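As an illustration only (not the PMB source), the inner loop might look like the following C sketch; the function name, buffer, message size and repetition count are made-up parameters, and MPI is assumed to be already initialized:

#include <mpi.h>

/* Illustrative PingPong inner loop between ranks 0 and 1 (not PMB's code).
   bytes and reps are example parameters. */
void pingpong(char *buf, int bytes, int reps)
{
    int rank, i;
    MPI_Status st;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (i = 0; i < reps; i++) {
        if (rank == 0) {                /* sends first, then waits for the echo */
            MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
        } else if (rank == 1) {         /* receives, then echoes the message back */
            MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
}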
PMB PingPing
PingPing is a concurrent two-way communication test.
Like PingPong, PingPing measures the startup and throughput of a single message sent
between two processes, with the crucial difference that messages are obstructed
by oncoming messages. For this, two processes communicate with each other
(MPI_Isend/MPI_Recv/MPI_Wait), with the MPI_Isend calls issued simultaneously.
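A sketch of one PingPing step, again with illustrative names and parameters and assuming MPI is already initialized and exactly two ranks take part:

#include <mpi.h>

/* Illustrative PingPing step: both ranks issue MPI_Isend at the same time,
   so each message runs against oncoming traffic (not PMB's code). */
void pingping(char *sbuf, char *rbuf, int bytes)
{
    int rank, peer;
    MPI_Request req;
    MPI_Status st;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;                    /* assumes only ranks 0 and 1 participate */
    MPI_Isend(sbuf, bytes, MPI_CHAR, peer, 0, MPI_COMM_WORLD, &req);
    MPI_Recv(rbuf, bytes, MPI_CHAR, peer, 0, MPI_COMM_WORLD, &st);
    MPI_Wait(&req, &st);
}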
PMB Sendrecv
Based on MPI_Sendrecv(), the processes form a periodic communication chain.
Each process sends to the right and receives from the left neighbour in the chain.
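A minimal sketch of one step of this chain, with illustrative names and message size, assuming MPI is already initialized:

#include <mpi.h>

/* Illustrative Sendrecv step: the ranks form a periodic chain, each sends to
   its right neighbour and receives from its left neighbour (not PMB's code). */
void sendrecv_chain(char *sbuf, char *rbuf, int bytes)
{
    int rank, np, left, right;
    MPI_Status st;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    right = (rank + 1) % np;
    left  = (rank - 1 + np) % np;
    MPI_Sendrecv(sbuf, bytes, MPI_CHAR, right, 0,
                 rbuf, bytes, MPI_CHAR, left,  0,
                 MPI_COMM_WORLD, &st);
}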
PMB Exchange
Exchange is a communication pattern that often occurs in grid
splitting algorithms (boundary exchanges). The group of processes is
seen as a periodic chain, and each process exchanges data with both
its left and right neighbour in the chain.
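One plausible way to realise this boundary exchange is sketched below; the function and buffer names are made up, nonblocking sends are an assumption rather than a statement about the PMB implementation, and MPI is assumed to be already initialized:

#include <mpi.h>

/* Illustrative Exchange step: every rank exchanges a buffer with both
   neighbours in the periodic chain. */
void exchange(char *sl, char *sr, char *rl, char *rr, int bytes)
{
    int rank, np, left, right;
    MPI_Request req[2];
    MPI_Status st[2];
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    right = (rank + 1) % np;
    left  = (rank - 1 + np) % np;
    MPI_Isend(sl, bytes, MPI_CHAR, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Isend(sr, bytes, MPI_CHAR, right, 0, MPI_COMM_WORLD, &req[1]);
    MPI_Recv(rl, bytes, MPI_CHAR, left,  0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Recv(rr, bytes, MPI_CHAR, right, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Waitall(2, req, st);
}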
Collective Benchmarks
Collective or system-wide interconnect performance is measured between all or a subset of the
nodes in the system. All collective benchmarks are measured in MBytes/s accumulated
system throughput in units of 10^6 octets per second.
PMB Allreduce
Benchmark of the MPI_Allreduce function. Reduces vectors of length
L = X/sizeof(float) float items from every process to a single
vector and distributes it to all processes.
The MPI datatype is MPI_FLOAT, the MPI operation is MPI_SUM.
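A sketch of the corresponding call, with X the message size in bytes as above; the function name and buffers are illustrative and MPI is assumed to be already initialized:

#include <mpi.h>

/* Illustrative Allreduce call with the datatype and operation named above. */
void allreduce_step(float *sendbuf, float *recvbuf, int X)
{
    int L = X / (int)sizeof(float);   /* vector length in float items */
    MPI_Allreduce(sendbuf, recvbuf, L, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
}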
PMB Reduce
Benchmark of the MPI_Reduce function. Reduces vectors of length
L = X/sizeof(float) float items from every process to a single
vector in the root process. The MPI datatype is MPI_FLOAT, the MPI
operation is MPI_SUM. The root of the operation is changed cyclically.
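The corresponding call might look as follows; here the cyclically changing root is simply passed as a parameter, and the function and buffer names are illustrative:

#include <mpi.h>

/* Illustrative Reduce call; `root` is varied from call to call, mirroring
   the cyclic change of the root process mentioned above. */
void reduce_step(float *sendbuf, float *recvbuf, int X, int root)
{
    int L = X / (int)sizeof(float);
    MPI_Reduce(sendbuf, recvbuf, L, MPI_FLOAT, MPI_SUM, root, MPI_COMM_WORLD);
}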
PMB Reduce_scatter
Benchmark of the MPI_Reduce_scatter function. Reduce vectors of
length L = X/sizeof(float) float items from every process to a single
vector. The MPI datatype is MPI_FLOAT, the MPI operation is MPI_SUM.
In the scatter phase, the L items are split as evenly as possible
between all processes. More precisely: with np = #processes and
L = r*np + s (where s = L mod np), the process with rank i gets
r+1 items when i < s, and r items when i >= s.
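The splitting rule translates directly into the recvcounts array of MPI_Reduce_scatter; the sketch below uses illustrative names and assumes recvcounts has room for one entry per process:

#include <mpi.h>

/* Illustrative Reduce_scatter call.  recvcounts follows the rule above:
   the first s = L mod np ranks get r+1 items, the remaining ranks get r. */
void reduce_scatter_step(float *sendbuf, float *recvbuf, int X, int *recvcounts)
{
    int np, i, L, r, s;
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    L = X / (int)sizeof(float);
    r = L / np;
    s = L % np;
    for (i = 0; i < np; i++)
        recvcounts[i] = (i < s) ? r + 1 : r;
    MPI_Reduce_scatter(sendbuf, recvbuf, recvcounts, MPI_FLOAT, MPI_SUM,
                       MPI_COMM_WORLD);
}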
PMB Allgather
Benchmark of the MPI_Allgather function. Every process sends X bytes
and receives the gathered X*(#processes) bytes.
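A sketch of the call, with illustrative names; recvbuf must hold X*(#processes) bytes:

#include <mpi.h>

/* Illustrative Allgather call: each rank contributes X bytes and receives
   the gathered X * np bytes from all ranks. */
void allgather_step(char *sendbuf, char *recvbuf, int X)
{
    MPI_Allgather(sendbuf, X, MPI_CHAR, recvbuf, X, MPI_CHAR, MPI_COMM_WORLD);
}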
PMB Allgatherv
Functionally this is the same as Allgather, but using the
MPI_Allgatherv() function. It shows whether MPI incurs overhead due to the more
complicated situation compared to MPI_Allgather().
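Because every rank contributes the same X bytes, the count and displacement arrays are uniform, as in this sketch (names are illustrative, arrays are assumed to have one entry per process):

#include <mpi.h>

/* Illustrative Allgatherv call that is functionally equivalent to the
   Allgather above: uniform recvcounts and displs. */
void allgatherv_step(char *sendbuf, char *recvbuf, int X,
                     int *recvcounts, int *displs)
{
    int np, i;
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    for (i = 0; i < np; i++) {
        recvcounts[i] = X;
        displs[i] = i * X;
    }
    MPI_Allgatherv(sendbuf, X, MPI_CHAR,
                   recvbuf, recvcounts, displs, MPI_CHAR, MPI_COMM_WORLD);
}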
PMB Alltoall
Benchmark of the MPI_Alltoall() function. Every process sends and
receives X*(#processes) bytes (X for each process).
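A sketch of the call; both buffers must hold X*(#processes) bytes, and the names are illustrative:

#include <mpi.h>

/* Illustrative Alltoall call: each rank sends X bytes to every rank and
   receives X bytes from every rank. */
void alltoall_step(char *sendbuf, char *recvbuf, int X)
{
    MPI_Alltoall(sendbuf, X, MPI_CHAR, recvbuf, X, MPI_CHAR, MPI_COMM_WORLD);
}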
PMB Broadcast
Benchmark of MPI_Bcast. A root process broadcasts X bytes to all other processes.
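A sketch of the call, with the root rank passed as an illustrative parameter:

#include <mpi.h>

/* Illustrative Bcast call: the root rank broadcasts its X-byte buffer to
   all other processes. */
void bcast_step(char *buf, int X, int root)
{
    MPI_Bcast(buf, X, MPI_CHAR, root, MPI_COMM_WORLD);
}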
PMB Barrier
This is a benchmark of the MPI_Barrier() function. No data is exchanged.
Barrier
Test    | 2 nodes | 4 nodes | 8 nodes
ntmpich |  100.29 |  192.19 |  308.19
mpichnt |  119.91 |  272.55 |  447.08
mpipro  |  191.25 |  374.00 |  557.97
wmpi2   |  108.48 |  223.23 |  342.67
References
[1] Pallas MPI Benchmarks - PMB, Part MPI-1, Revision 2.2.1, Pallas GmbH (not downloadable at the moment).
Acknowledgements
The script to transform the PMB results into HTML pages was originally written by
Lars Paul Huse from Scali AS, Oslo. Thank you for supplying it to us.