
Pallas MPI Benchmarks Results

The results below were produced using the Pallas MPI Benchmarks - PMB from Pallas GmbH.
The description is based on the information in "Pallas MPI Benchmarks - PMB, Part MPI-1, Revision 2.2".

Meaning of the labels:
Bw: Bandwidth
Lt: Latency
2: 2 nodes
4: 4 nodes
and so forth.

Results are extracted from the files ntmpich, mpichnt and wmpi2:

ntmpich: NT-MPICH, the Windows part of MP-MPICH,
mpichnt: MPICH.NT 1.2.5, the Windows version of MPICH by ANL/MSU,
wmpi2: WMPI II 2.4.0 by Critical Software.

The benchmarks were run using Shared Memory.


Point-to-Point Performance

Point-to-point performance is measured between two processes within the same node (memory performance) or between two nodes (network performance). Node-internal performance is measured in MBytes/s per process (send+recv), in units of 2^20 octets per second. Network performance is measured in MBytes/s I/O bandwidth per node (send+recv), in units of 2^20 octets per second.

PMB PingPong

PingPong is the classical pattern for measuring startup and throughput of a single message sent between two processes. The communication sequence is an MPI_Recv() followed by an MPI_Send() in a loop.

PingPong Network performance 2 nodes
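
For illustration, here is a minimal C sketch of the PingPong pattern described above. It is not the PMB source code; the message size SIZE, the repetition count REPS and the use of MPI_BYTE are arbitrary choices for this sketch.

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    #define SIZE 1024   /* message size in bytes (arbitrary) */
    #define REPS 1000   /* number of round trips (arbitrary) */

    int main(int argc, char **argv)
    {
        char buf[SIZE];
        int rank, i;
        double t0, t1;
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        memset(buf, 0, SIZE);

        t0 = MPI_Wtime();
        for (i = 0; i < REPS; i++) {
            if (rank == 0) {        /* rank 0 sends first and waits for the echo */
                MPI_Send(buf, SIZE, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, SIZE, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &st);
            } else if (rank == 1) { /* rank 1 receives, then sends back */
                MPI_Recv(buf, SIZE, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &st);
                MPI_Send(buf, SIZE, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)  /* half of the average round-trip time approximates the one-way latency */
            printf("latency: %g us\n", (t1 - t0) / (2.0 * REPS) * 1e6);

        MPI_Finalize();
        return 0;
    }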

PMB PingPing

PingPing is a concurrent two-way communication test. Like PingPong, it measures the startup and throughput of a single message sent between two processes, with the crucial difference that the messages are obstructed by oncoming messages. For this, the two processes communicate with each other (MPI_Isend/MPI_Recv/MPI_Wait), with the MPI_Isend calls issued simultaneously.

PingPing Network performance 2 nodes
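
A minimal sketch of one PingPing iteration, assuming exactly two communicating processes and an arbitrary message size SIZE (again, not the PMB code itself):

    #include <mpi.h>
    #include <string.h>

    #define SIZE 1024   /* message size in bytes (arbitrary) */

    int main(int argc, char **argv)
    {
        char sbuf[SIZE], rbuf[SIZE];
        int rank;
        MPI_Request req;
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        memset(sbuf, 0, SIZE);

        if (rank < 2) {
            int peer = 1 - rank;  /* the other one of the two communicating processes */

            /* both sides post their send first, so the two messages are in flight at the same time */
            MPI_Isend(sbuf, SIZE, MPI_BYTE, peer, 0, MPI_COMM_WORLD, &req);
            MPI_Recv(rbuf, SIZE, MPI_BYTE, peer, 0, MPI_COMM_WORLD, &st);
            MPI_Wait(&req, &st);
        }

        MPI_Finalize();
        return 0;
    }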

PMB Sendrecv

Based on MPI_Sendrecv(), the processes form a periodic communication chain. Each process sends to its right neighbor and receives from its left neighbor in the chain.

Sendrecv Network performance 2 nodes
Sendrecv Network performance 4 nodes
Sendrecv Network performance 8 nodes
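
The communication chain can be sketched as follows; the buffer size SIZE is an arbitrary choice, and the wrap-around neighbor computation makes the chain periodic:

    #include <mpi.h>
    #include <string.h>

    #define SIZE 1024   /* message size in bytes (arbitrary) */

    int main(int argc, char **argv)
    {
        char sbuf[SIZE], rbuf[SIZE];
        int rank, np, right, left;
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);
        right = (rank + 1) % np;        /* periodic chain: the neighbors wrap around */
        left  = (rank + np - 1) % np;
        memset(sbuf, 0, SIZE);

        /* send to the right neighbor and receive from the left one in a single call */
        MPI_Sendrecv(sbuf, SIZE, MPI_BYTE, right, 0,
                     rbuf, SIZE, MPI_BYTE, left,  0,
                     MPI_COMM_WORLD, &st);

        MPI_Finalize();
        return 0;
    }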

PMB Exchange

Exchange is a communication pattern that often occurs in grid splitting algorithms (boundary exchanges). The group of processes is seen as a periodic chain, and each process exchanges data with both its left and right neighbor in the chain.

Exchange Network performance 2 nodes
Exchange Network performance 4 nodes
Exchange Network performance 8 nodes
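
A plausible sketch of one Exchange step (not the PMB implementation; SIZE is arbitrary, and the exact send/receive ordering may differ from PMB):

    #include <mpi.h>
    #include <string.h>

    #define SIZE 1024   /* message size in bytes (arbitrary) */

    int main(int argc, char **argv)
    {
        char sbuf[SIZE], rleft[SIZE], rright[SIZE];
        int rank, np, right, left;
        MPI_Request req[2];
        MPI_Status st[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);
        right = (rank + 1) % np;
        left  = (rank + np - 1) % np;
        memset(sbuf, 0, SIZE);

        /* post sends to both neighbors, then receive from both */
        MPI_Isend(sbuf, SIZE, MPI_BYTE, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(sbuf, SIZE, MPI_BYTE, right, 0, MPI_COMM_WORLD, &req[1]);
        MPI_Recv(rleft,  SIZE, MPI_BYTE, left,  0, MPI_COMM_WORLD, &st[0]);
        MPI_Recv(rright, SIZE, MPI_BYTE, right, 0, MPI_COMM_WORLD, &st[1]);
        MPI_Waitall(2, req, st);

        MPI_Finalize();
        return 0;
    }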

Collective Benchmarks

Collective or system-wide interconnect performance is measured between all or a subset of the nodes in the system. All collective benchmarks are measured in MBytes/s accumulated system throughput, in units of 10^6 octets per second.

PMB Allreduce

Benchmark of the MPI_Allreduce function. Reduces vectors of length L = X/sizeof(float) float items from every process to a single vector and distributes it to all processes. The MPI datatype is MPI_FLOAT, the MPI operation is MPI_SUM.

Allreduce Network performance 2 nodes
Allreduce Network performance 4 nodes
Allreduce Network performance 8 nodes
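
A minimal sketch of the measured operation, where X (the message size in bytes) is an arbitrary example value:

    #include <mpi.h>
    #include <stdlib.h>

    #define X 4096   /* message size in bytes (arbitrary example value) */

    int main(int argc, char **argv)
    {
        int i, L = X / sizeof(float);              /* vector length as defined above */
        float *in  = malloc(L * sizeof(float));
        float *out = malloc(L * sizeof(float));

        MPI_Init(&argc, &argv);
        for (i = 0; i < L; i++)
            in[i] = 1.0f;

        /* every process contributes one vector; all processes receive the element-wise sum */
        MPI_Allreduce(in, out, L, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

        MPI_Finalize();
        free(in);
        free(out);
        return 0;
    }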

PMB Reduce

Benchmark of the MPI_Reduce function. Reduces vectors of length L = X/sizeof(float) float items from every process to a single vector in the root process. The MPI datatype is MPI_FLOAT, the MPI operation is MPI_SUM. The root of the operation is changed cyclically.

Reduce Network performance 2 nodes
Reduce Network performance 4 nodes
Reduce Network performance 8 nodes
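
A minimal sketch, with arbitrary values for X and the repetition count REPS, showing how the root can be cycled:

    #include <mpi.h>
    #include <stdlib.h>

    #define X    4096   /* message size in bytes (arbitrary) */
    #define REPS 100    /* repetitions (arbitrary) */

    int main(int argc, char **argv)
    {
        int i, iter, np, L = X / sizeof(float);
        float *in  = malloc(L * sizeof(float));
        float *out = malloc(L * sizeof(float));

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &np);
        for (i = 0; i < L; i++)
            in[i] = 1.0f;

        /* the root rank is cycled over the repetitions ("changed cyclically") */
        for (iter = 0; iter < REPS; iter++)
            MPI_Reduce(in, out, L, MPI_FLOAT, MPI_SUM, iter % np, MPI_COMM_WORLD);

        MPI_Finalize();
        free(in);
        free(out);
        return 0;
    }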

PMB Reduce_scatter

Benchmark of the MPI_Reduce_scatter function. Reduces vectors of length L = X/sizeof(float) float items from every process to a single vector. The MPI datatype is MPI_FLOAT, the MPI operation is MPI_SUM. In the scatter phase, the L items are split as evenly as possible between all processes. More precisely, with np = #processes and L = r*np + s (s = L mod np), the process with rank i receives r+1 items when i < s, and r items when i >= s.

Reduce_scatter Network performance 2 nodes
Reduce_scatter Network performance 4 nodes
Reduce_scatter Network performance 8 nodes
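
The even split described above can be sketched as follows (X is an arbitrary example size; the recvcounts array encodes the r+1/r distribution):

    #include <mpi.h>
    #include <stdlib.h>

    #define X 4096   /* message size in bytes (arbitrary) */

    int main(int argc, char **argv)
    {
        int i, rank, np, r, s, L = X / sizeof(float);
        int *recvcounts;
        float *in, *out;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        /* split the L items as evenly as possible:
           the first s ranks get r+1 items, all others get r items */
        r = L / np;
        s = L % np;
        recvcounts = malloc(np * sizeof(int));
        for (i = 0; i < np; i++)
            recvcounts[i] = (i < s) ? r + 1 : r;

        in  = malloc(L * sizeof(float));
        out = malloc((recvcounts[rank] > 0 ? recvcounts[rank] : 1) * sizeof(float));
        for (i = 0; i < L; i++)
            in[i] = 1.0f;

        /* reduce the full vectors with MPI_SUM, then scatter the result according to recvcounts */
        MPI_Reduce_scatter(in, out, recvcounts, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

        MPI_Finalize();
        free(in);
        free(out);
        free(recvcounts);
        return 0;
    }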

PMB Allgather

Benchmark of the MPI_Allgather function. Every process sends X bytes and receives the gathered X*(#processes) bytes.

Allgather Network performance 2 nodes
Allgather Network performance 4 nodes
Allgather Network performance 8 nodes
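
A minimal sketch of the measured call, with an arbitrary per-process message size X:

    #include <mpi.h>
    #include <stdlib.h>

    #define X 1024   /* bytes sent by each process (arbitrary) */

    int main(int argc, char **argv)
    {
        int np;
        char *sbuf, *rbuf;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &np);
        sbuf = calloc(X, 1);
        rbuf = malloc((size_t)X * np);   /* holds X bytes from every process, own data included */

        MPI_Allgather(sbuf, X, MPI_BYTE, rbuf, X, MPI_BYTE, MPI_COMM_WORLD);

        MPI_Finalize();
        free(sbuf);
        free(rbuf);
        return 0;
    }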

PMB Allgatherv

Functionally this is the same as the Allgather benchmark, but it uses the MPI_Allgatherv() function. It shows whether MPI produces overhead due to the more complicated situation (per-process receive counts and displacements) as compared to MPI_Allgather().

Allgatherv Network performance 2 nodes
Allgatherv Network performance 4 nodes
Allgatherv Network performance 8 nodes
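
A minimal sketch with equal counts and contiguous displacements for every process, so that the transferred data matches the Allgather case (X is an arbitrary example size):

    #include <mpi.h>
    #include <stdlib.h>

    #define X 1024   /* bytes sent by each process (arbitrary) */

    int main(int argc, char **argv)
    {
        int i, np;
        int *counts, *displs;
        char *sbuf, *rbuf;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        /* equal counts and contiguous displacements, so the data volume matches MPI_Allgather */
        counts = malloc(np * sizeof(int));
        displs = malloc(np * sizeof(int));
        for (i = 0; i < np; i++) {
            counts[i] = X;
            displs[i] = i * X;
        }
        sbuf = calloc(X, 1);
        rbuf = malloc((size_t)X * np);

        MPI_Allgatherv(sbuf, X, MPI_BYTE, rbuf, counts, displs, MPI_BYTE, MPI_COMM_WORLD);

        MPI_Finalize();
        free(sbuf);
        free(rbuf);
        free(counts);
        free(displs);
        return 0;
    }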

PMB Alltoall

Benchmark of the MPI_Alltoall() function. Every process sends and receives X*(#processes) bytes (X for each process).

Alltoall Network performance 2 nodes
Alltoall Network performance 4 nodes
Alltoall Network performance 8 nodes
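
A minimal sketch of the measured call; X, the amount of data exchanged between each pair of processes, is an arbitrary example value:

    #include <mpi.h>
    #include <stdlib.h>

    #define X 1024   /* bytes exchanged per process pair (arbitrary) */

    int main(int argc, char **argv)
    {
        int np;
        char *sbuf, *rbuf;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &np);
        sbuf = calloc((size_t)X * np, 1);   /* X bytes destined for each process */
        rbuf = malloc((size_t)X * np);      /* X bytes received from each process */

        MPI_Alltoall(sbuf, X, MPI_BYTE, rbuf, X, MPI_BYTE, MPI_COMM_WORLD);

        MPI_Finalize();
        free(sbuf);
        free(rbuf);
        return 0;
    }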

PMB Broadcast

Benchmark of MPI_Bcast. A root process broadcasts X bytes to all other processes.

Bcast Network performance 2 nodes
Bcast Network performance 4 nodes
Bcast Network performance 8 nodes
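
A minimal sketch with rank 0 as root and an arbitrary message size X; the root is kept fixed here for brevity:

    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    #define X 1024   /* message size in bytes (arbitrary) */

    int main(int argc, char **argv)
    {
        int rank;
        char *buf = malloc(X);

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0)
            memset(buf, 'x', X);   /* only the root's buffer content matters before the call */

        /* rank 0 acts as root; all other processes receive the X bytes into buf */
        MPI_Bcast(buf, X, MPI_BYTE, 0, MPI_COMM_WORLD);

        MPI_Finalize();
        free(buf);
        return 0;
    }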

PMB Barrier

This is a benchmark of the MPI_Barrier() function. No data is exchanged.

Barrier
Test           2        4        8
ntmpich     1.96    13.75    83.29
mpichnt    21.21    80.81   244.94
wmpi2      10.44    39.87   191.78
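
For reference, a simple way to time MPI_Barrier() calls is sketched below (not the PMB code; the repetition count REPS is arbitrary):

    #include <mpi.h>
    #include <stdio.h>

    #define REPS 1000   /* repetitions (arbitrary) */

    int main(int argc, char **argv)
    {
        int i, rank;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);        /* synchronize before starting the clock */
        t0 = MPI_Wtime();
        for (i = 0; i < REPS; i++)
            MPI_Barrier(MPI_COMM_WORLD);    /* no payload: only the synchronization cost is timed */
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("time per barrier: %g us\n", (t1 - t0) / REPS * 1e6);

        MPI_Finalize();
        return 0;
    }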


References

[1] Pallas GmbH: Pallas MPI Benchmarks - PMB, Part MPI-1, Revision 2.2.1.

Acknowledgements

The script to transform the PMB results into HTML pages was originally written by Lars Paul Huse from Scali AS, Oslo. Thank you for supplying it to us.

 
