Shared Memory Interface

Programming Model

  • Although our focus is on NUMA clusters, the programming interface also makes sense for (and is portable to) SMPs and real CC-NUMA machines.
  • A number of full (Unix/Windows NT) processes execute the same code in SPMD style.

Topology etc.

  • Startup and base synchronization of all processes
  • Querying a process' own rank and machine rank, as well as the machine rank of all other processes (see the sketch after this list)
  • Determining the machine on which a memory address is physically located
  • Watchdog functionality which detects if any process has crashed or has been terminated
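
A minimal SPMD-style skeleton of such a program could look as follows. This page does not list the actual function names or signatures, so the SMI identifiers below (SMI_Init, SMI_Proc_rank, etc.) and the header name are assumptions for illustration only.

    #include <stdio.h>
    #include "smi.h"                      /* assumed header name for the SMI binding */

    int main(int argc, char **argv)
    {
        int nproc, rank, machine_rank;

        SMI_Init(&argc, &argv);           /* startup and base synchronization of all processes */

        SMI_Proc_size(&nproc);            /* number of participating processes (assumed call) */
        SMI_Proc_rank(&rank);             /* this process' rank (assumed call) */
        SMI_Machine_rank(&machine_rank);  /* machine rank of this process (assumed call) */

        printf("process %d of %d on machine %d\n", rank, nproc, machine_rank);

        SMI_Finalize();                   /* orderly shutdown; the watchdog reacts to crashed processes */
        return 0;
    }

Every process runs this same code; the ranks obtained at startup are what distinguishes the processes from each other.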

Shared Memory

  • Data-sharing via explicitly installed regions of shared memory
    • Each region consists of consecutively addressable individual segments. Each segment can be physically located on any node of the cluster. The detailed memory layout can be specified as:
      • UNDIVIDED: the entire region is located on a single node.
      • BLOCKED: the region consists of as many segments as nodes are involved; segment i is located on node i.
      • CUSTOMIZED: the user can specify an arbitrary layout.
      • NONFIXED: the entire region is located on a single node, but not necessarily at the same fixed address in all processes, as is the case for all other layout types.
      • LOCAL: to exploit faster local shared memory between processes, this layout provides a shared memory segment only among the processes running on the same machine.
    • Usage of a region as:
      • A flat piece of memory, e.g. to store an array in it.
      • A heap managed by the SMI memory manager for dynamic memory allocation within the region: each process can allocate and free (under mutual exclusion) pieces of memory (illustrated in a sketch after this list).
    • A shared region is mapped to the same address within each process' virtual address-space (except for NONFIXED), allowing the exchange of pointers and therefore the dynamic construction of arbitrary high-level shared data-structures.
    • Temporary switching of shared regions to replication, with the capability to later combine the possibly diverged copies into a single shared view again (also illustrated after this list). Possible combine operations:
      • Element-wise arithmetic operations, e.g. ADD.
      • SINGLE_SOURCE (the data of one specified process comprises the shared data afterwards).
      • CONCAT (non-overlapping fractions of all processes' replicated data are concatenated).
      • user-defined functions, e.g. merging of sorted arrays, merging of graphs in a specific representation, etc.
  • Optimized functions for synchronous and asynchronous memory transfers
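
The following sketch shows how a BLOCKED region and the SMI memory manager might be used. The function names, constants and signatures are not given on this page and are therefore assumptions for illustration only.

    #include "smi.h"                      /* assumed header name for the SMI binding */

    #define N 1024                        /* array length, chosen for illustration */

    void shared_array_example(int nproc, int rank)
    {
        int     region_id;
        double *array;                    /* mapped to the same virtual address in every process */
        void   *scratch;

        /* BLOCKED layout: one segment per node, segment i physically on node i (assumed call). */
        SMI_Create_shreg(SMI_SHM_BLOCKED, N * sizeof(double), &region_id, (void **)&array);

        /* Flat usage: each process fills its own block of the shared array. */
        int chunk = N / nproc;
        for (int i = rank * chunk; i < (rank + 1) * chunk; i++)
            array[i] = (double)i;

        /* Dynamic allocation inside the region via the SMI memory manager,
           performed under mutual exclusion (assumed calls). */
        SMI_Imalloc(256, region_id, &scratch);
        /* ... pointers into the region stay valid in all processes ... */
        SMI_Ifree(scratch);

        SMI_Free_shreg(region_id);
    }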
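
Temporary replication with a later combine step might be used as in the next sketch; again, the SMI identifiers and the combine constant are assumptions based on the feature list above, not the actual binding.

    #include "smi.h"                      /* assumed header name for the SMI binding */

    /* Hypothetical per-process contribution, just for illustration. */
    static double local_contribution(int rank, int i)
    {
        return (double)(rank + i);
    }

    void replicate_and_combine(int region_id, double *array, int n, int rank)
    {
        /* From here on, each process works on a private copy of the region (assumed call). */
        SMI_Switch_to_replication(region_id);

        for (int i = 0; i < n; i++)
            array[i] += local_contribution(rank, i);

        /* Re-establish the single shared view; the diverged copies are combined
           element-wise by addition (assumed call and combine constant). */
        SMI_Switch_to_sharing(region_id, SMI_ADD);
    }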

Synchronization

  • SMI provides a number of synchronization primitives, implemented both via software algorithms optimized for the NUMA performance characteristics and via atomic SCI lock operations (see the sketch after this list):
    • Mutex (allocate, lock, unlock, trylock, destroy)
    • Barrier
    • Process Counters
  • Additionally, there is dynamic loop-splitting and -scheduling support
  • Signalling between distinct processes (wait for a signal/send a signal) or global signals which act like a broadcast. Callback functionality is provided, too.
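
A typical use of these primitives might look like the sketch below; the primitive names and signatures are assumptions for illustration, not the documented SMI interface.

    #include "smi.h"                      /* assumed header name for the SMI binding */

    void increment_shared_counter(int *counter)   /* counter lives in a shared region */
    {
        int mutex_id;

        SMI_Mutex_init(&mutex_id);        /* allocate a mutex (assumed call) */

        SMI_Mutex_lock(mutex_id);         /* enter the critical section */
        (*counter)++;                     /* modify shared data under mutual exclusion */
        SMI_Mutex_unlock(mutex_id);

        SMI_Barrier();                    /* wait until all processes have arrived (assumed call) */

        SMI_Mutex_destroy(mutex_id);
    }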

Platforms

Currently, SMI offers a C and a Fortran binding. It runs on:
  • Sun Sparc workstation clusters under Solaris with Dolphin PCI-SCI adapters.
  • Intel-based PC clusters under Solaris x86, Linux or Windows NT with Dolphin PCI-SCI adapters.
  • The PCI-SCI adapters are addressed via the SISCI interface, which is (at least partly) also offered by the Scali SCI drivers running on the hpcLine systems by Siemens. Adaptation to this reduced SISCI interface is currently in progress.