The most powerful computers of the 1970s and 1980s tended to be vector machines, which are classic SIMD machines. The GPU (Graphics Processing Unit), and in particular the Fermi architecture with its HPC features, is bringing vector processors back into supercomputing.
NVIDIA's Fermi architecture, with its improved double-precision performance, has dramatically increased the programmability and compute efficiency of GPGPU (General-Purpose Computation on Graphics Processing Units). Its large number of cores (448) allows more threads to execute at any given time. CUDA (Compute Unified Device Architecture), the computing engine in NVIDIA graphics processing units, provides both a low-level and a higher-level API (the driver API and the runtime API, respectively).
In much the same way, the FireStream from AMD could gain acceptance in the field of high-performance computing. From a cost-benefit perspective, the FireStream meets high demands on both performance and reliability. OpenCL, the open standard for parallel programming of heterogeneous systems, does mostly the same job as CUDA. It was developed by an open committee and gives users a cross-vendor solution for accelerating their applications on different devices. In contrast to CUDA, it is designed to work on multiple platforms and therefore runs on hardware from different vendors.
Task Scheduling on GPU-based Heterogeneous Systems
Heterogeneous computing systems are becoming more and more prevalent for numerical computations, especially in HPC applications. One of our major research topics is therefore to develop a task scheduler that effectively utilizes all processors and GPUs in a heterogeneous system built from off-the-shelf hardware and software.
GPUs are especially suited to data parallelism: groups of processing cores work in a SIMD manner. Supporting task parallelism, or concurrent execution, means being able to examine pending kernels (the name for computing tasks on a GPU), decide whether co-scheduling is desirable, and initiate the desired concurrent execution. This decision rests on ensuring that resources are not over-subscribed. These include thread contexts, memory capacity, memory bandwidth, and, in the case of GPUs, possibly other resources such as registers, constant memory and texture memory.
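The over-subscription check described above can be illustrated with a minimal Python sketch. It is not the project's scheduler: the `Demand` class, its four fields, the `select_concurrent` function, and all numbers are hypothetical and chosen only for illustration. The sketch admits pending kernels in queue order as long as the summed demand stays within every resource limit.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Demand:
    """Hypothetical resource demand of one kernel (or capacity of one device)."""
    thread_contexts: int = 0
    memory_mb: int = 0
    bandwidth_gb_s: float = 0.0
    registers: int = 0

    def __add__(self, other):
        # Field-wise sum of two demands.
        return Demand(*(getattr(self, f.name) + getattr(other, f.name)
                        for f in fields(self)))

    def fits(self, capacity):
        # True only if no single resource exceeds the capacity.
        return all(getattr(self, f.name) <= getattr(capacity, f.name)
                   for f in fields(self))


def select_concurrent(pending, capacity, in_use=Demand()):
    """Greedily pick kernels that can run together without over-subscription.

    pending  -- list of (kernel_name, Demand) in queue order
    capacity -- Demand describing the device limits
    in_use   -- Demand already consumed by running kernels
    """
    admitted = []
    total = in_use
    for name, demand in pending:
        if (total + demand).fits(capacity):
            admitted.append(name)
            total = total + demand
    return admitted


# Made-up device limits and kernel demands, for illustration only.
capacity = Demand(thread_contexts=1000, memory_mb=4000,
                  bandwidth_gb_s=100.0, registers=32000)
pending = [
    ("fft", Demand(400, 1000, 40.0, 8000)),
    ("gemm", Demand(500, 2000, 50.0, 16000)),
    ("reduce", Demand(200, 500, 5.0, 4000)),
]
print(select_concurrent(pending, capacity))  # ['fft', 'gemm']
```

In this example "reduce" is held back because admitting it would exceed the thread-context limit, even though every other resource would still fit. A real runtime would obtain such figures from the device and from kernel metadata rather than from hand-written constants.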
The goal of this project is to develop a runtime environment that decides, as necessary, which device should be used for a concurrent execution.
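One simple way such a device decision could be made is sketched below in Python. This is an assumption for illustration, not the policy used by the project: the function names `choose_device` and `assign`, the device labels, and the runtime estimates are all hypothetical. Each task is placed on the device where it is estimated to finish earliest, given the work already assigned to that device.

```python
def choose_device(ready_at, est_runtime):
    """Return the device with the earliest estimated finish time for one task.

    ready_at    -- dict: device -> time at which the device becomes free
    est_runtime -- dict: device -> estimated runtime of the task on that device
    """
    return min(ready_at, key=lambda dev: ready_at[dev] + est_runtime[dev])


def assign(tasks, devices):
    """Place tasks one by one, in the given order, on their best device."""
    ready_at = {dev: 0.0 for dev in devices}
    placement = {}
    for name, est_runtime in tasks:
        dev = choose_device(ready_at, est_runtime)
        ready_at[dev] += est_runtime[dev]  # the device is busy until the task ends
        placement[name] = dev
    return placement


# Made-up runtime estimates (in arbitrary time units), for illustration only.
tasks = [
    ("a", {"cpu": 8.0, "gpu": 2.0}),
    ("b", {"cpu": 3.0, "gpu": 2.5}),
    ("c", {"cpu": 4.0, "gpu": 1.0}),
]
print(assign(tasks, ["cpu", "gpu"]))  # {'a': 'gpu', 'b': 'cpu', 'c': 'gpu'}
```

Task "b" goes to the CPU even though the GPU is faster for it in isolation, because the GPU is still busy with "a". The sketch ignores data-transfer costs and the resource limits discussed above, both of which a real runtime would have to take into account.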