General Runtime/Architecture for Many-core Parallel Systems (GRAMPS)

The era of obtaining increased performance via faster single cores and optimized single-thread programs is over. Instead, a major factor in new processors’ performance comes from parallelism: increasing numbers of cores per processor and threads per core. In both research and industry, runtime systems, domain-specific languages, and more generally, parallel programming models, have become the tools to realize this performance and contain this complexity.

General Runtime/Architecture for Many-core Parallel Systems (GRAMPS) is a programming model for these heterogeneous, commodity, many-core systems that expresses programs as graphs of thread- and data-parallel stages communicating via queues. GRAMPS defines a programming model for expressing pipeline and computation-graph-style parallel applications. It exposes a small, high-level set of primitives designed to be simple to use, to exhibit properties necessary for high-throughput processing, and to permit efficient implementations. GRAMPS implementations involve various combinations of software and underlying hardware support, similar to how OpenGL permits flexibility in an implementation's division of driver and GPU hardware responsibilities. Unlike OpenGL, however, GRAMPS is not tied to a specific application domain. Rather, it provides a substrate upon which domain-specific models can be built.

GRAMPS is organized around the basic concept of application-defined independent computation stages executing in parallel and communicating asynchronously via queues. GRAMPS is designed to be decoupled from application-specific semantics such as data types, data layouts, and internal stage execution. It also extends these basic constructs with additional features such as limited and fully automatic intra-stage parallelization and mutual exclusion. With these, applications can be built from many domains: rendering pipelines, sorting, signal processing, and others.
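The stage-and-queue model can be illustrated with a minimal sketch in Python. This is not the GRAMPS API; the names (`producer_stage`, `consumer_stage`, `run_pipeline`) are illustrative, and Python threads with a bounded `queue.Queue` stand in for GRAMPS stages and queues. The bounded depth models the application-specified queue-depth limit, which throttles a producer that runs ahead of its consumer.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker (an assumption of this sketch)

def producer_stage(out_q, items):
    # Stage that pushes work into its output queue, blocking when the
    # queue is at its depth limit.
    for item in items:
        out_q.put(item)
    out_q.put(SENTINEL)

def consumer_stage(in_q, results):
    # Stage that pulls work from its input queue and processes each
    # item independently.
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        results.append(item * item)

def run_pipeline(items, queue_depth=4):
    # Two stages run concurrently, communicating only through the
    # bounded queue between them.
    q = queue.Queue(maxsize=queue_depth)
    results = []
    t_prod = threading.Thread(target=producer_stage, args=(q, items))
    t_cons = threading.Thread(target=consumer_stage, args=(q, results))
    t_prod.start()
    t_cons.start()
    t_prod.join()
    t_cons.join()
    return results
```

Because the stages share no state other than the queue, either one could be replicated (instanced) without changing the other, which is the property GRAMPS exploits for automatic intra-stage parallelization.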

GRAMPS applications are expressed as execution graphs (also called computation graphs). The graph defines and organizes the stages, queues, and buffers, describing their data flow and connectivity. Beyond the basic information required for GRAMPS to run an application, the graph captures structural insights that are essential to scheduling: application-specified limits on the maximum depth of each queue, which stages are sources and which are sinks, and whether there are limits on automatically instancing each stage.
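A hypothetical data-structure sketch (these classes are illustrative, not the actual GRAMPS representation) shows the kind of scheduling information an execution graph records for each stage and queue:

```python
from dataclasses import dataclass, field

@dataclass
class GraphQueue:
    name: str
    max_depth: int  # application-specified limit on queue depth

@dataclass
class GraphStage:
    name: str
    inputs: list = field(default_factory=list)   # queues this stage reads
    outputs: list = field(default_factory=list)  # queues this stage writes
    max_instances: int = 1  # limit on automatic instancing of this stage

    @property
    def is_source(self):
        # A source stage consumes no queues; the scheduler must seed it.
        return not self.inputs

    @property
    def is_sink(self):
        # A sink stage produces no queues; its output leaves the graph.
        return not self.outputs
```

For example, a two-stage graph with a depth-limited queue between a camera source and a shading sink would let a scheduler identify where work enters and exits the graph:

```python
rays = GraphQueue("rays", max_depth=64)
camera = GraphStage("camera", outputs=[rays])
shade = GraphStage("shade", inputs=[rays])
```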

There are various ways for developers to create execution graphs, all of them wrappers around the same core API: an OpenGL/streaming-like group of primitives to define stages, define queues, define buffers, bind queues and buffers to stages, and launch a computation. GRAMPS supports general computation graphs to provide flexibility for a rich set of algorithms. However, graph cycles inherently make it possible to write applications that loop endlessly through stages or amplify queued data beyond the ability of any system to manage, so they must be used with care.
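The shape of that core API can be sketched as a small builder. The method names (`define_stage`, `define_queue`, `bind`, `launch`) mirror the primitives listed above but are assumptions of this sketch, not the real GRAMPS calls, and the toy `launch` simply runs a linear pipeline in binding order rather than scheduling stages in parallel as a real runtime would.

```python
class GraphBuilder:
    """Illustrative builder for a linear execution graph (not the GRAMPS API)."""

    def __init__(self):
        self.stages = {}    # name -> per-item stage function
        self.queues = {}    # name -> max depth
        self.bindings = []  # (producer, queue, consumer) edges

    def define_stage(self, name, fn):
        self.stages[name] = fn

    def define_queue(self, name, max_depth):
        self.queues[name] = max_depth

    def bind(self, producer, queue_name, consumer):
        # Connect a producer stage to a consumer stage through a queue.
        self.bindings.append((producer, queue_name, consumer))

    def launch(self, inputs):
        # Derive the stage order from the bindings and run each stage
        # over the data; a real runtime would execute stages
        # concurrently while respecting each queue's depth limit.
        order = []
        for producer, _, consumer in self.bindings:
            if producer not in order:
                order.append(producer)
            order.append(consumer)
        data = list(inputs)
        for stage_name in order:
            data = [self.stages[stage_name](x) for x in data]
        return data
```

Usage follows the define/bind/launch pattern the text describes:

```python
g = GraphBuilder()
g.define_stage("square", lambda x: x * x)
g.define_stage("inc", lambda x: x + 1)
g.define_queue("mid", max_depth=16)
g.bind("square", "mid", "inc")
```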

This work was done by Jeremy Sugarman of Stanford University for the Army Research Laboratory. ARL-0122



This Brief includes a Technical Support Package (TSP), "General Runtime/Architecture for Many-core Parallel Systems (GRAMPS)" (reference ARL-0122), currently available for download from the TSP library.
