A need exists for small, autonomic systems in the battlefield. Autonomy allows the creation of unmanned systems to perform complex, high-risk, and/or covert operations in the battlefield without the need for constant human operation. Current computing systems are not optimized to perform intelligent operations such as environmental awareness, learning, and autonomic decisions in a size, weight, and power form factor that matches platforms envisioned for future use.

The major components of the Chip Design. Four cores are tiled onto a 7 x 7 mm chip.
The goal of the Architectures for Cognitive Systems research project was to develop computer hardware that is optimized to perform massively parallel cognitive computing operations such as are required for performance of cognitive primitive operations. A highly modular many-node chip was designed that addressed power efficiency to the maximum extent possible. Each node consists of an Asynchronous Field Programmable Gate Array (AFPGA), onboard Static Random Access Memory (SRAM), and an Application Specific Processor core (ASP). The figure shows the major components of the chip design. Four cores are tiled onto a 7 × 7 mm chip.

The term “cognitive operations” emphasizes lower-level operations that require massively parallel computing to perform in real time at a resolution rivaling human operations. An example is processing of visual images for object recognition. The project focused on architecture development to enable massively parallel processing, and the optimization of algorithms to utilize the new hardware architecture. The approach was to develop an architecture for processing the cognitive primitives that was not subject to limitations to parallelism that restricts Von Neumann type systems.

Efforts are underway to emulate human-brain-scale processing. There are multiple approaches that can be differentiated by the resolution level in the emulation used and the use for the output of the emulation. Of contention is whether or not emulation down to the molecular level is required for the computing system to perform and not just simulate or emulate various levels of cognitive functions. What is not in contention is the issue of the energy needed to achieve human-scale operations. Given the current rate of progress in energy efficiency, it is estimated that a human-scale system using the current processor and large supercomputer architectures will require megawatts to operate.

Human-scale systems, or brains, work around the limits of Von Neumann and Amdahl by using a concurrent, dynamic, massively parallel processing network. In this project, the processor was significantly reduced in size versus commercial processors. The cognitive operation primitive was set at the functional level. As the scale of the total system is increased by clustering nodes, responsibility for cognitive primitives will move up from the traditional “each processor is performing many serial cognitive primitive operations” to a network of nodes level. Each node will be responsible for a single cognitive primitive and be capable of performing the operation very quickly.

This network workload architecture will require a very large number of nodes to accommodate a large range of knowledge for cognitive operations. In this manner, the system parallelism is pushed closer to the level at which the cognitive primitive is performed. The network node becomes the functional primitive hardware unit for semantic operations. A semantic network node architecture was impractical in the past because there was more commercial benefit in building one very large, fast processor in a fixed area than to divide the same chip area into many smaller processors.

The features that make the architecture developed in this project useful for cognitive operations also make it useful for many other military applications. The architecture makes major progress in the trade space for size, weight, energy demand, cyber security, system reliability, processing speed, modularity, bandwidth internal to a cluster, and flexibility of operation and resource control.

The Floating Point Unit (FPU) in the ASP was optimized for extremely energy-efficient processing of Fast Fourier Transform algorithms. This makes the architecture a very powerful system for Parallel Discrete Event Simulation used in planning tools. The modularity and ability to dynamically reallocate resources provides opportunities for several cyber-security hardware features including node-level Advanced Encryption Standard (AES) encryption.

This work was done by Thomas E. Renz of the Air Force Research Laboratory. AFRL-0191

This Brief includes a Technical Support Package (TSP).
Architectures for Cognitive Systems

(reference AFRL-0191) is currently available for download from the TSP library.

Don't have an account? Sign up here.

Defense Tech Briefs Magazine

This article first appeared in the April, 2011 issue of Defense Tech Briefs Magazine.

Read more articles from this issue here.

Read more articles from the archives here.