High-performance embedded systems, such as those found in aerial surveillance platforms and handheld military devices, are of key importance. These systems are characterized by stringent power budgets and the need for extremely fast, streaming access to memory. While general-purpose processors offer a flexible, programmable solution, they typically cannot meet the power and performance requirements of these systems. For this reason, specialized chip multiprocessors (CMPs) are used.
This work studied the problem of designing a network-on-chip (NoC) architecture for an embedded computing platform that supports both on-chip communication and off-chip memory access in a power-efficient way. It proposes circuit-switched NoC architectures that rely on a simple mechanism for switching circuit paths off-chip, so that data can be exchanged directly with the DRAM memory modules.
This memory access architecture was simulated on a 256-core chip with a concentrated 64-node network, using detailed traces of high-performance embedded applications. Specifically, the traces came from the projective transformation, matrix multiply, and Fast Fourier Transform (FFT), all of which are used in signal and image processing.
This work presents the first complete, detailed simulation of a nanophotonic NoC coupled with cycle-accurate DRAM control and physically accurate photonic device models. These simulations were used to determine the benefits of circuit switching and silicon photonic technology for CMP memory access performance.
Of key consideration to this work are embedded applications that involve signal and image processing (SIP). These applications typically require the aggregation and processing of many data points collected from various locations over a period of time, originating from sensors or other continuous data streams. A typical example is a camera or other sensor mounted on an unmanned air vehicle (UAV). Applications in this domain require significant computing power in the form of high-bandwidth data access and streaming processing capabilities, and they must deliver it within a low power budget.
In a circuit-switched network, a control network provides a mechanism for setting up and tearing down energy-efficient, high-bandwidth, end-to-end circuit paths. This approach effectively decouples performance from router buffer size, a large contributor to NoC power, because router buffers do not become directly congested as communication demands grow. The control network uses smaller buffers and channels to transmit the small control messages, which reduces the total amount of buffering (and thus power) in the network. However, because the higher-bandwidth data plane is circuit-switched end-to-end, every transfer incurs circuit-path setup latency, which must be amortized through a combination of larger messages and well-scheduled or time-division multiplexed communication patterns.
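The need to amortize circuit-path setup can be illustrated with a first-order throughput model. The sketch below assumes a fixed setup time paid once per message followed by streaming at the full circuit rate; the 100 ns setup time and 320 Gb/s circuit bandwidth are hypothetical values chosen for illustration, not figures from this study, and contention and teardown are ignored.

```python
def effective_bandwidth(message_bytes: int, setup_ns: float, link_gbps: float) -> float:
    """Effective throughput (Gb/s) of one circuit-switched transfer.

    Hypothetical first-order model: the circuit path is set up once
    (setup_ns), then the whole message streams at link_gbps.
    Note that 1 Gb/s corresponds to 1 bit per ns.
    """
    bits = message_bytes * 8
    transfer_ns = bits / link_gbps
    return bits / (setup_ns + transfer_ns)


# Small messages are dominated by setup; large ones approach the link rate.
for size in (64, 1024, 16384, 262144):
    bw = effective_bandwidth(size, setup_ns=100.0, link_gbps=320.0)
    print(f"{size:>7} B -> {bw:6.1f} Gb/s")
```

Under these assumptions a 64-byte message achieves only a few Gb/s, while messages of hundreds of kilobytes come close to the full circuit rate, which is why larger messages and scheduled communication patterns matter.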
Beyond power savings, circuit switching can also considerably reduce the complexity of the memory controller. A circuit-switched on-chip network can be allowed to access memory modules directly, giving a single core exclusive access to a memory module for the duration of the transaction it requested. Access overhead is amortized by using longer burst lengths. Because a memory module must sustain only one transaction at a time, the memory controller can be greatly simplified. The key difference is that each transaction is an entire message transferred with long burst lengths, rather than many small packets that must be carefully scheduled.
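The exclusive-access policy can be sketched as a module that serves one whole-message transaction at a time from a simple FIFO. This is a hypothetical behavioral model, not the controller used in this work; the per-transaction overhead and streaming rate are assumed parameters.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass
class Request:
    core: int
    burst_bytes: int  # the entire message, transferred as one long burst


class CircuitAccessedModule:
    """Hypothetical model of a memory module reached over a circuit path.

    The module holds at most one transaction at a time: the requesting core
    owns the module until its whole burst completes, so a FIFO of waiting
    requests replaces a reordering packet scheduler.
    """

    def __init__(self, overhead_ns: float, bytes_per_ns: float):
        self.overhead_ns = overhead_ns    # assumed fixed per-transaction access overhead
        self.bytes_per_ns = bytes_per_ns  # assumed streaming rate once the burst starts
        self.queue: Deque[Request] = deque()

    def submit(self, req: Request) -> None:
        self.queue.append(req)

    def run(self) -> List[Tuple[int, float]]:
        """Serve queued requests one at a time; return (core, finish_time_ns) pairs."""
        now = 0.0
        finished = []
        while self.queue:
            req = self.queue.popleft()
            now += self.overhead_ns + req.burst_bytes / self.bytes_per_ns
            finished.append((req.core, now))
        return finished


mod = CircuitAccessedModule(overhead_ns=50.0, bytes_per_ns=16.0)
for core in (3, 7, 12):
    mod.submit(Request(core=core, burst_bytes=4096))
print(mod.run())  # [(3, 306.0), (7, 612.0), (12, 918.0)]
```

With 4 KB bursts, the assumed 50 ns overhead is a small fraction of each 306 ns transaction; shorter bursts would make that fixed cost proportionally larger.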
Circuit-switched photonic networks can be built from active broadband ring resonators whose diameter is designed so that their resonant modes align with all of the wavelengths injected into the adjacent waveguide. Such a ring resonator can be configured as a photonic switching element (PSE). Electrically injecting carriers into the ring shifts its entire resonant profile, effectively creating a spatial switch between the ports of the device. This process is analogous to setting the control signals of an electronic crossbar.
Given the operation of a single PSE, one can then construct higher-order switches, and ultimately entire networks. Using ring resonators in this way makes it possible to explore different network topologies, much as is done with packet-switched electronic networks.
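As one illustration of composing PSEs into a higher-order switch, the sketch below models a hypothetical N x N matrix crossbar with one PSE at each input/output crossing. Each PSE is reduced to two routing states, "through" and "drop"; which electrical state maps to which optical state depends on the device design and is not modeled, nor are loss, crosstalk, or switching time. This is not presented as the topology used in this work.

```python
from typing import List, Optional


class MatrixCrossbar:
    """Hypothetical N x N crossbar built from a grid of PSEs.

    Input waveguide i runs past PSEs (i, 0) ... (i, N-1). A PSE in the
    'drop' state couples the whole WDM signal onto output waveguide j;
    in the 'through' state the signal continues along input i.
    """

    def __init__(self, n: int):
        self.n = n
        # PSE states: False = 'through', True = 'drop'. All start as 'through'.
        self.drop: List[List[bool]] = [[False] * n for _ in range(n)]

    def connect(self, i: int, j: int) -> None:
        """Set up a circuit from input i to output j by switching PSE (i, j)."""
        if any(self.drop[i]) or any(self.drop[r][j] for r in range(self.n)):
            raise ValueError(f"input {i} or output {j} is already in use")
        self.drop[i][j] = True

    def release(self, i: int, j: int) -> None:
        """Tear down the circuit by returning PSE (i, j) to 'through'."""
        self.drop[i][j] = False

    def output_of(self, i: int) -> Optional[int]:
        """Follow input i along its row; return the output it is dropped to, if any."""
        for j in range(self.n):
            if self.drop[i][j]:
                return j
        return None


xbar = MatrixCrossbar(4)
xbar.connect(0, 2)
xbar.connect(3, 0)
print([xbar.output_of(i) for i in range(4)])  # [2, None, None, 0]
```

Switching a single PSE per circuit plays the role of asserting one control line in an electronic crossbar; other arrangements of PSEs yield other switch and network topologies.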
The proposed circuit-switched memory access architecture requires a slightly different organization of the DRAM modules. In the Photonic Circuit-Accessed Memory Module (P-CAMM) design, individual conventional DRAM chips are connected via a local electronic bus to a central optical controller/transceiver. The controller is responsible for demultiplexing the single optical channel into the address and data buses, much as in Rambus RDRAM memory technology.
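To make the demultiplexing step concrete, the sketch below splits one deserialized frame from the optical channel into the fields that would drive the local address and data buses. The frame layout (a 1-byte opcode, a 32-bit address, then write data) and the function name `demux_frame` are hypothetical; the P-CAMM's actual framing is not described here.

```python
from typing import NamedTuple


class DramCommand(NamedTuple):
    write: bool
    address: int  # value to drive onto the local address bus
    data: bytes   # value to drive onto the local data bus (empty for reads)


# Hypothetical frame layout (assumed for illustration, not from the P-CAMM design):
#   byte 0     : opcode (0x01 = read, 0x02 = write)
#   bytes 1..4 : 32-bit big-endian address
#   bytes 5..  : write data
def demux_frame(frame: bytes) -> DramCommand:
    """Split one frame from the optical channel into address- and data-bus fields."""
    if len(frame) < 5:
        raise ValueError("frame too short")
    opcode = frame[0]
    if opcode not in (0x01, 0x02):
        raise ValueError(f"unknown opcode {opcode:#x}")
    address = int.from_bytes(frame[1:5], "big")
    return DramCommand(write=(opcode == 0x02), address=address, data=frame[5:])


cmd = demux_frame(bytes([0x02, 0x00, 0x00, 0x10, 0x00]) + b"\xde\xad")
print(cmd.write, hex(cmd.address), cmd.data)  # True 0x1000 b'\xde\xad'
```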
This shift from electrical to photonic technology presents significant advantages for the physical design and implementation of off-chip signaling. One advantage is that the P-CAMM can be locally clocked, serializing and deserializing data at the I/O bit rate and synchronizing it to the DRAM clock rate. The clock can be recovered in the transceiver using line coding or a transmitted clock, and then matched to the local DRAM clock after deserialization. Local clocking, together with the elimination of the long printed circuit board (PCB) traces that the DRAM chips previously had to drive, allows the P-CAMM to sustain higher clock frequencies than contemporary DRAM modules.
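The rate conversion performed by the P-CAMM transceiver can be sketched as follows: a serial lane running at the I/O bit rate is regrouped into parallel words, one per DRAM clock cycle, so the word width equals the ratio of the two rates. The 10 Gb/s lane rate and 625 MHz DRAM clock below are illustrative assumptions, and clock recovery is not modeled.

```python
from typing import List


def deser_ratio(io_gbps: float, dram_mhz: float) -> int:
    """Bits that must be gathered per DRAM clock cycle on one serial lane."""
    ratio = io_gbps * 1000.0 / dram_mhz
    if abs(ratio - round(ratio)) > 1e-9:
        raise ValueError("I/O bit rate is not an integer multiple of the DRAM clock")
    return round(ratio)


def serialize(words: List[int], width: int) -> List[int]:
    """Flatten width-bit words into a serial bit stream, MSB first."""
    bits: List[int] = []
    for w in words:
        if not 0 <= w < (1 << width):
            raise ValueError(f"word {w:#x} does not fit in {width} bits")
        bits.extend((w >> s) & 1 for s in range(width - 1, -1, -1))
    return bits


def deserialize(bits: List[int], width: int) -> List[int]:
    """Regroup a serial bit stream into width-bit words, one per DRAM clock cycle."""
    if len(bits) % width:
        raise ValueError("bit stream is not a whole number of words")
    words = []
    for k in range(0, len(bits), width):
        w = 0
        for b in bits[k:k + width]:
            w = (w << 1) | b
        words.append(w)
    return words


width = deser_ratio(io_gbps=10.0, dram_mhz=625.0)  # 16 bits per DRAM cycle
stream = serialize([0xBEEF, 0x1234], width)
print(width, [hex(w) for w in deserialize(stream, width)])  # 16 ['0xbeef', '0x1234']
```

Because the DRAM side only ever sees parallel words at its own clock, the serial link rate can be raised without changing the DRAM chips, provided the deserialization ratio is adjusted accordingly.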
This work was done by Gilbert Hendry, Johnnie Chan, Keren Bergman, and Luca P. Carloni of Columbia University; and Eric Robinson, Vitaliy Gleyzer, and Nadya Bliss of MIT Lincoln Laboratory for DARPA. DARPA-0010