Test and evaluation of current systems is a time-consuming process that reflects both the intricacies of the object of the test and the range of equipment, personnel, and environments required. Many argue that this process consumes far too much of the time it takes to put new systems into the hands of the warfighters, and that it uses substantial resources with little evident benefit to those in combat. One solution for ameliorating these costs and delays is the increased use of computer simulations, ranging from agent-based models of battle spaces, to Mechanical Computer-Aided Engineering (MCAE) analyses of hardware, to esoteric simulations using computational fluid dynamics to assess everything from new airframes to the dispersion of chemical and biological agents.

One potential approach to reducing costs, time to roll-out, and physical danger, all while improving validity, transparency, and utility, is to adopt the strategy of heterogeneous computing. Heterogeneous computing is the use of a variety of different types of computational units to aid the central processing unit (CPU), such as accelerators like General Purpose Graphics Processing Units (GPGPUs), field programmable gate arrays (FPGAs), and digital signal processors.

The technique presented here uses GPGPUs to handle computationally intensive activity "spikes" effectively. Three specific aspects of the use of GPGPUs are presented: hurdles and opportunities in code drafting and development, code modifications in several areas of computational science, and software results spanning a wide range of floating point operations per second (FLOPS) per watt figures across various hardware configurations.

To better analyze potential test and evaluation use, this method was implemented for forces modeling and simulation. Existing DOD simulation codes running on advanced Linux clusters were used. GPGPU experiments were first conducted on a more manageable code set to ease the programming burden and hasten the results. Basic Linear Algebra Subprogram (BLAS) routines were seen as appropriate candidates. An MCAE "crash code" arithmetic kernel was used as a vehicle for a basic demonstration problem. This preliminary characterization of GPU acceleration focused on a subset of the large space of numerical algorithms, in this case factoring large sparse symmetric indefinite matrices.
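The article does not reproduce the kernel itself, but the class of problem it names can be illustrated on a small scale. The sketch below, which is an illustration rather than the authors' code, factors a small symmetric indefinite matrix with an LDL^T decomposition (SciPy's `scipy.linalg.ldl`); indefinite matrices require this kind of pivoted factorization rather than Cholesky, which assumes positive definiteness. The matrix values are invented for the example.

```python
import numpy as np
from scipy.linalg import ldl

# Small symmetric *indefinite* matrix (it has both positive and
# negative eigenvalues), standing in for the large sparse systems
# factored by the MCAE crash-code arithmetic kernel.
A = np.array([[ 4.0,  2.0, -1.0],
              [ 2.0, -3.0,  0.5],
              [-1.0,  0.5,  1.0]])

# Pivoted LDL^T factorization: A = L @ D @ L.T.
# Cholesky would fail here because A is not positive definite.
L, D, perm = ldl(A, lower=True)

# Verify the factorization reconstructs A.
assert np.allclose(L @ D @ L.T, A)

# Confirm A is indeed indefinite.
eigs = np.linalg.eigvalsh(A)
print("eigenvalues:", eigs)
print("indefinite:", eigs.min() < 0 < eigs.max())
```

On a GPGPU, the payoff comes from dispatching the dense BLAS-3 operations that dominate such a factorization (the frontal-matrix updates) to the accelerator, while the symbolic and pivoting logic stays on the CPU.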

Given the great deal of interest in GPGPU acceleration, the following work, while necessarily preliminary because of the rapidly changing designs of the devices being offered, may prove useful to those facing power issues today. In any case, these analyses support the proposition that GPGPUs are a viable method for reducing power consumption per unit of computation (quantified here as FLOPS).

One can examine the extra power requirement for a system at four points: the maximum power drain specified, the drain at high computational load, the drain at idle, and the drain with the GPGPU card removed from the node. Three NVIDIA GPUs were tested: the 8800, 9400, and 9800. In each case, the host for the GPGPU was chosen to best complement the GPU itself, so a different platform was used in every instance. This follows necessarily from the choice of the target GPUs; forcing all three onto a single platform would have introduced compromises that confound the comparison.

A Model 22-602 Radio Shack AC ammeter probe was used to test current flow to the entire node. In each case, the current to the node under test was measured, within the accuracy of the meter, while exercising the GPU (a) to the maximum extent feasible, (b) at idle while running, (c) in a sleep or hibernate mode, and (d) finally, with the subject card removed.

Care should be exercised when trying to extrapolate these amperages to different computational environments or different analytic tools. The meter used could be relied upon to return consistent comparative figures, but the absolute numbers may be off by a significant fraction. Test and retest numbers were very stable, giving some assurance that the comparative values were meaningful. These data indicate that the entire node draws on the order of 50 percent more power at full load, and that the GPGPU adds on the order of 15 to 20 percent to power consumption even at rest, assuming one GPGPU card per processor.
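The comparative figures above can be reproduced with simple arithmetic. In the sketch below, the current readings and the 120 V line voltage are hypothetical, chosen only to match the ratios reported in the text; the article gives no absolute values.

```python
# Hypothetical node-level current readings (amperes); the article
# reports only comparative ratios, not absolute measurements.
LINE_VOLTAGE = 120.0  # volts, assumed wall-outlet supply

readings_amps = {
    "full_load":    3.0,  # GPU exercised to the maximum extent feasible
    "idle":         2.0,  # node running, GPU idle, card installed
    "card_removed": 1.7,  # same node with the GPGPU card pulled
}

# Apparent power for each state, P = V * I (watts).
watts = {state: LINE_VOLTAGE * amps for state, amps in readings_amps.items()}

# Comparative figures of the kind reported in the text.
full_vs_idle = watts["full_load"] / watts["idle"] - 1.0       # ~50% more at full load
card_at_rest = watts["idle"] / watts["card_removed"] - 1.0    # ~15-20% added by the card

print(f"full load vs. idle: +{full_vs_idle:.0%}")
print(f"idle with card vs. card removed: +{card_at_rest:.0%}")
```

Note that since the ratios are formed from readings on the same meter, any constant calibration error cancels, which is why the comparative values are more trustworthy than the absolute ones.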

This work was done by Dan M. Davis, Gene Wagenbreth, and Robert F. Lucas of the Information Sciences Institute at the University of Southern California; and Paul C. Gregory of Lockheed Martin for the Air Force Research Laboratory. AFRL-0199

This Brief includes a Technical Support Package (TSP). "Test and Evaluation Uses of Heterogeneous Computing" (reference AFRL-0199) is currently available for download from the TSP library.
