
Using High-Level Language to Implement Floating-Point Calculations on FPGAs

Mitrion-C version 1.4 was used to implement the two calculations. Each of the four Quad-Data Rate (QDR) memories directly available to the Virtex-II Pro contains 4 MB of space for input/output, for a total of 16 MB of input and output. Since many scientific applications require more than 16 MB of input and output, a host program is needed to marshal data between the FPGA’s memory and the host memory present on the same compute node.

The host program was written in ANSI C (the American National Standards Institute’s standard for C) and run on one of the Advanced Micro Devices (AMD) Opteron 275 processors on the same compute node as the FPGA. The Cray XD1 supercomputer used in this project has an interconnect system that allows data transfer between the FPGA and host RAM at a rate of 3.2 GB/s. Mitrion-C uses the full bandwidth provided by Cray.

In the host program, each of the FPGA’s QDR memories is treated as an array. The host program loads values into the arrays, sends the FPGA a start signal using a function provided by Mitrionics, and reads the results after it receives a done signal back from the FPGA.
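The load, start, wait, and read sequence described above can be sketched in C as follows. This is only an illustration under stated assumptions: the names `fpga_start`, `fpga_wait_done`, and `run_batches` are hypothetical and are not the functions provided by Mitrionics, the `qdr` array merely simulates the four 4 MB memories in host RAM, and the add-one operation is a placeholder for the real FPGA calculation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sizes from the text: four QDR memories, 4 MB each, 64 bits wide. */
#define NUM_QDR        4
#define QDR_BYTES      (4u * 1024u * 1024u)
#define WORDS_PER_QDR  (QDR_BYTES / sizeof(uint64_t))

/* Simulated QDR memories. On real hardware these would be regions
   exposed by the vendor's host library, not a static array. */
static uint64_t qdr[NUM_QDR][WORDS_PER_QDR];
static volatile int fpga_done = 0;

/* HYPOTHETICAL stub for the start signal. It simulates the FPGA by
   adding 1 to every word of memory 0 and storing it in memory 1. */
static void fpga_start(void)
{
    for (size_t i = 0; i < WORDS_PER_QDR; i++)
        qdr[1][i] = qdr[0][i] + 1;
    fpga_done = 1;
}

/* HYPOTHETICAL stub for waiting on the done signal. */
static void fpga_wait_done(void)
{
    while (!fpga_done) { /* poll */ }
    fpga_done = 0;
}

/* Process n words in batches that each fit in one QDR memory, so the
   total data set may exceed the FPGA's on-board capacity. */
void run_batches(const uint64_t *in, uint64_t *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t off = 0; off < n; off += WORDS_PER_QDR) {
        size_t left  = n - off;
        size_t count = left < WORDS_PER_QDR ? left : WORDS_PER_QDR;

        memcpy(qdr[0], in + off, count * sizeof(uint64_t));  /* load inputs  */
        fpga_start();                                        /* start signal */
        fpga_wait_done();                                    /* done signal  */
        memcpy(out + off, qdr[1], count * sizeof(uint64_t)); /* read results */
    }
}
```

Calling `run_batches` with more words than one memory holds simply repeats the sequence once per batch, which is one plausible way a host program could handle data sets larger than the FPGA's memories.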

The Mitrion-C program was split into three functions that 1) read the inputs from QDR memory, 2) performed the floating-point calculations, and 3) wrote the results to a different QDR memory. Data was stored in a list data structure and the program was run in a foreach loop. This combination explicitly instructs the Mitrion compiler to pipeline the design automatically.
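The three-stage structure can be mimicked in ordinary C, shown here only as an analogy and not as Mitrion-C source. The brief does not give the formulas it used, so the computation below (an unnormalized cross product of two vectors) is an assumed stand-in, and the names `read_inputs`, `compute`, `write_result`, and `process_all` are hypothetical. The point of the sketch is that each loop iteration depends only on its own element, which is the property that makes a loop of this kind straightforward to pipeline.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } vec3;

/* Stage 1: read one pair of input vectors (6 floats) from input memory. */
static void read_inputs(const float *mem_in, size_t i, vec3 *a, vec3 *b)
{
    const float *p = mem_in + 6 * i;
    a->x = p[0]; a->y = p[1]; a->z = p[2];
    b->x = p[3]; b->y = p[4]; b->z = p[5];
}

/* Stage 2: floating-point work. ASSUMPTION: a cross product is used
   purely as a placeholder calculation. */
static vec3 compute(vec3 a, vec3 b)
{
    vec3 n = {
        a.y * b.z - a.z * b.y,
        a.z * b.x - a.x * b.z,
        a.x * b.y - a.y * b.x
    };
    return n;
}

/* Stage 3: write one result vector (3 floats) to a different memory. */
static void write_result(float *mem_out, size_t i, vec3 n)
{
    float *q = mem_out + 3 * i;
    q[0] = n.x; q[1] = n.y; q[2] = n.z;
}

/* The loop body has no dependency between iterations, so a new element
   could enter stage 1 while earlier ones are still in stages 2 and 3. */
void process_all(const float *mem_in, float *mem_out, size_t count)
{
    assert(mem_in != NULL && mem_out != NULL);
    for (size_t i = 0; i < count; i++) {
        vec3 a, b;
        read_inputs(mem_in, i, &a, &b);
        write_result(mem_out, i, compute(a, b));
    }
}
```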

As a benchmark, the performance of the Mitrion-C implementations of the ray-intersection and normal-vector calculations was compared with that of ANSI-C programs. Each of the 4 MB memories available to the Virtex-II Pro is 64 bits wide. Although all four of the FPGA’s memories were used for input, two of the memories had to be used for output as well. Mitrion-C provides memory synchronization commands that enable bidirectional use of the FPGA’s memories with no effect on throughput.

As mentioned before, the maximum bandwidth of the interconnect between the FPGA’s QDR memories and the host memories is 3.2 GB/s. This means that each of the four QDR memories accounts for 800 MB/s of that total. Since each FPGA memory can read or write 64 bits (8 bytes) every clock cycle, the 100-MHz clock used by Mitrion moves 8 bytes × 100 million cycles per second = 800 MB/s, which is the maximum bandwidth of each memory.
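The arithmetic behind these figures can be checked with a few lines of C. The function names are illustrative only, and 1 MB is taken here as 10^6 bytes, which is the convention under which the figures in the text agree.

```c
#include <assert.h>

/* Peak bandwidth of one memory in MB/s (1 MB = 10^6 bytes):
   bytes moved per clock cycle times the clock rate in MHz. */
unsigned per_memory_mb_per_s(unsigned bus_bits, unsigned clock_mhz)
{
    assert(bus_bits % 8 == 0);          /* whole bytes per cycle */
    return (bus_bits / 8) * clock_mhz;
}

/* Aggregate peak bandwidth across several identical memories. */
unsigned aggregate_mb_per_s(unsigned memories, unsigned bus_bits,
                            unsigned clock_mhz)
{
    return memories * per_memory_mb_per_s(bus_bits, clock_mhz);
}
```

With the values from the text, `per_memory_mb_per_s(64, 100)` gives 800 and `aggregate_mb_per_s(4, 64, 100)` gives 3200, that is, 800 MB/s per memory and 3.2 GB/s in total.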

Measurements confirmed that a throughput very near the limit of the memories (799.04 MB/s in the case of the normal-vector calculation) could be maintained over a large sample of data. Mitrion-C is a straightforward way to achieve the maximum throughput allowed by the memory bandwidth, provided that the intended design fits on the target FPGA.
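To put the measured figure in proportion, it can be expressed as a percentage of the 800 MB/s per-memory peak. The helper below is a hypothetical convenience function, not part of the original work.

```c
#include <assert.h>

/* Measured throughput as a percentage of the theoretical peak. */
double efficiency_percent(double measured_mb_per_s, double peak_mb_per_s)
{
    assert(peak_mb_per_s > 0.0);
    return 100.0 * measured_mb_per_s / peak_mb_per_s;
}
```

For the normal-vector calculation, `efficiency_percent(799.04, 800.0)` comes to about 99.88 percent of the peak.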

This work was done by Kevin K. Liu, Charles B. Cameron, and Antal A. Sarkady of the US Naval Academy. NRL-0057

This Brief includes a Technical Support Package (TSP).

Using High-Level Language to Implement Floating-Point Calculations on FPGAs (reference NRL-0057) is currently available for download from the TSP library.
