John M. Boyer, Charles F. Wiecha
DocEng 2009
Block RAMs (BRAMs), specialized memory structures distributed in columns throughout the FPGA fabric, are of particular importance. Each BRAM can hold up to 36 Kbits of data. BRAMs can be configured with various depths and widths, and can be cascaded to form a larger logical memory. Because BRAMs are distributed and can be accessed in parallel, together they can provide aggregate bandwidth on the order of terabytes per second for memory-bandwidth-intensive applications.

The performance contrast between processors and FPGAs stems from the architecture itself. Processors follow the von Neumann paradigm, in which an application is compiled and stored in instruction and data memory. They typically operate on a fetch-decode-execute-store pipeline, which means that both instructions and data have to be fetched from external memory into the processor pipeline. Although caches alleviate the cost of expensive fetches from external memory, each cache miss still incurs a severe penalty. The bandwidth between processor and memory is therefore often the critical factor in determining overall performance.
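The terabyte-scale figure for BRAM bandwidth can be sanity-checked with a back-of-the-envelope calculation. The sketch below is only illustrative: the block count, port count, port width, and clock frequency are hypothetical values chosen for the example, not figures from the text, and the function names are invented here. It also assumes that 1 Kbit means 1024 bits and estimates cascading by capacity alone, ignoring the width and depth configurations a real device imposes.

```python
import math

# Capacity of one BRAM as stated in the text (assumption: 1 Kbit = 1024 bits).
BRAM_BITS = 36 * 1024


def brams_needed(depth_words, width_bits):
    """Capacity-only estimate of how many 36 Kbit blocks must be cascaded
    to build a logical memory of depth_words x width_bits."""
    return math.ceil(depth_words * width_bits / BRAM_BITS)


def aggregate_bandwidth_bytes_per_s(num_brams, ports_per_bram,
                                    port_width_bits, clock_hz):
    """Peak bandwidth if every port of every block transfers one word
    per clock cycle, all in parallel."""
    return num_brams * ports_per_bram * port_width_bits * clock_hz / 8


# Hypothetical logical memory: 64K words of 32 bits each.
print(brams_needed(64 * 1024, 32))  # 57

# Hypothetical device: 1000 blocks, 2 ports each, 36-bit ports, 500 MHz.
bw = aggregate_bandwidth_bytes_per_s(1000, 2, 36, 500e6)
print(bw / 1e12)  # 4.5 (terabytes per second)
```

Under these assumed numbers the peak comes out to a few terabytes per second. It is reachable only when the design keeps every port busy on every cycle, which is the parallelism the distributed layout is meant to enable.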
Ziyang Liu, Sivaramakrishnan Natarajan, et al.
VLDB
Eric Price, David P. Woodruff
FOCS 2011
William Hinsberg, Joy Cheng, et al.
SPIE Advanced Lithography 2010