Martin L. Schmatz, Rik Jongerius, et al.
ICASSP 2014
To increase the performance of data-intensive applications, we present an extension to a CPU architecture that enables arbitrary near-data processing capabilities close to the main memory. This is realized by introducing a component attached to the CPU system bus and a component at the memory side. Together, they provide hardware-managed coherence and virtual memory support, integrating the near-data processors into a shared-memory environment. We present an implementation of the components, as well as a system simulator that provides detailed performance estimations. With a variety of synthetic workloads, we demonstrate the performance of the memory accesses, the mixed fine- and coarse-grained coherence mechanisms, and the near-data processor communication mechanism. Furthermore, we quantify the inevitable start-up penalty regarding coherence and data writeback, and argue that near-data processing workloads should access data several times to offset this penalty. A case study based on the Graph500 benchmark confirms the small overhead of the proposed coherence mechanisms and shows the ability to outperform a real CPU by a factor of two.
Erik Vermij, Leandro Fiorin, et al.
CF 2017
Leandro Fiorin, Gianluca Palermo, et al.
IEEE Transactions on VLSI Systems
Erik Vermij, Leandro Fiorin, et al.
IJHPCA