A NUCA substrate for flexible CMP cache sharing
Jaehyuk Huh, Changkyu Kim, et al.
ICS 2014
A previous evaluation of scheduled region prefetching showed that this technique eliminates the bulk of main-memory stall time for applications with spatial locality. The downside of that aggressive prefetching scheme is that, even when it successfully improves performance, it enormously increases the amount of superfluous memory traffic generated by a program. In this paper, we measure the predictability of spatial locality using density vectors: bit vectors that track the block-level access pattern within a region of memory. We evaluate a number of policies that use density vector information to filter out prefetches that are unlikely to be useful. We show that, across our benchmarks, an average of 70% of useless prefetches can be eliminated with virtually no overall performance loss from reduced coverage. Thanks to the increase in prefetch accuracy, a few benchmarks show performance improvements as high as 35% over the base region prefetching scheme.
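To make the idea of a density vector concrete, here is a minimal sketch in Python. It is an illustration only, not the paper's implementation: the block size, region size, the `DensityVector` class, and the `filter_prefetches` policy (prefetch only blocks that were touched the last time the region was used) are all assumptions introduced here, and the abstract does not say which filtering policies the authors actually evaluate.

```python
BLOCK_SIZE = 64          # bytes per cache block (assumed value)
BLOCKS_PER_REGION = 64   # blocks per region, i.e. a 4 KB region (assumed value)
REGION_SIZE = BLOCK_SIZE * BLOCKS_PER_REGION


class DensityVector:
    """One bit per block in a region; a set bit means the block was accessed."""

    def __init__(self):
        self.bits = 0

    def record_access(self, address):
        # Map a byte address to its block index within the region and set that bit.
        block_index = (address % REGION_SIZE) // BLOCK_SIZE
        self.bits |= 1 << block_index

    def was_accessed(self, block_index):
        return ((self.bits >> block_index) & 1) == 1

    def density(self):
        # Fraction of the region's blocks that were accessed.
        return bin(self.bits).count("1") / BLOCKS_PER_REGION


def filter_prefetches(history, candidates):
    """Hypothetical policy: keep only candidate blocks whose bit is set in
    the density vector recorded during an earlier use of the region."""
    return [b for b in candidates if history.was_accessed(b)]


# Record three accesses that fall in blocks 0, 3, and 2 of a region.
history = DensityVector()
for addr in (0, 64 * 3 + 5, REGION_SIZE + 128):
    history.record_access(addr)

kept = filter_prefetches(history, range(5))
```

Under this hypothetical policy, a region prefetcher that would otherwise request every block in the region issues requests only for the blocks in `kept`, trading some coverage for less wasted memory traffic.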
Premkishore Shivakumar, Michael Kistler, et al.
DSN 2002
Jaehyuk Huh, Changkyu Kim, et al.
IEEE TPDS
Jaehyuk Huh, Changkyu Kim, et al.
ICS 2005