A work-stealing scheduler for X10's task parallelism with suspension. Olivier Tardieu, Haichuan Wang, et al. 2012. PPoPP 2012. Conference paper.
Large-scale fast Fourier transform on a heterogeneous multi-core system. Yan Li, Jeffrey R. Diamond, et al. 2012. IJHPCA. Paper.
Providing source code level portability between CPU and GPU with MapCG. Chun-Tao Hong, De-Hao Chen, et al. 2012. Journal of Computer Science and Technology. Paper.
DMATiler: Revisiting loop tiling for direct memory access. Haibo Lin, Tao Liu, et al. 2010. PACT 2010. Conference paper.
DBDB: Optimizing DMA transfer for the cell BE architecture. Tao Liu, Haibo Lin, et al. 2009. ICS 2009. Conference paper.
Orchestrating data transfer for the Cell/B.E. processor. Tong Chen, Haibo Lin, et al. 2008. ICS 2008. Conference paper.