GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture. Zaid Qureshi, Vikram Sharma Mailthody, et al. 2023. ASPLOS 2023. Conference paper.
Accelerating reduction and scan using tensor core units. Abdul Dakkak, Cheng Li, et al. 2019. ICS 2019. Conference paper.