Fast matrix multiplication via compiler-only layered data reorganization and intrinsic lowering. Braedy Kuzma, Ivan Korostelev, et al. 2023. Software - Practice and Experience.
YaConv: Convolution with Low Cache Footprint. Ivan Korostelev, Joao P. L. de Carvalho, et al. 2023. ACM TACO.