A sampling-based approach to information recovery
Junyi Xie, Jun Yang, et al.
ICDE 2008
Large-scale data analytics using statistical machine learning (ML), popularly called advanced analytics, underpins many modern data-driven applications. The data management community has been working for over a decade on tackling data management-related challenges that arise in ML workloads, and has built several systems for advanced analytics. This tutorial provides a comprehensive review of such systems and analyzes key data management challenges and techniques. We focus on three complementary lines of work: (1) integrating ML algorithms and languages with existing data systems such as RDBMSs, (2) adapting data management-inspired techniques such as query optimization, partitioning, and compression to new systems that target ML workloads, and (3) combining data management and ML ideas to build systems that improve ML lifecycle-related tasks. Finally, we identify key open data management challenges for future research in this important area.
Haixun Wang, Hao He, et al.
ICDE 2006
Matthias Boehm, Alexandre V. Evfimievski, et al.
BTW/DBIS 2019
Ke Yi, Hao He, et al.
SIGMOD 2004