Michelle Brachman, Christopher Bygrave, et al.
AAAI 2022
Modern ETL (Extract, Transform, Load) tools offer graphical, no-code interfaces for workflow creation but still require users to manually identify transformation functions and configure their properties, which is time-consuming and demands prior expertise. We present the research and engineering foundations of the IBM DataStage Assistant, a deployed capability that generates complete multi-stage ETL flows directly from natural language (NL) descriptions. Our framework infers transformation functions, their properties, and transformer expressions, enabling novices to discover relevant functions and allowing experts to bypass manual configuration. The proposed framework achieves a prediction accuracy of for flow predictions, for properties, and for transformer expressions. We also show a document exploration module that uses retrieval-augmented generation (RAG) over product documentation to answer tool-specific questions in NL. Implemented in IBM DataStage, this approach supports iterative, in-environment workflow design and reduces context switching. In initial studies, it achieves up to time savings for novices and for experts.
Siyu Huo, Hagen Völzer, et al.
BPM 2021
Neil Thompson, Martin Fleming, et al.
IAAI 2024
Christodoulos Constantinides, Dhaval Patel, et al.
IAAI 2026