Towards Automating the AI Operations Lifecycle
Matthew Arnold, Jeffrey Boston, et al.
MLSys 2020
In the evolving landscape of software development and system operations, the demand for automating traditionally manual tasks has surged. Requirements for continuous operation and minimal downtime highlight the need for automated detection and remediation of runtime anomalies. Ansible, known for its scalable features, including high-level abstraction and modularity, stands out as a reliable solution for managing complex systems securely. The challenge lies in creating an on-the-spot Ansible solution for dynamic auto-remediation, which requires a substantial dataset for in-context tuning of large language models (LLMs). Our research introduces KubePlaybook, a curated dataset of 130 natural language prompts for generating automation-focused remediation scripts. In rigorous manual testing, the generated code achieved a 98.86% accuracy rate, affirming the solution’s reliability and performance in addressing the complexities of dynamic auto-remediation.