Large Language Models (LLMs) have the potential to revolutionize how users interact with databases by enabling natural language queries. However, current LLMs struggle to generate accurate and reliable SQL queries due to challenges in syntax, domain specificity, and confidence scoring. This work explores using reward modeling and Reinforcement Learning from AI Feedback (RLAIF) to address these limitations. We propose a novel data generation and SQL training approach that combines supervised fine-tuning with a reward model to improve the correctness and confidence scoring of LLM-generated SQL queries in domain-specific tasks.