Codellm-Devkit: A Framework for Contextualizing Code LLMs with Program Analysis Insights (Tool Demo)
Abstract
Large Language Models for Code (or code LLMs) are becoming increasingly popular and capable. They offer a wide array of functionalities such as code completion, code generation, code summarization, test generation, code repair, refactoring, and translation. To use code LLMs to their full potential, developers must provide code-specific contextual information to the models, and this context is typically derived and distilled using program analysis tools. However, there is a significant gap: these static analysis tools are often language-specific and have a steep learning curve, which makes them hard to use effectively. Because each tool is tailored to a specific programming language, developers must learn and manage several tools to cover the different parts of their code base. Moreover, configuring and integrating these tools into existing development environments adds another layer of difficulty. These challenges limit the benefits that could be gained from using static analysis more widely and effectively in conjunction with code LLMs. In this tutorial, we introduce Codellm-Devkit (a.k.a. CLDK), an open-source library that significantly simplifies program analysis at various levels of granularity. As a Python-based library, CLDK offers developers an intuitive interface that makes it easy to provide rich program analysis context to code LLMs. With this library, developers can integrate detailed, code-specific insights that improve the efficiency and effectiveness of LLMs in coding tasks. Participants will learn not only how to use CLDK but also how to extend its capabilities, so that they can make full use of code LLMs across a variety of development activities.
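To make the idea of code-specific context more concrete, the following is a minimal sketch that uses Python's standard `ast` module as a stand-in for a static analysis backend. It is not CLDK's API: `MethodContext` and `extract_method_context` are hypothetical names, and the sketch handles only top-level Python functions and calls made by simple name. For each function it collects the kind of facts (parameters, docstring, direct callees) that could be distilled into an LLM prompt.

```python
import ast
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MethodContext:
    """Hypothetical container for program-analysis facts about one function."""
    name: str
    params: List[str]
    docstring: Optional[str]
    callees: List[str] = field(default_factory=list)


def extract_method_context(source: str) -> Dict[str, MethodContext]:
    """Collect parameters, docstring, and direct call targets of each top-level function."""
    tree = ast.parse(source)
    contexts: Dict[str, MethodContext] = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            # Names of functions called by simple name anywhere in this function body.
            callees = sorted({
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            })
            contexts[node.name] = MethodContext(
                name=node.name,
                params=[arg.arg for arg in node.args.args],
                docstring=ast.get_docstring(node),
                callees=callees,
            )
    return contexts


sample = '''
def normalize(xs):
    """Scale values to sum to one."""
    total = sum(xs)
    return [x / total for x in xs]

def report(xs):
    print(normalize(xs))
'''

ctx = extract_method_context(sample)
print(ctx["report"].callees)  # ['normalize', 'print']
```

A real multi-language tool would also resolve method calls on objects, imports, and types; this sketch deliberately stops at simple-name calls within one Python module.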
This hands-on session will enable participants to perform static analysis in order to build LLM-based solutions for coding tasks such as (1) code generation, (2) code summarization, and (3) test generation across different programming languages. Through practical exercises, developers will gain experience in enhancing the functionality and applicability of code LLMs using CLDK's APIs.
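As an illustration of the test generation task, the sketch below assembles a prompt that pairs a focal function with the signatures of the module-level functions it calls. It relies on Python's standard `ast` module rather than CLDK; the helper `build_test_prompt` and the prompt wording are hypothetical, and the actual LLM call is omitted.

```python
import ast


def build_test_prompt(source: str, focal: str) -> str:
    """Assemble a hypothetical LLM prompt asking for unit tests of one function.

    The prompt contains the focal function's source plus the signatures of
    module-level functions it calls, a simple form of program-analysis context.
    """
    tree = ast.parse(source)
    funcs = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    target = funcs[focal]
    called = {
        c.func.id
        for c in ast.walk(target)
        if isinstance(c, ast.Call) and isinstance(c.func, ast.Name)
    }
    # Keep only callees defined in this module; builtins such as max are skipped.
    deps = [
        f"def {name}({', '.join(a.arg for a in funcs[name].args.args)}): ..."
        for name in sorted(called)
        if name in funcs
    ]
    parts = [
        "Write unit tests for the focal function below.",
        "## Focal function",
        ast.get_source_segment(source, target),
    ]
    if deps:
        parts += ["## Signatures of functions it calls", *deps]
    return "\n".join(parts)


sample = '''
def normalize(xs):
    total = sum(xs)
    return [x / total for x in xs]

def top_share(xs):
    return max(normalize(xs))
'''

prompt = build_test_prompt(sample, "top_share")
print(prompt)
```

Including callee signatures rather than their full bodies is one way to keep the prompt short while still telling the model what the focal function depends on. A library such as CLDK is intended to supply comparable facts for other languages, which this single-language sketch does not attempt.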