
Large Language Models Meet Automated Program Repair: Innovations, Challenges and Solutions
1 Northwest Minzu University, Lanzhou, China
* Author to whom correspondence should be addressed.
Abstract
As the field of Automated Program Repair (APR) continues to evolve, traditional Neural Program Repair (NPR) methods, while successful in low-resource computing scenarios, still confront numerous challenges, including the demand for extensive training data, the limited generality of specially designed networks, and a lack of robustness. In recent years, Large Language Models (LLMs) have demonstrated remarkable efficacy in downstream code-related tasks, thanks to their powerful comprehension and text-generation capabilities, and have gradually emerged as pivotal tools in automated program repair. Compared to NPR techniques, LLM-based APR exhibits superior repair performance and greater generality, leading to its increasing adoption in APR tasks; indeed, the performance of zero-shot LLM-based APR has already surpassed that of NPR. Nevertheless, LLM-based APR still faces issues such as excessive fine-tuning costs, data-leakage concerns, and a shortage of domain-specific knowledge. This paper reviews and summarizes the latest advances in LLM-based APR from the perspectives of innovations, challenges, and solutions, providing researchers with deep insights and future directions.
Keywords
Automated Program Repair, Neural Program Repair, Large Language Model, Software Quality Assurance, Survey
Cite this article
Tang, Y. (2024). Large Language Models Meet Automated Program Repair: Innovations, Challenges and Solutions. Applied and Computational Engineering, 113, 57-65.
Data availability
The datasets used and/or analyzed during the current study are available from the authors upon reasonable request.
Disclaimer/Publisher's Note
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
About volume
Volume title: Proceedings of the 2nd International Conference on Machine Learning and Automation
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see Open access policy for details).