GBDT-Based Credit Default Prediction

Research Article
Open access

Chuyan Luo 1*
  • 1 Faculty of Economics, Jiangxi University of Finance and Economics, Nanchang, China    
  • *corresponding author 2202403261@stu.jxufe.edu.cn
AEMPS Vol.170
ISSN (Print): 2754-1169
ISSN (Online): 2754-1177
ISBN (Print): 978-1-80590-019-1
ISBN (Online): 978-1-80590-020-7

Abstract

Lenders increasingly rely on machine learning models to predict defaults and so reduce default risk. This study addresses default prediction in the credit market using the Lending Club dataset. Feature screening and relevance ranking are used to identify the features related to default, which are then interpreted in detail from an economic perspective. Several machine learning models (LogisticRegression, DecisionTreeClassifier, RandomForestClassifier, GradientBoostingClassifier, XGBoost, AdaBoost and Bagging) are trained and compared, and model performance is then further optimised with data balancing methods: SMOTE, ADASYN, RandomOverSampler, RandomUnderSampler, SMOTEENN and SMOTETomek. The study finds that the loan interest rate, the number of credit inquiries on the borrower in the last six months, the credit score, and the borrower's monthly installment have a strong effect on the target variable and predict defaults well. The boosting-based GBDT model performs best among the models compared, and its performance improves further after balancing with RandomOverSampler, which yields the most significant optimisation of all the balancing methods. Together, these findings aim to improve the accuracy of credit default prediction and thereby strengthen credit risk prevention and control.
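As an illustration of the balancing step, the following is a minimal sketch of random oversampling, the idea behind RandomOverSampler: rows of the smaller class are duplicated at random, with replacement, until every class is as large as the largest one. It is not the study's code. The function name `random_oversample`, the toy feature rows and the seed are assumptions made for this example, and it uses only the Python standard library rather than the library class named in the abstract.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from a copy of the original data, then append duplicates.
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Indices of the original rows that belong to this class.
        pool = [i for i, y in enumerate(labels) if y == cls]
        # Draw with replacement until this class reaches the target size.
        for _ in range(target - n):
            i = rng.choice(pool)
            out_rows.append(rows[i])
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data, not taken from the Lending Club dataset:
# (interest rate, credit inquiries in the last six months); label 1 = default.
X = [(0.07, 0), (0.09, 1), (0.11, 0), (0.08, 2),
     (0.10, 1), (0.12, 0), (0.19, 4), (0.21, 6)]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y, seed=42)
print(Counter(y), Counter(y_bal))
```

On this toy input the two defaulted rows are duplicated until both classes contain six rows. In practice, resampling of this kind is normally applied to the training split only, so that duplicated rows cannot appear in the data used for evaluation.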

Keywords:

Credit default, GBDT, Risk prediction, Machine learning

Luo, C. (2025). GBDT-Based Credit Default Prediction. Advances in Economics, Management and Political Sciences, 170, 77-86.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

About volume

Volume title: Proceedings of the 9th International Conference on Economic Management and Green Development

ISBN: 978-1-80590-019-1 (Print) / 978-1-80590-020-7 (Online)
Editor: Florian Marcel Nuţă
Conference website: https://2025.icemgd.org/
Conference date: 26 September 2025
Series: Advances in Economics, Management and Political Sciences
Volume number: Vol.170
ISSN: 2754-1169 (Print) / 2754-1177 (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
