Research on Data Driven Personal Credit Default Prediction: A Comparative Study of Random Forest and XGBoost Models

Research Article
Open access

Kexin Wang 1*
  • 1 School of Economics & Management, Northwest University, Xi’an, China    
  • *corresponding author 2021103236@stumail.nwu.edu.cn
AEMPS Vol.193
ISSN (Print): 2754-1177
ISSN (Online): 2754-1169
ISBN (Print): 978-1-80590-201-0
ISBN (Online): 978-1-80590-202-7

Abstract

With the rapid development of consumer finance, personal credit business has become increasingly active in the financial market, but it also carries significant credit risk. How to use big data methods to predict defaults efficiently and accurately has become a core issue in credit risk control. This article takes the Home Credit Default Risk dataset from the Kaggle platform as its research object and uses data mining methods to systematically analyze the statistical relationship between default risk and key features such as external credit scores, borrower annual income, and loan amount. Information Value (IV) and Kernel Density Estimation (KDE) are used to screen variables with high discriminative power. Based on the characteristics of the data distribution, prediction models built on Random Forest and Extreme Gradient Boosting (XGBoost) are then constructed and compared in terms of precision, recall, F1 score, and AUC. The results show that XGBoost identifies defaulters better in imbalanced data scenarios, while Random Forest offers advantages in feature interpretability. The findings not only verify the effectiveness of a model design driven by feature distributions but also provide practical suggestions and theoretical support for financial institutions in pre-loan risk screening and dynamic in-loan monitoring.
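The abstract names IV-based variable screening and four evaluation metrics but does not state the binning scheme or the formulas used. The following is a minimal, dependency-free Python sketch of those two steps, assuming equal-frequency binning into five bins, additive smoothing of 0.5, and the common weight-of-evidence convention WoE = ln(share of non-defaulters / share of defaulters), with IV as the sum over bins of (share of non-defaulters - share of defaulters) × WoE. The function names, parameters, and toy data are hypothetical illustrations, not the author's code, and the sketch does not reproduce the KDE analysis or the Random Forest and XGBoost models themselves.

```python
import math


def information_value(values, labels, n_bins=5, eps=0.5):
    """Information Value of a numeric feature against a binary label (1 = default).

    Hypothetical choices, not taken from the paper: equal-frequency bins formed by
    slicing the sorted sample, and additive smoothing `eps` so that a bin with no
    defaulters or no non-defaulters does not produce log(0).
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_bad = sum(label for _, label in pairs)
    total_good = n - total_bad
    iv = 0.0
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        bad = sum(label for _, label in chunk)
        good = len(chunk) - bad
        # Smoothed share of all non-defaulters / defaulters that fall in this bin.
        dist_good = (good + eps) / (total_good + eps * n_bins)
        dist_bad = (bad + eps) / (total_bad + eps * n_bins)
        woe = math.log(dist_good / dist_bad)  # weight of evidence of the bin
        iv += (dist_good - dist_bad) * woe
    return iv


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive (default) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auc(y_true, scores):
    """AUC as the probability that a random defaulter outscores a random
    non-defaulter (ties count 1/2); O(n^2), fine for illustration only."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    toy_feature = list(range(1, 11))              # hypothetical feature values
    toy_default = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # labels separable by the feature
    print(information_value(toy_feature, toy_default))
    print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))  # (0.5, 0.5, 0.5)
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))          # 0.75
```

Because IV is accumulated bin by bin, its value depends on the binning: with this simple slicing, tied feature values can straddle a bin boundary, so a fuller implementation would usually cut on distinct quantile points. Candidate variables would then be ranked by IV, with the cut-off for dropping weak ones left as a modelling choice.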

Keywords:

Credit risk prediction, Data driven, Random forest, XGBoost

Wang, K. (2025). Research on Data Driven Personal Credit Default Prediction: A Comparative Study of Random Forest and XGBoost Models. Advances in Economics, Management and Political Sciences, 193, 30-37.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of ICEMGD 2025 Symposium: Innovating in Management and Economic Development

ISBN: 978-1-80590-201-0 (Print) / 978-1-80590-202-7 (Online)
Editors: Florian Marcel Nuţă, Ahsan Ali Ashraf
Conference date: 23 September 2025
Series: Advances in Economics, Management and Political Sciences
Volume number: Vol.193
ISSN: 2754-1169 (Print) / 2754-1177 (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).