Bank Customer Churn Prediction Based on Machine Learning Models

Research Article
Open access

He Jin 1*
  • 1 Shenzhen University    
  • *corresponding author he.jin@audencia.com
AEMPS Vol.170
ISSN (Print): 2754-1177
ISSN (Online): 2754-1169
ISBN (Print): 978-1-80590-019-1
ISBN (Online): 978-1-80590-020-7

Abstract

In the highly competitive banking sector, customer churn significantly impacts profitability and market share. This study develops an effective machine learning framework for predicting customer churn, combining advanced algorithms with data preprocessing techniques. The proposed approach employs Random Forest as the primary classifier, integrated with SMOTE-ENN hybrid sampling to address class imbalance, and incorporates feature selection methods to identify key predictors. Experimental results demonstrate that this combination achieves superior performance in churn prediction compared to conventional methods. The analysis reveals that customer tenure, income level, transaction frequency, and service interaction patterns are among the most influential factors affecting churn behavior. These findings provide actionable insights for banks to implement targeted retention strategies, such as personalized engagement programs and proactive service interventions. The framework offers financial institutions a data-driven tool to enhance customer relationship management, optimize resource allocation for retention efforts, and ultimately improve business sustainability. By bridging the gap between predictive analytics and practical decision-making, this research contributes to both academic knowledge and banking industry practices in customer churn management. The study's methodology and findings can be extended to other financial services and subscription-based business models facing similar churn challenges.
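The abstract names SMOTE-ENN hybrid sampling but gives no implementation details. The following is a minimal, standard-library sketch of that resampling idea only, not the author's implementation. It assumes binary labels, Euclidean distance, and k = 3 neighbours; the function names (`smote`, `enn`) and the toy data are hypothetical. In practice, tested implementations such as `SMOTEENN` in imbalanced-learn and `RandomForestClassifier` in scikit-learn would normally be used instead.

```python
import random
from collections import Counter


def _dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(idx, X, k, candidates):
    """Indices of the k points among `candidates` closest to X[idx], excluding idx."""
    others = [j for j in candidates if j != idx]
    return sorted(others, key=lambda j: _dist2(X[idx], X[j]))[:k]


def smote(X, y, minority, k=3, rng=None):
    """Oversample the minority class until both classes have equal size.

    Each synthetic point lies on the segment between a minority sample
    and one of its k nearest minority neighbours. Assumes binary labels
    and at least two minority samples.
    """
    rng = rng or random.Random(0)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_new = (len(y) - len(min_idx)) - len(min_idx)
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        i = rng.choice(min_idx)
        j = rng.choice(_nearest(i, X, k, min_idx))
        gap = rng.random()  # interpolation weight in [0, 1)
        X_out.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Edited nearest neighbours: drop samples whose label disagrees
    with the majority label of their k nearest neighbours."""
    all_idx = range(len(X))
    keep = []
    for i in all_idx:
        votes = Counter(y[j] for j in _nearest(i, X, k, all_idx))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: two features per customer, label 1 = churned.
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)] + \
    [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(8)]
y = [0] * 40 + [1] * 8

X_sm, y_sm = smote(X, y, minority=1, k=3, rng=rng)  # step 1: oversample
X_res, y_res = enn(X_sm, y_sm, k=3)                 # step 2: clean noisy points

print("original:", Counter(y))
print("after SMOTE:", Counter(y_sm))
print("after ENN:", Counter(y_res))
```

The resampled set (`X_res`, `y_res`) would then be used only for training the classifier; the test split should keep its original class distribution so that reported metrics are not inflated by synthetic samples.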

Keywords:

Bank Customer Churn, Machine Learning, Random Forest

Jin, H. (2025). Bank Customer Churn Prediction Based on Machine Learning Models. Advances in Economics, Management and Political Sciences, 170, 38-48.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

About volume

Volume title: Proceedings of the 9th International Conference on Economic Management and Green Development

ISBN: 978-1-80590-019-1 (Print) / 978-1-80590-020-7 (Online)
Editor: Florian Marcel Nuţă
Conference website: https://2025.icemgd.org/
Conference date: 26 September 2025
Series: Advances in Economics, Management and Political Sciences
Volume number: Vol.170
ISSN: 2754-1169 (Print) / 2754-1177 (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
