Research Article
Open access
Published on 22 March 2023
Jing, C. (2023). Data analysis and machine learning in the context of customer churn prediction. Applied and Computational Engineering, 2, 136-148.

Data analysis and machine learning in the context of customer churn prediction

Changran Jing *,1
  • 1 School of Naval Architecture, Ocean & Civil Engineering, Shanghai Jiao Tong University, Shanghai, China

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2755-2721/2/20220570

Abstract

Fierce market competition gives customers many alternatives when choosing products and services, so many industries, including banking, must now address customer churn. At the same time, to improve service quality, banks and similar institutions need to study the characteristics of their customers in depth. This paper addresses both problems using data analysis, data mining, and machine learning. It explores the likelihood of customer churn by analyzing customer behavior and characteristics. The data analysis methods include the chi-square test, the Mann-Whitney U test, and linear regression; the machine learning methods include logistic regression, random forest, and XGBoost. The data come from public datasets on Kaggle. The data analysis yields recommendations on how the banking industry can improve service quality, and six well-performing models are built to predict customer churn. The models are compared using several evaluation metrics, and the random forest model is selected as the most suitable because of its high predictive ability. Finally, the factors responsible for customer churn are ranked by importance.
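As an illustration of the kind of hypothesis test named in the abstract, the following is a minimal sketch of a Pearson chi-square test of independence between a binary customer attribute and churn. It is not the author's code: the function name `chi_square_2x2` and the counts are hypothetical, and the sketch is limited to 2x2 tables so that the p-value can be computed with the standard library alone.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # A 2x2 table has 1 degree of freedom, where P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts (assumption, not from the paper):
# rows = customer group A / B, columns = stayed / churned.
counts = [[420, 80], [350, 150]]
stat, p = chi_square_2x2(counts)
print(f"chi2 = {stat:.2f}, p = {p:.2g}")
```

A small p-value would suggest that churn rates differ between the two groups. For larger tables or a continuity correction, a statistics library would be the more usual choice.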

Keywords

bank customer churn, hypothesis testing, linear regression, logistic regression, random forest, naive Bayes, XGBoost, LightGBM, CatBoost

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

About volume

Volume title: Proceedings of the 4th International Conference on Computing and Data Science (CONF-CDS 2022)

Conference website: https://www.confcds.org/
ISBN: 978-1-915371-19-5 (Print) / 978-1-915371-20-1 (Online)
Conference date: 16 July 2022
Editor: Alan Wang
Series: Applied and Computational Engineering
Volume number: Vol. 2
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.