
Customer Churn Prediction Based on Multiple Linear Regression and Random Forest
1 School of Digital Economy, Guangdong University of Finance & Economics, Foshan, 528000, China
* Author to whom correspondence should be addressed.
Abstract
Businesses seek to retain existing customers and to reduce the cost of acquiring new ones, which makes customer churn prediction a practical tool for both goals. This study uses a telco customer churn dataset to explore the application of multiple linear regression and random forest models to churn prediction. By analyzing customer attributes, including service types, account details, and monthly fees, the paper aims to identify the key factors that contribute to churn. The random forest model outperformed multiple linear regression in accuracy and stability, achieving an accuracy of 79.18% on the test set. The multiple linear regression, by contrast, has an R^2 of only 0.275: its goodness of fit on this dataset is low, although most of the 19 variables are statistically significant. Prediction accuracy could therefore be improved further by using a different dataset or by combining hybrid models with deep learning techniques. The findings suggest that customer satisfaction, service usage, and total charges are significant factors in predicting customer churn. These results can give companies useful insights for improving customer retention, enhancing customer experience, optimizing customer relationships, and reducing marketing costs.
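The abstract reports two different metrics: classification accuracy for the random forest and R^2 for the multiple linear regression. The sketch below shows how each is computed, to make clear that the two numbers are not directly comparable. It is an illustration only: the labels, the fitted values, and the 0.5 threshold are hypothetical assumptions and do not come from the paper's dataset or method.

```python
# Illustrative only: all values below are made up, not taken from the paper.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def r_squared(y_true, y_fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - f) ** 2 for t, f in zip(y_true, y_fitted))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the target is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical churn labels (1 = churned, 0 = stayed).
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
# Hypothetical fitted values from a linear regression on the 0/1 target.
y_fitted = [0.6, 0.3, 0.2, 0.4, 0.1, 0.5, 0.7, 0.2]
# Assumed decision rule: threshold the fitted values at 0.5 to get classes.
y_pred = [1 if f >= 0.5 else 0 for f in y_fitted]

acc = accuracy(y_true, y_pred)   # share of correctly classified customers
r2 = r_squared(y_true, y_fitted) # share of variance explained by the fit

print(f"accuracy = {acc:.2%}, R^2 = {r2:.3f}")
```

Accuracy counts correct class labels, while R^2 measures the share of variance in the target that the fitted values explain. With a binary target, R^2 can be low even when the thresholded predictions classify most cases correctly.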
Keywords
Customer churn, multiple linear regression, random forest, predictive analytics.
Cite this article
Deng, E. (2024). Customer Churn Prediction Based on Multiple Linear Regression and Random Forest. Applied and Computational Engineering, 112, 22-28.
Data availability
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.
About volume
Volume title: Proceedings of the 5th International Conference on Signal Processing and Machine Learning
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.