
Comparing the Performance of Distributed Machine Learning: Evidence from XGBoost and DNN
1 College of Software Engineering, Sichuan University, Chengdu, China
* Author to whom correspondence should be addressed.
Abstract
Distributed machine learning approaches are widely adopted to accelerate model training. This study compares the performance of distributed machine learning models, specifically XGBoost and Deep Neural Networks (DNNs), on a telecommunications customer dataset. The dataset consists of 320,000 samples with 25 features, and the analysis focuses on model performance under different data-heterogeneity conditions. The analysis shows that XGBoost achieved excellent performance, rapidly reaching a high AUC of 97.7% within a small number of iterations, demonstrating its effectiveness on structured, small-to-medium-sized datasets. In contrast, the DNN struggled with this dataset and failed to outperform XGBoost, owing to the dataset's low dimensionality and limited size. The paper also discusses the main limitations, including the lack of model diversity, the low dimensionality of the dataset, and the difficulty of model interpretability, especially for DNNs. The results suggest that XGBoost is better suited to small, structured datasets, whereas DNNs excel on high-dimensional, complex datasets. Future research should focus on improving model diversity, tuning, and addressing interpretability challenges.
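To make the comparison concrete, the following is a minimal single-machine sketch of the evaluation protocol the abstract describes: train XGBoost and a small fully connected network on the same tabular data and compare test AUC. This is not the authors' code; the telecom dataset is replaced by a synthetic stand-in with the same shape (320,000 samples, 25 features), the distributed training setup is omitted, and all hyperparameters are illustrative assumptions.

# Minimal sketch (not the authors' implementation): XGBoost vs. a small DNN
# on tabular data, evaluated by test-set AUC. Synthetic data stands in for
# the telecom customer dataset; hyperparameters are illustrative guesses.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

# Stand-in for the telecom dataset: 320,000 samples, 25 features.
X, y = make_classification(n_samples=320_000, n_features=25,
                           n_informative=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# XGBoost: on structured data, a modest number of boosting rounds
# is often enough to reach a high AUC.
xgb = XGBClassifier(n_estimators=100, max_depth=6, learning_rate=0.1,
                    n_jobs=-1, eval_metric="logloss")
xgb.fit(X_train, y_train)
auc_xgb = roc_auc_score(y_test, xgb.predict_proba(X_test)[:, 1])

# DNN baseline: a small fully connected network (scikit-learn's MLP
# used here for simplicity in place of a deep-learning framework).
dnn = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=50,
                    random_state=0)
dnn.fit(X_train, y_train)
auc_dnn = roc_auc_score(y_test, dnn.predict_proba(X_test)[:, 1])

print(f"XGBoost AUC: {auc_xgb:.3f}  DNN AUC: {auc_dnn:.3f}")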
Keywords
Distributed machine learning, XGBoost, Deep Neural Networks (DNN), telecom dataset, model performance.
Cite this article
Tang, L. (2024). Comparing the Performance of Distributed Machine Learning: Evidence from XGBoost and DNN. Applied and Computational Engineering, 103, 191-197.
Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.
Disclaimer/Publisher's Note
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
About volume
Volume title: Proceedings of the 2nd International Conference on Machine Learning and Automation
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see Open access policy for details).