Research Article
Open access
Published on 8 November 2024

Comparison the Performances for Distributed Machine Learning: Evidence from XGboost and DNN

Leqi Tang 1,*
  • 1 College of Software Engineering, Sichuan University, Chengdu, China

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2755-2721/103/20241196

Abstract

Distributed machine learning approaches are widely adopted to accelerate model training. This study compares the performance of two models commonly used in distributed settings, XGBoost and Deep Neural Networks (DNNs), on a telecommunications customer dataset. The dataset contains 320,000 samples and 25 features, and the analysis examines model performance under different data-heterogeneity conditions. XGBoost performs strongly, reaching a high AUC of 97.7% within a small number of boosting iterations, which demonstrates its effectiveness on structured, small-to-medium-sized datasets. In contrast, the DNN struggles on this dataset and fails to outperform XGBoost because of the data's low dimensionality and limited size. The paper also discusses the main limitations of the study, including the narrow range of models considered, the low dimensionality of the dataset, and the difficulty of interpreting the models, especially the DNN. The results suggest that XGBoost is better suited to small, structured datasets, whereas DNNs excel on high-dimensional, complex data. Future research should focus on broadening model diversity, improving hyperparameter tuning, and addressing interpretability challenges.
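As an informal illustration of the comparison described above (not the paper's actual pipeline), the sketch below trains an XGBoost classifier and a small feed-forward network on a synthetic tabular dataset and reports the AUC of each. The synthetic data, the hyperparameters, and the use of scikit-learn's MLPClassifier as a DNN stand-in are assumptions made for demonstration only; the distributed-training aspects of the study (e.g., multi-worker backends) are omitted here.

```python
# Minimal, single-machine sketch of an XGBoost vs. DNN comparison by AUC.
# The dataset is synthetic and all hyperparameters are illustrative assumptions;
# the paper's experiments use a telecom customer dataset (320,000 samples,
# 25 features) and distributed training, which are not reproduced here.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# Synthetic stand-in for a structured, low-dimensional tabular dataset.
X, y = make_classification(n_samples=20_000, n_features=25,
                           n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Gradient-boosted trees: typically strong on small/medium tabular data.
xgb = XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1,
                    eval_metric="auc", random_state=42)
xgb.fit(X_train, y_train)
xgb_auc = roc_auc_score(y_test, xgb.predict_proba(X_test)[:, 1])

# Simple feed-forward network as a DNN stand-in; features are standardized first.
scaler = StandardScaler().fit(X_train)
mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=42)
mlp.fit(scaler.transform(X_train), y_train)
mlp_auc = roc_auc_score(y_test,
                        mlp.predict_proba(scaler.transform(X_test))[:, 1])

print(f"XGBoost AUC: {xgb_auc:.3f}")
print(f"DNN (MLP) AUC: {mlp_auc:.3f}")
```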

Keywords

Distributed machine learning, XGBoost, Deep Neural Networks (DNN), telecom dataset, model performance.


Cite this article

Tang, L. (2024). Comparison the Performances for Distributed Machine Learning: Evidence from XGboost and DNN. Applied and Computational Engineering, 103, 191-197.

Data availability

The datasets used and/or analyzed during the current study are available from the author upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 2nd International Conference on Machine Learning and Automation

Conference website: https://2024.confmla.org/
ISBN: 978-1-83558-695-2 (Print) / 978-1-83558-696-9 (Online)
Conference date: 12 January 2025
Editor: Mustafa ISTANBULLU
Series: Applied and Computational Engineering
Volume number: Vol. 103
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).