Research Article
Open access
Published on 31 July 2024

Design and implementation of efficient distributed deep learning model inference architecture on serverless computation

Xiaoyang Guo 1,*, Zhe Yang 2, Jie Yi 3
  • 1 Nanjing University, Nanjing, China
  • 2 Shenzhen University, Shenzhen, China
  • 3 Zhengzhou University of Light Industry, Zhengzhou, China

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2755-2721/68/20241535

Abstract

Although distributed algorithms and architectures have been widely used for traditional large-scale machine learning model inference tasks, the resource waste caused by synchronization among learners and participants is unavoidable. Serverless computing, by contrast, is charged based on usage, which makes it a popular alternative for model inference. This article proposes an architecture and algorithm for distributed deep learning model inference and adopts a coarse-grained parallelization strategy to address the high communication cost between functions on serverless platforms. We evaluated the architecture on deep learning models and found that model inference on a serverless platform copes better with changing demand while keeping the system stable. It is worth noting that both the data owner and the model owner can benefit from a machine learning inference task, yet existing solutions cannot satisfy their privacy requirements; this remains an open problem for machine learning inference on serverless platforms.
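To make the coarse-grained strategy concrete, the sketch below (our illustration, not code from the paper) splits a toy PyTorch model into a small number of stages, each standing in for one serverless function, and serializes intermediate activations between stages to mimic the cross-function payloads whose cost the strategy tries to limit. All helper names (make_model, split_coarse, run_pipeline) are illustrative assumptions.

```python
# Minimal sketch of coarse-grained model partitioning for serverless inference.
# Assumption: each stage would run as its own serverless function in practice;
# here the stages run in one process, but activations are round-tripped through
# bytes to mimic the serialized hand-off between functions.
import io
import torch
import torch.nn as nn

def make_model() -> nn.Sequential:
    """A toy MLP standing in for a deep learning model."""
    return nn.Sequential(
        nn.Linear(64, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, 10),
    )

def split_coarse(model: nn.Sequential, num_stages: int) -> list[nn.Sequential]:
    """Partition the layer list into a few large stages; fewer stages mean
    fewer cross-function hand-offs, i.e. lower communication cost."""
    layers = list(model)
    per_stage = -(-len(layers) // num_stages)  # ceiling division
    return [nn.Sequential(*layers[i:i + per_stage])
            for i in range(0, len(layers), per_stage)]

def serialize(t: torch.Tensor) -> bytes:
    """Stand-in for the payload a function would ship to the next one
    (e.g. via object storage or a message queue)."""
    buf = io.BytesIO()
    torch.save(t, buf)
    return buf.getvalue()

def deserialize(b: bytes) -> torch.Tensor:
    return torch.load(io.BytesIO(b))

def run_pipeline(stages: list[nn.Sequential], x: torch.Tensor) -> torch.Tensor:
    """Invoke each 'function' in turn, serializing activations in between."""
    payload = serialize(x)
    for stage in stages:
        with torch.no_grad():
            out = stage(deserialize(payload))
        payload = serialize(out)  # hand-off to the next function
    return deserialize(payload)

if __name__ == "__main__":
    model = make_model()
    stages = split_coarse(model, num_stages=2)  # coarse: 2 functions, not 5
    y = run_pipeline(stages, torch.randn(1, 64))
    print(y.shape)  # torch.Size([1, 10])
```

The design choice the sketch illustrates is the trade-off at the heart of coarse-grained partitioning: fewer, larger stages give up some parallelism but cut the number of serialized hand-offs, which matters on platforms where functions cannot communicate directly and must exchange data through storage or messaging services.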

Keywords

Serverless Computing, Machine Learning Inference, Parallel Computing


Cite this article

Guo, X.; Yang, Z.; Yi, J. (2024). Design and implementation of efficient distributed deep learning model inference architecture on serverless computation. Applied and Computational Engineering, 68, 338-346.

Data availability

The datasets used and/or analyzed during the current study are available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 6th International Conference on Computing and Data Science

Conference website: https://www.confcds.org/
ISBN: 978-1-83558-457-6 (Print) / 978-1-83558-458-3 (Online)
Conference date: 12 September 2024
Editors: Alan Wang, Roman Bauer
Series: Applied and Computational Engineering
Volume number: Vol. 68
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).