
Design and implementation of efficient distributed deep learning model inference architecture on serverless computation
- 1 Nanjing University, Nanjing, China
- 2 Shenzhen University, Shenzhen, China
- 3 Zhengzhou University of Light Industry, Zhengzhou, China
* Author to whom correspondence should be addressed.
Abstract
Although distributed algorithms and architectures have been widely used for traditional large-scale machine learning inference tasks, the resource waste caused by synchronization between learners and participants is unavoidable. Serverless computing, by contrast, is billed by usage, which makes it an attractive alternative for model inference. This article proposes an architecture and algorithm for distributed deep learning model inference and adopts a coarse-grained parallelization strategy to address the high communication cost between functions on serverless platforms. Experiments on deep learning models show that inference on a serverless platform can better accommodate changing demand while maintaining system stability. It is worth noting that although both the data owner and the model owner can benefit from carrying out a machine learning inference task, existing solutions cannot satisfy their privacy requirements; this remains an open problem for machine learning inference on serverless platforms.
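To make the coarse-grained parallelization idea concrete, here is a minimal sketch, not the paper's implementation: the model is split into a few large stages so that only stage boundaries cross the function boundary, keeping inter-function communication low. The helpers `partition` and `invoke_stage` are illustrative assumptions; in a real deployment each stage would run inside its own serverless function.

```python
# Sketch of coarse-grained model partitioning for serverless inference.
# Assumption: the model is an nn.Sequential; `invoke_stage` stands in for
# a serverless function call (e.g., an HTTP invocation of AWS Lambda).
import torch
import torch.nn as nn

def partition(model: nn.Sequential, num_stages: int):
    """Split a sequential model into at most `num_stages` coarse stages."""
    layers = list(model)
    per_stage = -(-len(layers) // num_stages)  # ceiling division
    return [nn.Sequential(*layers[i:i + per_stage])
            for i in range(0, len(layers), per_stage)]

def invoke_stage(stage: nn.Module, payload: torch.Tensor) -> torch.Tensor:
    # Stand-in for one serverless invocation: serialize `payload`,
    # run the stage remotely, and return the activation tensor.
    with torch.no_grad():
        return stage(payload)

if __name__ == "__main__":
    model = nn.Sequential(
        nn.Linear(128, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, 10),
    )
    stages = partition(model, num_stages=2)
    x = torch.randn(1, 128)
    for stage in stages:  # each hop is one cross-function transfer
        x = invoke_stage(stage, x)
    print(x.shape)  # torch.Size([1, 10])
```

With two coarse stages, a request incurs only one intermediate-tensor transfer between functions, whereas a fine-grained per-layer split would incur one transfer per layer.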
Keywords
Serverless Computing, Machine Learning Inference, Parallel Computing
Cite this article
Guo, X.; Yang, Z.; Yi, J. (2024). Design and implementation of efficient distributed deep learning model inference architecture on serverless computation. Applied and Computational Engineering, 68, 338-346.
Data availability
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.
Disclaimer/Publisher's Note
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
About volume
Volume title: Proceedings of the 6th International Conference on Computing and Data Science
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as this can lead to productive exchanges, as well as earlier and greater citation of published work (see the open access policy for details).