
Neural Network Optimization Framework for NPU-MCU Heterogeneous Platforms
1 Harbin Engineering University, 145 Nantong Street, Nangang District, Harbin, China
* Author to whom correspondence should be addressed.
Abstract
With the widespread application of Deep Neural Networks (DNNs) in edge computing and embedded systems, edge devices face challenges such as limited computational resources and strict power constraints. Microcontroller Units (MCUs) paired with dedicated Neural Processing Units (NPUs), which offer low cost and mass-production advantages, provide a practical solution for edge AI. This paper proposes a neural network optimization framework for NPU-MCU heterogeneous computing platforms. Through algorithm partitioning, pipeline design, data-flow optimization, and task scheduling, the framework exploits the computational throughput of NPUs and the control capabilities of MCUs, significantly improving the system's computational and energy efficiency. Specifically, the framework assigns compute-intensive tasks (e.g., convolution, matrix multiplication) to the NPU and control-intensive tasks (e.g., task scheduling, data preprocessing) to the MCU; combined with pipeline design and data-flow optimization, this maximizes hardware utilization, reduces power consumption, and alleviates memory-bandwidth pressure. Experimental results demonstrate that the framework performs well on edge-computing and IoT devices, effectively addressing the challenges of deploying neural networks in resource-constrained scenarios. This research provides a systematic optimization method for the industrial application of edge intelligence, offering both theoretical and practical value.
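To make the partitioning and pipelining scheme concrete, the following is a minimal C sketch, not the paper's implementation: the npu_submit/npu_wait calls are hypothetical stand-ins for a vendor NPU driver (stubbed here so the example is self-contained), and the MCU-side preprocessing is simulated. It illustrates the double-buffering idea behind the pipeline design: while the NPU computes tile k, the MCU prepares tile k+1.

    /* Minimal sketch of MCU/NPU pipelining with ping-pong buffers.
     * Hypothetical API: npu_submit()/npu_wait() stand in for a vendor
     * NPU driver; both are stubbed so the sketch compiles and runs. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define TILE_BYTES 1024
    #define NUM_TILES  8

    static uint8_t buf[2][TILE_BYTES];   /* ping-pong input buffers */

    /* --- stubs for the (hypothetical) NPU driver ------------------ */
    static void npu_submit(const uint8_t *in, size_t len) {
        (void)in; (void)len;             /* real driver: start convolution */
    }
    static void npu_wait(void) { /* real driver: block until NPU is done */ }

    /* --- MCU side: control-intensive preprocessing ----------------- */
    static void mcu_preprocess(uint8_t *dst, int tile_idx) {
        memset(dst, (uint8_t)tile_idx, TILE_BYTES);  /* e.g. normalize/pack */
    }

    int main(void) {
        mcu_preprocess(buf[0], 0);                 /* prime the pipeline */
        for (int k = 0; k < NUM_TILES; ++k) {
            npu_submit(buf[k & 1], TILE_BYTES);    /* NPU: compute tile k */
            if (k + 1 < NUM_TILES)                 /* MCU: prepare tile k+1 */
                mcu_preprocess(buf[(k + 1) & 1], k + 1);
            npu_wait();                            /* sync before buffer reuse */
        }
        puts("pipeline done");
        return 0;
    }

The ping-pong buffers let the MCU's preprocessing overlap the NPU's convolution, which is the overlap the framework's pipeline design exploits to raise hardware utilization.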
Keywords
Edge Computing, Embedded Systems, Deep Neural Networks, Computational Efficiency
Cite this article
Wang, P. (2025). Neural Network Optimization Framework for NPU-MCU Heterogeneous Platforms. Applied and Computational Engineering, 145, 43-50.
Data availability
The datasets used and/or analyzed during the current study are available from the authors upon reasonable request.
Disclaimer/Publisher's Note
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
About volume
Volume title: Proceedings of the 3rd International Conference on Software Engineering and Machine Learning
© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see the open access policy for details).