Research Article
Open access
Published on 27 August 2024
Liang, J. (2024). A review of the development of YOLO object detection algorithm. Applied and Computational Engineering, 71, 39-46.

A review of the development of YOLO object detection algorithm

Junbiao Liang 1,*
  • 1 South China Normal University

* Author to whom correspondence should be addressed.

https://doi.org/10.54254/2755-2721/71/20241642

Abstract

The You Only Look Once (YOLO) algorithm series, at the forefront of object detection technology, has evolved from YOLOv1 to YOLOv10, consistently improving detection speed and accuracy. Through literature review and data analysis, this paper traces the development of the YOLO series, focusing on the changes and innovations in network structure, training strategies, and performance optimization. By introducing techniques such as CSPNet, anchor-free detection, data augmentation, and multi-scale training, the YOLO algorithms have progressively found a better balance between detection speed and accuracy, demonstrating excellent performance on real-time imagery and in high-complexity scenarios. Furthermore, this paper addresses challenges faced by the YOLO algorithms and potential future research directions, such as lower accuracy in detecting small targets and reduced robustness in complex scenarios. Potential optimization directions for the future include further refining the network structure and employing more efficient training methods to improve algorithm efficiency. This paper intends to offer a thorough performance evaluation of the YOLO series algorithms and to identify areas for future improvement that can advance YOLO technology.
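As an illustrative aside (not code from the paper itself), two of the ideas mentioned above can be sketched in a few lines of Python: the YOLOv2-style multi-scale training trick of periodically resampling the input resolution, and the Intersection-over-Union (IoU) overlap measure that underlies detection-accuracy metrics such as mAP. The function names and the particular size range are assumptions for illustration only:

```python
import random

def sample_input_size(base=320, stride=32, steps=10, rng=None):
    """Multi-scale training trick (in the spirit of YOLOv2): every few
    batches, pick a new square input resolution that is a multiple of
    the network stride, so the detector sees objects at many scales.
    Returns one of 320, 352, ..., 640 for the defaults shown here."""
    rng = rng or random
    return base + stride * rng.randrange(steps + 1)

def iou(box_a, box_b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes: the core
    overlap measure behind detection accuracy metrics such as mAP."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

For example, two identical boxes score an IoU of 1.0, while two boxes sharing half their area score 1/3, which is why a threshold such as IoU ≥ 0.5 is commonly used to decide whether a prediction counts as a correct detection.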

Keywords

YOLO algorithm, object detection, computer vision

Data availability

The datasets used and/or analyzed during the current study are available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 6th International Conference on Computing and Data Science

Conference website: https://www.confcds.org/
ISBN: 978-1-83558-481-1 (Print) / 978-1-83558-482-8 (Online)
Conference date: 12 September 2024
Editors: Alan Wang, Roman Bauer
Series: Applied and Computational Engineering
Volume number: Vol. 71
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).