The application of deep learning in autonomous driving

Tingyu Zhang

doi:10.54254/2755-2721/50/20241379

1. Introduction

The history of artificial intelligence can be traced back to the 1950s. In the summer of 1956, J. McCarthy, then a young assistant professor of mathematics at Dartmouth and later a professor at Stanford, together with M. L. Minsky, then a young mathematician and neuroscientist at Harvard and later a professor at MIT, N. Rochester, head of IBM's information research center, and C. E. Shannon, a mathematical researcher at Bell Labs, initiated a two-month academic seminar at Dartmouth College to discuss machine intelligence. This seminar marks the formal proposal and formation of the concept of artificial intelligence. Since then, artificial intelligence (AI) has gone through multiple stages of development, moving from symbolism to connectionism and then to the rise of data-driven models in the early 2000s. Among these, deep learning has been the most closely watched and influential AI technology of recent years, achieving major breakthroughs in image and speech recognition. AI has now developed into a widely used discipline, applied in fields such as healthcare, finance, manufacturing, transportation, and education, and it has spawned many innovative technologies and applications, such as natural language processing, computer vision, and intelligent robots.

The main branches of AI include the following. Machine learning: learning from data and making predictions or decisions without being explicitly programmed. Deep learning: using deep neural networks, loosely modeled on how humans learn, to learn from data. Natural language processing: using computers to understand and generate human language. Computer vision: enabling computers to interpret and understand images and videos. Speech recognition: using computers to convert human speech into text. Robotics: including autonomous navigation (enabling the robot to move freely around its environment), physical interaction (enabling the robot to interact with objects), perception (enabling the robot to understand and interpret its surroundings), and more.

The main algorithms of artificial intelligence include the following. Decision tree: it classifies data according to features. Each node poses a question that splits the data into two groups, and the process continues with further questions. These questions are learned from existing data, and when new data arrives it is routed to the appropriate leaf by answering the questions along the tree. Random forest: data are randomly sampled from the source data to form several subsets, and each subset generates a decision tree. A new sample is fed into all M trees to obtain M classification results, and the category predicted most often is used as the final prediction. Logistic regression: when the prediction target is a probability, its value must lie between 0 and 1. A simple linear model cannot guarantee this, because when its input is unbounded its output also falls outside the specified interval. Logistic regression therefore passes the linear output through the logistic (sigmoid) function, which maps any real number into the interval (0, 1). There are many other algorithms.
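The two aggregation ideas above can be illustrated with a minimal sketch. It assumes that the individual trees already exist and have each produced a class label; the function names `sigmoid` and `majority_vote` and the sample labels are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def sigmoid(z: float) -> float:
    """Map a real-valued linear score into (0, 1), up to floating-point rounding."""
    # Numerically stable form: never calls exp() on a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def majority_vote(tree_predictions):
    """Random forest aggregation: return the label predicted by the most trees."""
    label, _count = Counter(tree_predictions).most_common(1)[0]
    return label


# Hypothetical outputs of M = 5 trees for one new sample.
votes = ["car", "pedestrian", "car", "car", "pedestrian"]
print(majority_vote(votes))

# An unbounded linear score is squashed into a valid probability.
for score in (-10.0, 0.0, 10.0):
    print(score, sigmoid(score))
```

The vote counts three "car" predictions against two "pedestrian" predictions, so the forest outputs "car". A linear score of 0 maps to a probability of 0.5, and large negative or positive scores approach 0 and 1 respectively.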

AI algorithms are already incorporated into many specific aspects of autonomous driving, such as computer vision, sensor fusion, path planning and decision making, control algorithms, predicting the behavior of other traffic participants, map creation and updates, swarm learning, anomaly detection and system health monitoring, and speech recognition and natural language processing. This paper surveys the application of deep learning in this field.

This paper consists of three parts: the first reviews some classic deep learning methods in autonomous driving, the second examines and discusses different applications of deep learning in autonomous driving, and the last summarizes the whole paper.

2. Method

2.1. Deep neural network

Different types of neural networks in deep learning, such as convolutional neural networks (CNNs) [1, 2] and artificial neural networks (ANNs) [3, 4], are changing the way people interact with technology. These networks are at the heart of the deep learning revolution, powering applications such as drones, self-driving cars, and speech recognition.

Deep neural networks are artificial neural networks made up of multiple layers of interconnected neurons, each receiving input signals and producing output signals. They have strong computational power and pattern recognition ability, can handle complex nonlinear problems, and automatically extract features from large amounts of data. One of their characteristics is parameter efficiency: through multiple layers of neurons and connections, the network extracts features from the data, then abstracts and combines them to represent and encode more complex patterns and regularities. This representational capacity gives deep neural networks significant advantages in image recognition, speech recognition, and natural language processing, and the automatic extraction and abstraction of input features can also lead to better generalization. The cost is a higher demand for computational resources during training, including larger memory requirements, more processors, and longer training times.
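The layered structure described above can be sketched as a forward pass through a tiny fully connected network. This is only an illustration: the weights are hand-picked rather than trained, the ReLU activation is an assumed choice, and the names `dense`, `relu`, and `forward` are hypothetical.

```python
def relu(values):
    """Elementwise ReLU nonlinearity: negative activations become zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One layer: each neuron outputs a weighted sum of its inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Pass x through every (weights, biases) layer, with ReLU on all but the last."""
    for i, (weights, biases) in enumerate(layers):
        x = dense(x, weights, biases)
        if i < len(layers) - 1:
            x = relu(x)
    return x


# Hand-picked (untrained) parameters: 2 inputs -> 2 hidden neurons -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.1]),
]
print(forward([2.0, 1.0], layers))
```

Each hidden neuron combines the raw inputs, and the output neuron combines the hidden activations, which is the sense in which deeper layers build on features extracted by earlier ones.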

2.2. Processing of visual perception system of autonomous vehicle based on deep learning

This research uses deep learning to build a model that extracts meaningful information from images and automatically recognizes and distinguishes target objects. Using computer vision, it also establishes a model that identifies target objects such as vehicles, pedestrians, and obstacles by analyzing and processing image features with image processing and machine learning techniques. The two models are combined into a complete visual perception system for autonomous vehicles, which perceives and recognizes the surrounding environment and provides basic data support for decision-making and control. Both target detection and target recognition use deep learning, training the neural network models on a large amount of labeled image and video data. The work also proposes a method based on the calibration transformation matrix of the sensor's intrinsic and extrinsic parameters [5], which maps all detection results into the vehicle's coordinate system and improves the accuracy of the detection results.
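Mapping a detection into the vehicle coordinate system can be sketched as follows. This is not the method of [5]; it illustrates only the extrinsic step, assuming the detection is already a 3D point in the sensor frame and that the sensor's mounting pose is a yaw rotation plus a translation. The function names and the mounting values are hypothetical.

```python
import math


def make_extrinsic(yaw_rad, tx, ty, tz):
    """Build a 4x4 homogeneous sensor-to-vehicle transform.

    Assumes the sensor is rotated by yaw_rad about the vertical (z) axis and
    mounted at offset (tx, ty, tz) from the vehicle origin.
    """
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def sensor_to_vehicle(transform, point):
    """Apply the homogeneous transform to a 3D point given in the sensor frame."""
    homogeneous = (point[0], point[1], point[2], 1.0)
    return tuple(
        sum(transform[row][k] * homogeneous[k] for k in range(4))
        for row in range(3)
    )


# Hypothetical mounting: 1.5 m forward, 1.2 m up, rotated 90 degrees to the left.
T = make_extrinsic(math.pi / 2, 1.5, 0.0, 1.2)
# A detection 2 m straight ahead of the sensor.
print(sensor_to_vehicle(T, (2.0, 0.0, 0.0)))
```

With these assumed values the detection lands at approximately (1.5, 2.0, 1.2) in the vehicle frame. Expressing detections from every sensor in one shared frame is what allows them to be compared and fused.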

2.3. Research on road target detection algorithm based on deep learning

This research uses deep learning to automatically detect and recognize road objects in images. It proposes an automatic vehicle and road target detection method based on a convolutional neural network, which uses the CNN to extract image features and classify targets, and a traditional computer vision technique, the sliding window method [6], to search for targets. An improved network structure design method based on particle swarm optimization (PSO) is proposed to improve the efficiency and accuracy of network training. Multi-scale feature fusion combines features of different scales for more precise feature extraction and analysis. The proposed algorithm is verified experimentally and compared with other algorithms. The reported results show that it is robust in complex road traffic scenarios, accurately classifies and detects pedestrian and vehicle targets at traffic intersections, achieves high accuracy and real-time performance, and can be applied to different road scenes and weather conditions. The study provides a theoretical and practical basis for research on road object detection and points out future development directions.
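The sliding window search can be sketched as below. This is a generic illustration rather than the implementation in [6]: the window size, stride, and threshold are assumed values, and `score_fn` is a hypothetical stand-in for the CNN classifier that scores each candidate box.

```python
def sliding_windows(img_w, img_h, win_w, win_h, stride):
    """Yield (x, y, w, h) boxes covering the image left to right, top to bottom."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


def detect(img_w, img_h, score_fn, win=(64, 64), stride=32, threshold=0.5):
    """Keep every window whose classifier score reaches the threshold."""
    return [
        box
        for box in sliding_windows(img_w, img_h, win[0], win[1], stride)
        if score_fn(box) >= threshold
    ]


def fake_score(box):
    """Toy scorer standing in for a CNN: only the window at x = 32 is a target."""
    return 1.0 if box[0] == 32 else 0.0


print(list(sliding_windows(128, 64, 64, 64, 32)))
print(detect(128, 64, fake_score))
```

Because every window is scored independently, the cost grows with the number of windows, which is one reason the choice of stride and window scales matters for real-time performance.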

2.4. Research on pedestrian tracking and trajectory prediction on urban roads based on deep learning

This research uses deep learning and computer vision to accurately identify and track pedestrians. It designs an automatic pedestrian recognition system that combines visual detection with trajectory prediction to improve the understanding of pedestrian behavior patterns. For pedestrian detection and classification, a simple and efficient algorithm segments the region of interest quickly and accurately, providing the basis for subsequent processing. Because traditional pedestrian detection algorithms require a large number of training samples, while real scenes are dynamic or unknown, a new pedestrian detection and classification technique is proposed to support the analysis of pedestrian behavior in complex environments [7]. A deep neural network (such as Faster R-CNN) is trained on images from the video stream to extract pedestrian features and recognize pedestrian location, action state, identity, and other information. Infrared multi-person tracking, multi-feature fusion, and trajectory prediction are combined to achieve high-precision positioning and trajectory prediction. A pedestrian intent prediction technique is also proposed to predict whether a pedestrian will cross in front of the vehicle or cross the street, and a traffic light classification scheme is designed as well. By combining pedestrian detection, trajectory prediction, and intention prediction, the work aims at a comprehensive understanding and prediction of pedestrian behavior on urban roads, offering a solution for safer and more efficient autonomous driving.

2.5. Research on predictive tasks of autonomous vehicles based on explainable deep learning

This research uses deep learning to predict the environment around autonomous vehicles and make decisions about it more accurately and efficiently. Based on data from the field of intelligent transportation, it extracts the factors that affect changes in vehicle behavior and the relationships between these factors, and establishes an interpretable deep learning model for predicting vehicle behavior. The model builds a feature space describing the traffic state from road information and other factors, trains a deep neural network to obtain the final vehicle behavior prediction network, and presents the prediction results using visual interpretation techniques. The study also proposes a pedestrian behavior prediction model, which learns pedestrian behavior patterns from historical data, predicts pedestrian behavior over a future period, and uses situational graph techniques to present the interaction between pedestrians and vehicles. Finally, the study presents an autonomous driving decision-making model based on reinforcement learning. Applying reinforcement learning to the decision model allows it to handle traffic situations in different scenarios: the model makes driving decisions according to the current state of the autonomous vehicle and information about the surrounding environment, improving driving efficiency and safety. In addition, the work carries out the preliminary design and development of an autonomous driving decision support system assisted by augmented reality. In short, Wang Qi [8] proposed a solution based on deep learning that improves the accuracy and reliability of prediction and decision making for autonomous vehicles and provides a reference for future autonomous driving decision-making systems.

3. Applications and discussion

3.1. Object detection

Deep learning is widely used for object detection in autonomous driving, helping systems recognize a variety of objects such as vehicles, pedestrians, and traffic signs [8]. Research focuses on vehicle detection, attention mechanisms, and distance estimation. Deep learning models are commonly used for vehicle detection, among which the more popular are convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) [9]. In recent years, deep learning has made significant progress in object detection, with algorithms such as YOLO, Faster R-CNN, and SSD improving detection efficiency, and network structures such as ResNet and attention mechanisms improving feature representation and model performance. However, deep learning is limited by image quality and lighting conditions, demands substantial computational resources and hardware, and offers poor interpretability. Future work should continue to study deep learning for object detection in autonomous driving, improve the accuracy and stability of the algorithms, and explore efficient and interpretable model and algorithm designs.

3.2. Obstacle avoidance

Deep learning is widely used for obstacle avoidance in autonomous driving, helping the system identify obstacles on the road, predict their motion, and take measures to avoid them. The algorithms are designed and trained on the basis of CNNs, RNNs, and other neural networks. For example, a CNN determines the type and location of obstacles [10], while an RNN predicts the future trajectory of obstacles so that an avoidance strategy can be formulated. Combining these models with sensor fusion improves accuracy and robustness. In recent years, research on deep learning for obstacle avoidance has developed rapidly, and measures such as improving the network structure and optimizing the training algorithm have significantly improved model performance. However, deep learning also faces limitations and challenges in this task. First, training requires a large amount of data, and acquiring and processing these data takes considerable time and resources. Second, the models have poor interpretability: it is difficult to give the reasons and basis for specific decisions, which makes it hard for drivers to trust and accept the avoidance strategies generated by deep learning algorithms. Future work should further explore deep learning for obstacle avoidance in autonomous driving, improve the accuracy and stability of the algorithms, and explore more efficient and interpretable model structures and algorithm designs.

4. Conclusion

In the field of autonomous driving, the application of deep learning has made remarkable progress, injecting new vitality into the development of the field. With the development of artificial intelligence technology, autonomous driving will enter a new era. Trained with deep neural networks, autonomous vehicles can recognize and learn from road markings, obstacles, pedestrians, and other traffic participants, and make decisions and predictions in real time. Deep learning has already been successfully applied to driverless cars. Moreover, it has been pivotal in advancing perception and decision-making algorithms for adaptable vehicle navigation across diverse driving environments and conditions, and as it continues to evolve, its applications extend to autonomous driving functions including traffic light management, lane departure detection, and driver assistance. However, deep learning still faces many challenges, including the scarcity of data, the limitations of generalization, and the lack of interpretability. Many methods exist for improving the performance of deep networks, but each still has shortcomings. To further promote the application of deep learning in autonomous driving, future research needs to devote more effort to solving these problems.

References

[1]. Arena P et al 2003 Image processing for medical diagnosis using CNN Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 497(1) 174-178

[2]. Qiu Y et al 2020 Improved denoising autoencoder for maritime image denoising and semantic segmentation of USV China Communications 17(3) 46-57

[3]. Torbati N et al 2014 An efficient neural network based method for medical image segmentation Computers in biology and medicine 44 76-87

[4]. Zhou Z H and Jiang Y 2003 Medical diagnosis with C4.5 rule preceded by artificial neural network ensemble IEEE Transactions on Information Technology in Biomedicine 7(1) 37-42

[5]. Liu Z 2023 Design and implementation of visual perception system for autonomous vehicles based on deep learning (in Chinese) Southeast University

[6]. Chun Kang 2023 Research on road object detection algorithm based on deep learning (in Chinese) Zhejiang University of Science and Technology

[7]. Shi H 2023 Research on Urban Road Pedestrian Tracking and Trajectory Prediction based on Deep Learning (in Chinese) Xi'an University of Technology

[8]. Qi W 2023 Research on predictive tasks for autonomous vehicles based on Explainable deep learning (in Chinese) Jilin University

[9]. Mi G 2023 Research on Vehicle Detection and Ranging Model Based on Attention Mechanism (in Chinese) Nanjing University of Posts and Telecommunications

[10]. Sun W 2023 Rapid vehicle detection based on CNN and its application in automatic driving (in Chinese) Tianjin University

Cite this article

Zhang, T. (2024). The application of deep learning in autonomous driving. Applied and Computational Engineering, 50, 144-148.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

About volume

Volume title: Proceedings of the 4th International Conference on Signal Processing and Machine Learning

ISBN: 978-1-83558-345-6 (Print) / 978-1-83558-346-3 (Online)

Editor: Marwan Omar

Conference website: https://www.confspml.org/

Conference date: 15 January 2024

Series: Applied and Computational Engineering

Volume number: Vol.50

ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.