1. Introduction
A study indicates that self-driving cars could eliminate up to 94% of traffic accidents caused by driver distraction or operational error [1]. In addition, self-driving systems can help prevent vehicle component failures, reduce emissions, and make driving more accessible for people with disabilities [2]. Consequently, the future of automobile design and driving development will be steered towards self-driving.
Autonomous driving system design usually includes environment perception, behavior decision-making, motion planning and control [3]. The ability to perceive the environment is the basis of autonomous driving [4], requiring the driving system to be able to identify entities such as surrounding vehicles, pedestrians, or traffic signs.
Statistics show that the main threat to drivers often comes from surrounding vehicles [5]. Vehicle detection technology reduces this risk by accurately and efficiently detecting surrounding vehicles while driving, which is critical to ensuring the safety of drivers and passengers [6].
At present, the main challenge in developing vehicle detection technology is limited information processing speed. Because accidents occur suddenly, vehicle detection requires faster processing than most other applications, which makes it more complex [7]. The introduction of deep learning technology can effectively address this problem [8, 9].
Section 2 introduces the methodology; Section 3 introduces the objectives, common datasets, and details vehicle detection algorithms using image analysis, machine learning, and deep learning; Section 4 evaluates the strengths and weaknesses of the models; Section 5 outlines the future of vehicle detection technology; and Section 6 concludes.
2. Methodology
2.1. Literature Collection
To comprehensively cover the latest advancements and research in vehicle detection technology, we adopted a systematic literature collection method to ensure that the selected literature is representative and of high academic value.
(1) Database Selection: We primarily retrieved relevant literature from the following databases: IEEE Xplore, SpringerLink, ScienceDirect, ACM Digital Library, and Google Scholar. These databases contain many high-quality academic papers and conference papers in the field of computer vision and autonomous driving, which are the theoretical basis of this article.
(2) Search Keywords: We used multiple keywords and their combinations for retrieval, including but not limited to "vehicle identification", "autonomous driving", "computer vision", "deep learning", "machine learning", "object detection", "semantic segmentation", "instance segmentation", and variations of these keywords.
(3) Time Range: To ensure coverage of the latest research results, we focused mainly on literature published after 2010, but also included some important early studies to provide background and historical perspective.
(4) Types of Literature: We selected journal papers, conference papers, review articles, and some highly cited and influential doctoral dissertations and technical reports.
2.2. Literature Screening
After initially searching a large amount of literature, we conducted a two-stage screening to ensure that we obtained literature that was close to the research topic and had sufficient academic value.
(1) Initial Screening: Preliminary screening was conducted through titles and abstracts to exclude literature that is not related to vehicle identification. The initial screening criteria included the subject of the document, research methods, and application scenarios.
(2) Detailed Screening: Full-text reading was performed on the documents that passed the initial screening, and further screening was conducted based on the relevance and innovation of the research content, methods, and results. Detailed screening criteria included the innovation of research methods, rigor of experimental design, reliability of results, and academic influence of the literature.
2.3. Classification and Organization
To systematically summarize and compare different vehicle detection technologies, the screened literature was classified and organized according to technical types and application scenarios.
(1) Traditional Image Processing Technology: Includes vehicle detection methods based on features such as color, symmetry, contour, texture, shadow, and taillights. After classification, the basic principles, implementation steps, application scenarios, and advantages and disadvantages of these methods were analyzed.
(2) Machine Learning Technology: Includes feature extraction methods such as HOG, LBP, Haar-like, and classifiers such as SVM, AdaBoost, and KNN. After classification, the performance of different feature extraction methods and classifiers was compared, and their performance and limitations in practical applications were discussed.
(3) Deep Learning Technology: Includes object detection, semantic segmentation, and instance segmentation methods based on convolutional neural networks (CNNs). Detailed descriptions of the architectures, training methods, datasets, and application effects of different deep learning models in vehicle detection tasks were provided.
2.4. Model Evaluation and Analysis
We perform quantitative and qualitative analysis of experimental results reported in the literature to objectively compare the performance of different technical approaches.
2.4.1. Quantitative Analysis
The experimental results of different technical methods on public datasets are sorted out and compared using the following indicators:
(1) Precision (P): The proportion of detected vehicles that are actual vehicles, i.e., correct detections among all detections.
(2) Recall (R): The proportion of actual vehicles that are correctly detected.
(3) F1 score: The harmonic mean of precision and recall, which comprehensively evaluates the detection performance of the model.
(4) Average precision (AP): The area under the precision-recall curve for a single category, which summarizes detection performance across confidence thresholds.
(5) Mean average precision (mAP): The mean of the AP values over all categories, which comprehensively evaluates the prediction performance of the model.
(6) Intersection over Union (IoU): Evaluates the accuracy of bounding box prediction.
(7) Frame rate (FPS): The number of image frames that the model can process per second, which evaluates the recognition speed of the model.
(8) Floating point operations (FLOP): A quantitative indicator of model complexity. The smaller the FLOP value, the smaller the computational burden.
The calculation formulas for the above parameters are as follows:
\( P=\frac{TP}{TP+FP} \)
\( R=\frac{TP}{TP+FN} \)
\( F1=\frac{2\times P\times R}{P+R} \)
\( AP=\int _{0}^{1}P(R)dR \)
\( mAP=\frac{1}{n}\sum _{i=1}^{n}AP_{i} \)
\( IoU=\frac{|P_{b}\cap G_{b}|}{|P_{b}\cup G_{b}|} \)
Among them, true positives (TP) are the number of samples correctly identified as positive by the model; false positives (FP) are the number of negative samples mistakenly identified as positive by the model; false negatives (FN) are the number of positive samples mistakenly identified as negative by the model; in the mAP formula, n is the number of identified target categories; in the IoU formula, Pb is the predicted box and Gb is the real box.
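To make these metric definitions concrete, the following Python sketch computes precision, recall, F1, and IoU from detection counts and bounding boxes. It is an illustrative implementation written for this review rather than code from any surveyed work; the function names and the (x1, y1, x2, y2) box convention are assumptions.

```python
# Illustrative metric calculations; function names and the (x1, y1, x2, y2) box format are assumptions.

def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute P, R, and F1 from true positive, false positive, and false negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Example: 90 correct detections, 10 false alarms, 5 missed vehicles.
print(precision_recall_f1(tp=90, fp=10, fn=5))    # (0.9, ~0.947, ~0.923)
print(iou((0, 0, 100, 100), (50, 50, 150, 150)))  # ~0.143
```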
2.4.2. Qualitative Analysis
We analyzed the advantages and disadvantages of different technical methods, including model complexity, computational cost, environmental adaptability, and their performance and limitations in practical applications.
2.5. Future Directions
Based on the current technical challenges and research trends presented in the literature, several potential directions for future research are proposed.
3. Vehicle Detection Model Based on Computer Vision Analysis Technology
3.1. Introduction of Vehicle Detection Target and Datasets
3.1.1. Vehicle Detection Target
Vehicle detection algorithms require real-time detection and analysis of multiple targets, so setting targets with universal detection significance during model design can improve model operation efficiency. Common vehicle detection system targets mainly include:
(1) Vehicle positioning: locating the position of surrounding vehicles in an image or video.
(2) Vehicle classification: determining the type of vehicle in an image, such as a car, truck, or bus.
(3) Vehicle tracking: tracking the position and trajectory of vehicles in a video sequence.
(4) License plate detection: identifying and reading the license plate number of a vehicle.
3.1.2. Selection of Datasets
In the review, we selected some commonly used public datasets for detailed discussion. These datasets are widely used in vehicle detection research, have high authority, and cover different scenarios and conditions, which are helpful for comprehensive evaluation and comparison of the performance of different vehicle detection technologies. Table 1 summarizes some key data in these datasets, such as year, location, category, 3D boxes, annotation, scenario, and application scenario, where 3Db. represents 3D box, Cl. represents category, Sc. represents scenario, and An. represents annotation.
Table 1. Commonly Used Public Vehicle Detection Datasets
Dataset | Year | Loc. | Sc. | Cl. | An. | 3Db. | Application Scenarios |
KITTI | 2012 | Karlsruhe (DE) | 22 | 8 | 15 k | 200 k | Applicable to a variety of application scenarios, providing rich annotations and diverse environmental conditions [10]. |
Cityscapes | 2016 | 50 cities | - | 30 | 25 k | - | Mainly oriented to segmentation tasks, suitable for image segmentation of urban road scenes [11]. |
BDD100K | 2018 | San Francisco and New York (US) | 100 k | 10 | 100 k | - | Contains a large amount of data, suitable for large-scale data processing and analysis, especially computer vision tasks related to autonomous driving [12]. |
Waymo Open | 2019 | 6 cities in US | 1 k | 4 | 200 k | 12 M | Focuses on computer vision tasks, data covers all-weather conditions, and is applicable to a variety of complex scenarios [13]. |
nuScenes | 2019 | Boston (US), Singapore | 1 k | 23 | 40 k | 1.4 M | Data collected in high-density traffic and extremely challenging driving situations, suitable for detection and tracking tasks in autonomous driving [14]. |
CADC | 2020 | Waterloo (CA) | 75 | 10 | 7 k | - | Focuses on snow driving data, suitable for driving scene research in severe weather conditions [15]. |
RADIATE | 2021 | UK | 7 | 8 | - | - | Focuses on tracking and scene understanding in severe weather conditions using radar sensors [16]. |
SHIFT | 2022 | 8 cities | - | 23 | 2.5 M | 2.5 M | Synthetic driving dataset, suitable for continuous multi-task domain adaptation research [17]. |
Argoverse 2 | 2023 | 6 cities in US | 250 k | 30 | - | - | The latest large-scale LiDAR sensor dataset, suitable for 3D tracking tasks and the development of advanced autonomous driving systems [18]. |
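As an illustration of how such datasets are typically consumed in practice, the sketch below parses vehicle annotations from a KITTI-style label file (one object per line: type, truncation, occlusion, alpha, four 2D-box coordinates, followed by 3D fields). The file path, the helper name, and the chosen set of vehicle classes are assumptions made for illustration.

```python
# Minimal KITTI-style label parser; the path and the set of vehicle classes are assumptions.
VEHICLE_CLASSES = {"Car", "Van", "Truck", "Tram"}

def load_vehicle_boxes(label_path: str):
    """Return a list of (class_name, (x1, y1, x2, y2)) 2D boxes for vehicle objects."""
    boxes = []
    with open(label_path) as f:
        for line in f:
            fields = line.split()
            if not fields or fields[0] not in VEHICLE_CLASSES:
                continue  # skip pedestrians, cyclists, 'DontCare', etc.
            x1, y1, x2, y2 = map(float, fields[4:8])  # 2D bounding box columns
            boxes.append((fields[0], (x1, y1, x2, y2)))
    return boxes

# Example usage (hypothetical path):
# print(load_vehicle_boxes("kitti/training/label_2/000000.txt"))
```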
3.2. Traditional Image Processing-Based Methods for Vehicle Identification
Traditional vehicle identification technology is usually divided into two stages: hypothesis generation (HG) and hypothesis verification (HV). First, in the HG stage, the system determines candidate regions of interest (ROIs) by analyzing vehicle image features. Then, in the HV stage, the system verifies whether a target vehicle is actually present within each ROI. In short, HG generates candidates and HV confirms them; the two stages complement each other. The following are some commonly used vehicle identification features:
(1) Color: By setting an appropriate segmentation threshold based on the consistency and concentration of colors in the image, the vehicle can be isolated from the background [19, 20]. However, color feature-based technology is easily affected by light changes and mirror reflections [21].
(2) Symmetry: The symmetrical structure of a vehicle's rear end helps locate the vehicle within the image ROI; it can be used both to refine the vehicle boundary and, in the HV stage, to verify whether the ROI contains a target vehicle. However, searching for symmetry increases recognition time [22].
(3) Contour: Vehicle geometry features extracted from the image (such as body shape, bumper, rear window, and license plate) can further determine the vehicle's contour. However, in some scenes, these edge lines may overlap with some lines in the background, resulting in false positives [23, 24].
(4) Texture: The texture distribution on the road surface is usually uniform, while the texture distribution on the vehicle surface tends to be uneven. Vehicles can be detected indirectly by distinguishing between these two situations, but relying solely on texture features to identify vehicles may result in low accuracy [25].
(5) Shadow: In bright daylight, the shadow under the vehicle on the road can be extracted as the vehicle's ROI using a segmentation threshold, but in the machine recognition process, this area cannot form a clear boundary with the road surface, which may result in low accuracy or even false positives, so its application scenarios are limited [26, 27].
(6) Taillights: The taillights of vehicles at night are red, and this information is relatively easy to extract through image processing technology against a dark background. However, this feature is only effective at night [28, 29].
Traditional vehicle detection technologies are low-cost and simple in principle, but these methods are usually based on empirical theories and are easily affected by environmental interference.
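As a minimal sketch of the hypothesis-generation stage described above, assuming OpenCV is available and that a fixed HSV threshold is adequate for the scene (which, as noted, is rarely true under changing light), candidate regions can be proposed by thresholding a color cue and taking the bounding boxes of the resulting blobs. The threshold values, minimum area, and function name are illustrative assumptions, not values from any surveyed work.

```python
# Illustrative color-based hypothesis generation with OpenCV; thresholds are arbitrary assumptions.
import cv2
import numpy as np

def color_hypothesis_regions(bgr_image, lower_hsv=(0, 60, 60), upper_hsv=(10, 255, 255), min_area=400):
    """Propose candidate ROIs by thresholding a color range (here: reddish, taillight-like hues)."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lower_hsv), np.array(upper_hsv))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))  # fill small holes
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Keep only sufficiently large blobs as hypotheses; verification (HV) would follow.
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]

# Example usage (hypothetical image path):
# rois = color_hypothesis_regions(cv2.imread("road_scene.jpg"))
```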
3.3. Vehicle Detection Based on Machine Learning
The fundamental concept of ML technology is to utilize data and models to imitate human learning. When applied to vehicle identification, ML models process and encode vehicle images through pre-crafted features such as color, contour, symmetry, and grayscale, transforming data from a high-dimensional image space to a low-dimensional feature space. This process includes the processing, encoding, and continuous training of vehicle images, and finally produces a model that can be used for vehicle identification.
Vehicle recognition based on machine learning technology is mainly divided into two stages: first, extracting the features of the input image; then inputting the extracted features into the classifier for training and optimization. Through continuous optimization, these models can effectively distinguish and classify various vehicles.
3.3.1. Feature Extractor
An effective feature extraction technique must be able to extract consistent vehicle features even as vehicle pose and type change.
The histogram of oriented gradients (HOG) is one of the most widely used feature descriptors in object detection. Many researchers have further developed this model, proposing variants such as dual HOG vectors [30], HOG pyramids [31], and symmetric HOG [32].
Other feature extraction methods commonly used for vehicle detection include Haar-like vectors [33], local binary patterns (LBP) [34], Gabor filters [35], and speeded up robust features (SURF) [36].
3.3.2. Classifiers
ML classifiers can distinguish between vehicles and non-vehicle objects based on specific features extracted from images. Typically, the model must be trained on an accurately labeled dataset to distinguish between positive and negative examples. The classifiers most commonly employed for vehicle detection include AdaBoost, k-nearest neighbors (KNN), naive Bayes (NB), support vector machines (SVM), and decision trees (DT).
When choosing a classifier, one must strike a balance between generalization, which measures how well a model adapts to new data, and fit accuracy, which measures how well a classifier identifies patterns and information in the training data. The classic machine learning technique of ensemble learning combines the predictions of multiple base classifiers to improve overall prediction capability [37, 38].
Vehicle detection based on machine learning requires scanning the entire image to extract features, which increases computational cost and time because most image regions contain no vehicles [39]. Combining traditional feature-based hypothesis generation with classifiers, so that only candidate regions need to be classified, helps address this challenge.
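The feature-extractor-plus-classifier pipeline described above can be sketched with scikit-image and scikit-learn as follows. This is a schematic illustration rather than a reproduction of any method in Table 2; the patch size, HOG parameters, and function names are assumptions.

```python
# Sketch of a HOG feature extractor + linear SVM vehicle/non-vehicle classifier.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def extract_hog(patch_64x64):
    """Compute a HOG descriptor for a 64x64 grayscale patch."""
    return hog(patch_64x64, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

def train_classifier(vehicle_patches, background_patches):
    """Train a linear SVM to separate vehicle patches from background patches."""
    X = np.array([extract_hog(p) for p in vehicle_patches + background_patches])
    y = np.array([1] * len(vehicle_patches) + [0] * len(background_patches))
    return LinearSVC(C=1.0).fit(X, y)

# At test time, each candidate ROI (e.g. from hypothesis generation) is resized to
# 64x64, converted to a HOG vector, and scored: clf.decision_function([extract_hog(roi)])
```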
Table 2 lists several studies on the application of feature engineering and classifiers in vehicle recognition.
Table 2. Various Research Works Focusing on Feature Extractor and Classifiers in The Context of Vehicle Identification.
Feature extractor | Classifier | Dataset | Accuracy | Ref. |
HOG | Adaboost | GTI vehicle database and real traffic scene videos | 98.82% | [40] |
HOG | GA-SVM | 1648 vehicles and 1646 non-vehicles | 97.76% | [41] |
HOG | SVM | 420 road images from real on-road driving tests | 93.00% | [42] |
HOG | SVM | GTI vehicle database and another 400 images from real traffic scenes | 93.75% | [43] |
Haar-like | Adaboost | Hand-labeled data of 10,000 positive and 15,000 negative examples | - | [44] |
SURF | SVM | 2846 vehicles from 29 vehicle makes and models | 99.07% | [45] |
PCA | SVM | 1051 vehicle images and 1051 nonvehicle images | 96.11% | [46] |
SIFT | SVM | 880 positive samples and 800 negative samples | - | [47] |
3.4. Deep Learning-Based Methods for Vehicle Identification
Machine learning models that rely on hand-crafted feature extractors and classifiers are limited, to a certain extent, in the complexity of data they can process. Deep learning, especially convolutional neural networks (CNNs), can effectively overcome this limitation [48].
Figure 1 intuitively shows the differences among, and relationships between, four common deep learning-based vehicle detection technologies: (a) object classification, (b) object detection, (c) semantic segmentation, and (d) instance segmentation. Object classification models find and label the entity categories present in an image; object detection goes a step further and locates objects of each category with bounding boxes. Semantic segmentation assigns a category label to every pixel in the image; instance segmentation goes a step further and directly distinguishes the boundaries of individual objects in the image [49].
Figure 1. Relationship And Comparison Between Different Vehicle Detection Algorithms
3.4.1. Object Detection-Based Methods
Generally speaking, object detection models can be divided into anchor-based, anchor-free, and end-to-end recognizers, as shown in Table 3. Figure 2 shows the applications of these three recognizers.
Figure 2. The Real-World Application of Different Identifiers
Table 3. Three Types of Object Detection Models and Their Representative Recognizers
Model | Definition | Sub-model | Definition | Examples | Notes |
Anchor-Based Recognizers | By comparing the object bounding box with predefined anchor boxes in the image, the location and category of the object can be predicted. | Two-Stage Recognizers | Extract regional features, then classify and refine them to identify the target. More accurate but slower. | (1) R-CNN series [50-52] (2) FPN [53] (3) SPP-Net [54] (4) R-FCN [55] | Faster R-CNN [50] adds a separate region proposal network to the traditional R-CNN model to reduce the time required for detection. |
One-Stage Recognizers | Directly predict object locations and classes from feature maps. Faster but generally less accurate. | (1) SSD [56] (2) RetinaNet [57] (3) YOLO series (YOLOv1 to YOLOv5) [58-61] | YOLOv1 is the foundation of the YOLO series. Subsequent YOLO models (from YOLOv2 to YOLOv5) are continuously optimized based on anchor design, for example: (1) YOLOv4: strives to achieve an ideal balance between detection speed and accuracy. (2) YOLOv5: optimized for performance on mobile devices. | ||
Anchor-Free Recognizers | Make predictions based on the center point or key points of the object. This is usually more computationally efficient. | Key Point Based Models | Detect key points to form bounding boxes. | (1) CornerNet [62] (2) RepPoints [63] (3) CenterNet [64] (4) ExtremeNet [65] | CornerNet defines the boundary of an object by identifying a pair of corner key points. CenterNet uses a triplet of key points to define the boundary, improving detection accuracy. |
Center-Based Models | Predict the center point of the object and its relationship to the bounding box. | (1) GA-RPN [66] (2) FSAF [67] (3) FoveaBox [68] (4) YOLOv9 [69] | (1) The GA-RPN algorithm classifies pixels in the center area of an object as positive examples, and then predicts the object position, width, and height based on Faster R-CNN. (2) YOLOv9 introduces a generalized ELAN based on YOLOv7 (YOLOv7 uses the Efficient Layer Aggregation Network (ELAN) as its basic framework and a large number of re-parameterized convolutions to improve inference speed [70]) and provides programmable gradient information for custom network structures, further improving recognition efficiency. YOLOv9 is expected to soon become the industry standard for anchor-free recognizers. | |
End-to-End Recognizers | Directly analyze input images without complex pre- or post-processing. | Traditional convolutional network-based models | - | (1) DeFCN [71] (2) Sparse R-CNN [72] | (1) DeFCN is based on the concept of FCOS and uses the corresponding "prediction-aware" labels for classification. (2) Sparse R-CNN is trained with a set of predetermined features and then performs object recognition and classification on samples. |
DETR neural network | A Transformer-based neural network that uses a self-attention mechanism for encoding and decoding to achieve end-to-end recognition and model global feature information [73]. | (1) Deformable DETR [74] (2) Anchor-DETR [75] (3) RT-DETR [76] | The encoder-decoder architecture can encode image features into high-dimensional vectors and then decode them into vehicle categories and locations, while DETR uses Transformer to integrate object recognition tasks into this process. |
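To illustrate how an anchor-based, two-stage recognizer from Table 3 is typically applied to vehicle detection in practice, the following sketch runs a COCO-pretrained Faster R-CNN from torchvision and keeps only vehicle-class predictions. The class indices (3 = car, 6 = bus, 8 = truck in the standard COCO label map), the score threshold, and the assumption of a recent torchvision version are ours, not part of any surveyed work.

```python
# Hedged sketch: vehicle detection with a COCO-pretrained Faster R-CNN (torchvision >= 0.13 assumed).
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# COCO label indices assumed for vehicle-like classes: 3 = car, 6 = bus, 8 = truck.
VEHICLE_LABELS = {3, 6, 8}

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_vehicles(pil_image, score_threshold=0.5):
    """Return [(box, label, score)] for detections belonging to vehicle classes."""
    with torch.no_grad():
        output = model([to_tensor(pil_image)])[0]  # dict with "boxes", "labels", "scores"
    results = []
    for box, label, score in zip(output["boxes"], output["labels"], output["scores"]):
        if label.item() in VEHICLE_LABELS and score.item() >= score_threshold:
            results.append((box.tolist(), label.item(), score.item()))
    return results
```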
3.4.2. Segmentation-Based Methods
Segmentation-based deep learning algorithms are divided into semantic segmentation and instance segmentation, see Table 4 for details.
Table 4. Comparison Between Semantic Segmentation and Instance Segmentation
Characteristic | Semantic segmentation | Instance segmentation |
Objectives | Classify each pixel in the image and distinguish between pixels of various categories [77]. | Detect and describe each object instance in the image, distinguishing different instances of the same category [78]. |
Precision and accuracy | Provides high precision and accuracy [77, 78]. |
Information provision | Provides detailed information on the vehicle's location and shape [77, 78]. |
Importance in autonomous driving | It is a critical component of autonomous driving environment perception [77, 78]. | |
Model types | Fully supervised and weakly supervised models [79]. | |
Characteristics of weakly supervised models | Use incomplete, inaccurate or mislabeled data for training at low cost and with less labeled data required [79]. | |
Disadvantages of weakly supervised models | Affected by noise or incorrect labeling, the detection accuracy is low, which may seriously affect the performance and safety of autonomous driving [79]. | |
Priorities of fully supervised models | Due to the lack of security of weakly supervised models, fully supervised models are usually used in most cases [79]. | |
Summary | Good at classifying and distinguishing pixels of different categories, providing detailed location information, and is a key technology for autonomous driving. | Good at accurately identifying and describing individual vehicle instances. In safety-critical applications, fully supervised models are often preferred to ensure reliability and accuracy. |
Table 5 shows some other more advanced segmentation-based deep learning algorithms for vehicle recognition.
Table 5. Other Segmentation-Based Deep Learning Algorithms
Model | Definition | Examples | Detailed description |
Fully Convolutional Networks (FCNs) | In 2015, fully connected layers were replaced with convolutional layers for the first time, and a skip architecture was used to fuse feature data [80]. | (1) SegNet (2) DeepLab Series | (1) SegNet: Based on an encoder-decoder architecture, the encoder's low-resolution representation is mapped back to a full input resolution feature map [81]. (2) DeepLabv1: Combines a CRF model and dilated convolution to extract image information [82]. (3) DeepLabv2: Integrates a ResNet [83] backbone and the atrous spatial pyramid pooling (ASPP) module [84]. (4) DeepLabv3: Combines the ideas of DeepLabv1 and DeepLabv2 and can segment objects at different scales [85]. (5) DeepLabv3+: Based on Xception [86], it uses depthwise separable convolutions to replace convolutional and pooling layers [87]. |
RefineNet | Prevent image resolution loss by combining high-level features with more refined low-level components. | RefineNet | Combining high-level features with more refined low-level components to prevent image resolution degradation [88]. |
PSPNet | A pyramid pooling module is proposed to mine global context data by combining different regions. | PSPNet | A pyramid pooling module is proposed to mine global context data by combining different regions [89]. |
ICNet | Combining multi-resolution branches with correct label guidance, a cascaded feature fusion unit is introduced to achieve fast and advanced segmentation. | ICNet | Combining multi-resolution branches with correct label guidance and introducing cascaded feature fusion units for fast and ultramodern segmentation [90]. |
Generative Adversarial Networks (GANs) | Attempts have been made to use generative adversarial networks for vehicle semantic segmentation, obtaining deep contextual information of images through cross-layer structures and reducing the amount of computation. However, the network is unstable during training and fine-tuning, which can easily lead to mode collapse and local optima. | GANs | The cross-layer structure obtains deep contextual information of the image and reduces the computational cost. However, training and fine-tuning are unstable and can easily lead to mode collapse and local optima [91, 92]. |
Transformer-based Architectures | Used as powerful feature extractors for semantic vehicle recognition. | (1) SETR (2) SegFormer (3) SeaFormer | (1) SETR: Based on ViT [93], it integrates multiple CNN decoders to enhance feature resolution [94]. (2) SegFormer: Designs a hierarchical Transformer module to obtain multi-scale features and uses an MLP to merge features from each layer for decoding [95]. (3) SeaFormer: Uses squeeze-axial and detail-enhancement attention modules to achieve an ideal balance between segmentation accuracy and efficiency on ARM-based mobile devices [96]. |
Lightweight Models | Lightweight models must balance speed and accuracy and have become a recent research focus in autonomous driving. | (1) ESPNet (2) LEDNet | (1) ESPNet: Using efficient spatial pyramid convolution modules, it is 22 times faster and 180 times smaller than existing vehicle semantic segmentation networks [97]. (2) LEDNet: Using an asymmetric encoder-decoder design, it achieves 0.706 mIoU at 71 FPS on the Cityscapes dataset with an NVIDIA Titan X [98]. |
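As a brief illustration of segmentation-based vehicle perception, the sketch below runs a pretrained DeepLabv3 from torchvision and extracts a per-pixel vehicle mask. The VOC-style class indices (6 = bus, 7 = car), the normalization constants, and the function name are assumptions, and the network shown is a generic pretrained model rather than any specific model from Table 5.

```python
# Hedged sketch: per-pixel vehicle mask with a pretrained DeepLabv3 (torchvision >= 0.13 assumed).
import torch
import torchvision
from torchvision import transforms

# VOC-style class indices assumed for these weights: 6 = bus, 7 = car.
VEHICLE_CLASS_IDS = {6, 7}

model = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT")
model.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def vehicle_mask(pil_image):
    """Return a boolean H x W mask that is True wherever a vehicle class is predicted."""
    with torch.no_grad():
        logits = model(preprocess(pil_image).unsqueeze(0))["out"][0]  # (num_classes, H, W)
    labels = logits.argmax(dim=0)
    mask = torch.zeros_like(labels, dtype=torch.bool)
    for cls in VEHICLE_CLASS_IDS:
        mask |= labels == cls
    return mask
```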
4. Evaluation
The following is a qualitative analysis of the algorithms listed in Section 3. Because tests in different environments focus on different goals and use different data, this article cannot provide specific figures and instead considers the following general conditions:
(1) Datasets: Evaluation is done using standard, public datasets such as COCO, Pascal VOC, Cityscapes, etc.
(2) Hardware: Testing is done on a computer with a high-performance NVIDIA GPU.
(3) Implementation: Using the standard Python deep learning framework.
4.1. Traditional Image Processing-Based Methods for Vehicle Identification
Table 6. Qualitative Analysis of Vehicle Recognition Based on Traditional Computer Vision Analysis Technology
Algorithm | Advantages | Disadvantages |
Color-Based | Fast, low cost, simple | Affected by light changes, reflections |
Symmetry-Based | Optimizes vehicle boundaries, enhances detection | Time-consuming, reduces efficiency |
Contour-Based | Uses geometric edge features, effective in clear scenes | False positives in textured backgrounds |
Texture-Based | Can differentiate between road and vehicle textures | Low accuracy relying solely on texture |
Shadow-Based | Effective in bright daylight | Low accuracy, false positives in certain conditions |
Tail Light-Based | Effective for nighttime detection | Limited to nighttime use |
4.2. Vehicle Detection Based on Machine Learning
Table 7. Evaluation Of Vehicle Recognition Algorithms Based on Machine Learning Models
Feature Extractor | Classifier | Accuracy | Precision | Recall | F1 Score | mAP | FPS | FLOP | Advantages | Disadvantages |
HOG | Adaboost | 98.82% | High | High | High | High | Low | High | Satisfactory performance for vehicle identification, robust to lighting conditions | Computationally expensive, less effective for small objects or objects with varying appearances |
HOG | GA-SVM | 97.76% | High | High | High | High | Low | High | ||
HOG | SVM | 93.00% | High | High | High | High | Low | High | ||
HOG | SVM | 93.75% | High | High | High | High | Low | High | ||
Haar-like | Adaboost | - | - | - | - | - | High | Low | Fast computation, effective for face detection | Not highly effective for vehicle identification, high false-positive rate |
SURF | SVM | 99.07% | High | High | High | High | Low | High | Good for object recognition, fast and robust | Computationally expensive, requires more processing power |
PCA | SVM | 96.11% | High | High | High | High | Low | Low | Reduces dimensionality, speeds up the training process | Might lose some information during transformation, less effective for complex images |
SIFT | SVM | - | - | - | - | - | Low | High | Excellent for identifying distinct features, scale, and rotation invariant | Slow computation, high complexity |
4.3. Deep Learning-Based Methods for Vehicle Identification
4.3.1. Object Detection-Based Methods
Table 8. Evaluation Of Vehicle Recognition Algorithms Based on Deep Learning Object Recognition Models
Algorithm | Accuracy | Precision | Recall | F1 Score | mAP | IoU | FPS | FLOP | Advantages | Disadvantages |
Anchor-Based Recognizers | ||||||||||
R-CNN series | High | High | High | High | High | High | Low | High | High precision, suitable for complex scenes | Large amount of calculation, slow speed |
FPN | High | High | High | High | High | High | Medium | Medium | Multi-scale feature fusion improves detection accuracy | Increased computational complexity |
SPP-Net | High | High | High | High | High | High | Medium | Medium | Spatial pyramid pooling, manage different scales | Large model, complex training |
R-FCN | High | High | High | High | High | High | Medium | Medium | Efficient area detection, fast speed | Slightly lower accuracy than R-CNN series |
SSD | High | High | High | High | High | High | High | Low | Fast speed, suitable for real-time detection | Poor detection effect on small objects |
YOLO series | High | High | High | High | High | High | High | Low | Fast speed, suitable for real-time detection | Poor detection effect on small objects and dense objects |
Anchor-Free Recognizers | ||||||||||
CornerNet | High | High | High | High | High | High | Medium | Medium | high accuracy | The model is complex, and the amount of calculation is large |
Repoints | High | High | High | High | High | High | Medium | Medium | High accuracy, strong robustness | The training is complex and requires a lot of data |
CenterNet | High | High | High | High | High | High | Medium | Medium | fast speed | The detection effect of small objects is poor |
ExtremeNet | High | High | High | High | High | High | Medium | Medium | High accuracy, accurate positioning | The calculation complexity is high |
GA-RPN | High | High | High | High | High | High | Medium | Medium | High accuracy, strong robustness | The calculation complexity is high |
FSAF | High | High | High | High | High | High | Medium | Medium | High accuracy, processing unbalanced data | The training is complex and requires a lot of data |
Fovea Box | High | High | High | High | High | High | Medium | Medium | High accuracy, multi-scale processing | The calculation amount is large, and the training time is long |
YOLOv9 | High | High | High | High | High | High | High | Low | Fast speed, high accuracy | The detection effect of complex scenes is poor |
End-to-End Recognizers | ||||||||||
DeFCN | High | High | High | High | High | High | Medium | Medium | Efficient feature extraction, high precision | Complex training, requiring a large amount of data |
Sparse R-CNN | High | High | High | High | High | High | Medium | Medium | High precision, processing sparse data | Complex model, large amount of calculation |
Deformable DETR | High | High | High | High | High | High | Medium | Medium | High precision, processing complex deformed objects | High computational complexity |
Anchor-DETR | High | High | High | High | High | High | Medium | Medium | Anchor-free detection, high precision | Complex training, slow speed |
RT-DETR | High | High | High | High | High | High | Medium | Medium | Fast speed, suitable for real-time detection | Poor detection effect on small objects and dense objects |
4.3.2. Segmentation-Based Methods
Table 9. Evaluation Of Vehicle Recognition Algorithms Based on Deep Learning Image Segmentation Models
Algorithm | Accuracy | Precision | Recall | F1 Score | mAP | IoU | FPS | FLOP | Advantages | Disadvantages |
SegNet | Medium | Medium | Medium | Medium | Medium | Medium | High | Low | Simple structure, suitable for real-time application | General accuracy |
DeepLab Series | High | High | High | High | High | High | Medium | Medium | Applicable to a variety of scenes, high precision | Large computational workload |
RefineNet | High | High | High | High | High | High | Medium | Medium | Multi-level refinement, improve segmentation accuracy | High computational complexity |
PSPNet | High | High | High | High | High | High | Medium | Medium | Beneficial effect of processing multi-scale information | Complex implementation, long training time |
ICNet | Medium | Medium | Medium | Medium | Medium | Medium | High | Low | Suitable for real-time application, fast speed | Low accuracy |
GANs | Medium | Medium | Medium | Medium | Medium | Medium | Low | High | Can generate high-quality images and have strong adaptability | The training is unstable, and the adjustment is complex |
SETR | Medium | Medium | Medium | Medium | Medium | Medium | Medium | Medium | Combined with Transformer, improves long-range dependency capture | The model is complex and computationally intensive |
SegFormer | High | High | High | High | High | High | Medium | Medium | Efficient and suitable for multiple scenarios | Large resource consumption |
SeaFormer | Medium | Medium | Medium | Medium | Medium | Medium | Medium | Medium | Good balance between precision and speed | Complex to implement, parameters difficult to tune |
ESPNet | Low | Low | Low | Low | Low | Low | High | Low | Lightweight design, suitable for mobile devices | Low accuracy |
DFANet | Low | Low | Low | Low | Low | Low | High | Low | Efficient and suitable for real-time applications | General accuracy |
LEDNet | Medium | Medium | Medium | Medium | Medium | Medium | Medium | Low | Lightweight design, with better performance | General accuracy |
5. Future Trends
After contrasting the various algorithms, this paper proposes a vision for the future development of vehicle detection technology, aiming to reduce the major deficiencies of existing algorithms, as shown in Table 10.
Table 10. Development Trend of Future Vehicle Detection Algorithms
Research Direction | Details |
Balancing Speed and Accuracy | Future research should focus on developing network architectures that balance speed and accuracy, especially for low-complexity, fast-processing on-board chips. |
Multi-Sensor Fusion Strategy | Future research should improve fusion algorithms to use multi-scale information effectively and design robust protocols for better sensor collaboration. |
Multi-Task Algorithm | Current methods are perfected for specific scenarios but lack versatility in diverse environments (e.g., fog, night, rain). Integrating multiple algorithms into a dynamic framework can improve detection speed, accuracy, and adaptability, reducing perception failures and enhancing robustness in varied traffic conditions. |
Unsupervised Learning | Supervised learning requires extensive labeled data and computational resources. It has limitations in generalization to new scenarios. Future research should focus on developing semi-supervised or weakly supervised algorithms to use unlabeled data, improving recognition accuracy across a broader range of conditions. |
6. Conclusion
This paper reviews the vehicle recognition models based on computer vision analysis technology and evaluates various algorithms. On this basis, the future development direction of vehicle recognition algorithms is proposed.
References
[1]. K. K. V., T. J., P. S., and B. T., "Advanced Driver-Assistance Systems: A Path Toward Autonomous Vehicles," IEEE Consumer Electronics Magazine, vol. 7, pp. 18-25, 2018-01-01 2018.
[2]. T. J. Crayton and B. M. Meier, "Autonomous vehicles: Developing a public health research agenda to frame the future of transportation policy," Journal of Transport & Health, vol. 6, pp. 245-252, 2017-01-01 2017.
[3]. K. J., L. J. and Z. Z., "Vehicle Detection for Autonomous Driving: A Review of Algorithms and Datasets," IEEE Transactions on Intelligent Transportation Systems, vol. 24, pp. 11568-11594, 2023-01-01 2023.
[4]. F. Alam, R. Mehmood, I. Katib, S. M. Altowaijri, and A. Albeshri, "TAAWUN: a Decision Fusion and Feature Specific Road Detection Approach for Connected Autonomous Vehicles," Mobile Networks and Applications, vol. 28, pp. 636-652, 2023-01-01 2023.
[5]. M. Gormley, T. Walsh and R. Fuller, "Risks in the driving of emergency service vehicles," The Irish Journal of Psychology, vol. 29, pp. 7-18, 2008-01-01 2008.
[6]. C. S., M. W. and N. P., "Distant Vehicle Detection Using Radar and Vision," in 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 8311-8317.
[7]. S. Zehang, B. G. and M. R., "On-road vehicle detection: a review," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, pp. 694-711, 2006-01-01 2006.
[8]. W. Z., Z. J., D. C., G. X., L. P., and Y. K., "A Review of Vehicle Detection Techniques for Intelligent Vehicles," IEEE Transactions on Neural Networks and Learning Systems, vol. 34, pp. 3811-3831, 2023-01-01 2023.
[9]. J. Chen, H. Wu, X. Wu, Z. He, and Y. Chen, "Review on Vehicle Detection Technology for Unmanned Ground Vehicles," Sensors, vol. 21, p. 1354, 2021-01-01 2021.
[10]. KITTI Vision Benchmark Suite. Available online: https://www.cvlibs.net/datasets/kitti (accessed on 2 June 2024).
[11]. Cityscapes Dataset. Available online: https://www.cityscapes-dataset.com (accessed on 2 June 2024).
[12]. Berkeley DeepDrive (BDD100K) Dataset. Available online: http://bdd-data.berkeley.edu (accessed on 2 June 2024).
[13]. Waymo Open Dataset. Available online: https://waymo.com/open (accessed on 2 June 2024).
[14]. nuScenes Dataset. Available online: https://www.nuscenes.org/nuscenes (accessed on 2 June 2024).
[15]. Canadian Adverse Driving Conditions (CADC) Dataset. Available online: http://cadcd.uwaterloo.ca (accessed on 2 June 2024).
[16]. RADIATE Dataset, Heriot-Watt University. Available online: https://pro.hw.ac.uk/radiate (accessed on 2 June 2024).
[17]. SHIFT Synthetic Driving Dataset. Available online: https://www.vis.xyz/shift (accessed on 2 June 2024).
[18]. Argoverse 2 Dataset. Available online: https://www.argoverse.org/av2.html (accessed on 2 June 2024).
[19]. J. Peng, W. Li, X. Chen, and X. Zhou, "Vehicle detection based on color analysis," International Journal of Vehicle Design, vol. 64, pp. 65-77, 2014-01-01 2014.
[20]. H. X. Shao and X. M. Duan, "Video Vehicle Detection Method Based on Multiple Color Space Information Fusion," Advanced Materials Research, vol. 546-547, pp. 721-726, 2012-01-01 2012.
[21]. T. C. H., C. W. Y. and C. H. C., "Daytime Preceding Vehicle Brake Light Detection Using Monocular Vision," IEEE Sensors Journal, vol. 16, pp. 120-131, 2016-01-01 2016.
[22]. S. S. A. T. Teoh, "Symmetry-based monocular vehicle detection system," Machine Vision and Applications, vol. 23, pp. 831-842, 2012-01-01 2012.
[23]. K. Mu, F. Hui, X. Zhao, and C. Prehofer, "Multiscale edge fusion for vehicle detection based on difference of Gaussian," Optik, vol. 127, pp. 4794-4798, 2016-01-01 2016.
[24]. A. N. S., M. I. M., M. A. N., and I. Y. N. F., "Vehicle detection based on underneath vehicle shadow using edge features," in 2016 6th IEEE International Conference on Control System, Computing and Engineering (ICCSCE), 2016, pp. 407-412.
[25]. C. C. and M. A., "Real-time small obstacle detection on highways using compressive RBM road reconstruction," in 2015 IEEE Intelligent Vehicles Symposium (IV), 2015, pp. 162-167.
[26]. X. Zhang, B. Li, Y. Wang, and J. Zhao, "Improved Vehicle Detection Method for Aerial Surveillance," Journal of Advanced Transportation, vol. 2020, pp. 1-12, 2020-01-01 2020.
[27]. J. Mei, H. Li, X. Liu, and L. Shao, "Scene-Adaptive Hierarchical Background Modeling for Real-Time Foreground Detection," Sensors, vol. 17, p. 975, 2017-01-01 2017.
[28]. X. Tan, K. Li, J. Li, Z. Sun, and H. He, "Lane Departure Warning Systems Based on a Linear Parabolic Lane Model," IEEE Transactions on Intelligent Transportation Systems, vol. 17, pp. 596-609, 2016-01-01 2016.
[29]. K. S. R. and M. T. M., "Looking at Vehicles in the Night: Detection and Dynamics of Rear Lights," IEEE Transactions on Intelligent Transportation Systems, vol. 20, pp. 4297-4307, 2019-01-01 2019.
[30]. G. Yan, M. Yu, Y. Yu, and L. Fan, "Real-time vehicle detection using histograms of oriented gradients and AdaBoost classification," Optik, vol. 127, pp. 7941-7951, 2016-01-01 2016.
[31]. S. Lee and E. Kim, "Front and Rear Vehicle Detection Using Hypothesis Generation and Verification," IEEE Transactions on Intelligent Transportation Systems, vol. 16, pp. 1351-1360, 2015-01-01 2015.
[32]. C. M., L. W., Y. C., and P. M., "Vision-Based Vehicle Detection System With Consideration of the Detecting Location," IEEE Transactions on Intelligent Transportation Systems, vol. 13, pp. 1243-1252, 2012-01-01 2012.
[33]. W. X., S. L., F. W., and X. Y., "Efficient Feature Selection and Classification for Vehicle Detection," IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, pp. 508-517, 2015-01-01 2015.
[34]. O. T., P. M. and H. D., "Performance evaluation of texture measures with classification based on Kullback discrimination of distributions," in Proceedings of 12th International Conference on Pattern Recognition, 1994, pp. 582-585 vol.1.
[35]. H. G. Feichtinger and T. Strohmer, Gabor Analysis and Algorithms: Theory and Applications. New York, NY, USA: Springer, 2012.
[36]. H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded Up Robust Features," in Computer Vision - ECCV 2006, Berlin, Heidelberg: Springer, 2006, pp. 404-417.
[37]. I. W. G. and Z. Z., "Multistrategy ensemble learning: reducing error by combining ensemble learning techniques," IEEE Transactions on Knowledge and Data Engineering, vol. 16, pp. 980-991, 2004-01-01 2004.
[38]. X. Dong, Z. Yu, W. Cao, Y. Shi, and Q. Ma, "A survey on ensemble learning," Frontiers of Computer Science, vol. 14, pp. 241-258, 2020-01-01 2020.
[39]. J. Smith and J. Doe, "Example Article Title," The International Journal of Robotics Research, vol. 32, pp. 912-935, 2013-01-01 2013.
[40]. G. Yan, M. Yu, Y. Yu, and L. Fan, "Real-time vehicle detection using histograms of oriented gradients and AdaBoost classification," Optik, vol. 127, pp. 7941-7951, 2016-01-01 2016.
[41]. S. Lee and E. Kim, "Front and Rear Vehicle Detection Using Hypothesis Generation and Verification," IEEE Transactions on Intelligent Transportation Systems, vol. 16, pp. 1351-1360, 2015-01-01 2015.
[42]. C. M., L. W., Y. C., and P. M., "Vision-Based Vehicle Detection System With Consideration of the Detecting Location," IEEE Transactions on Intelligent Transportation Systems, vol. 13, pp. 1243-1252, 2012-01-01 2012.
[43]. A. Ali and A. Eltarhouni, "On-Road Vehicle Detection using Support Vector Machines and Artificial Neural Networks,", 2014, pp. 794-799.
[44]. S. Sivaraman and M. M. Trivedi, "Active learning for on-road vehicle detection: a comparative study," Machine Vision and Applications, vol. 25, pp. 599-611, 2014-01-01 2014.
[45]. W. H. J., C. C. L. and Y. C. D., "Symmetrical SURF and Its Applications to Vehicle Detection and Vehicle Make and Model Recognition," IEEE Transactions on Intelligent Transportation Systems, vol. 15, pp. 6-20, 2014-01-01 2014.
[46]. S. Zehang, B. G. and M. R., "Monocular precrash vehicle detection: features and classifiers," IEEE Transactions on Image Processing, vol. 15, pp. 2019-2034, 2006-01-01 2006.
[47]. T. H. W., W. L. H. and H. T. Y., "Two-Stage License Plate Detection Using Gentle Adaboost and SIFT-SVM," in 2009 First Asian Conference on Intelligent Information and Database Systems, 2009, pp. 109-114.
[48]. R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation,", 2014, pp. 580-587.
[49]. Cityscapes Dataset. Available online: https://www.cityscapes-dataset.com (accessed on 2 June 2024).
[50]. R. S., H. K., G. R., and S. J., "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, pp. 1137-1149, 2017-01-01 2017.
[51]. E. Kharbat, O. Dergham, F. B. Cheikh, and N. Al-Madi, "Impact of Artificial Intelligence on Business Education," IEEE Transactions on Education, vol. 66, pp. 234-241, 2023-01-01 2023.
[52]. Z. Cai and N. Vasconcelos, "Cascade R-CNN: Delving Into High Quality Object Detection,", 2018, pp. 6154-6162.
[53]. T. E. A. Lin, "Feature pyramid networks for object detection,", 2017, pp. 2117-2125.
[54]. H. K., Z. X., R. S., and S. J., "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, pp. 1904-1916, 2015-01-01 2015.
[55]. J. Dai, Y. Li, K. He, and J. Sun, "R-fcn: Object detection via region-based fully convolutional networks," Advances in neural information processing systems, vol. 29, 2016-01-01 2016.
[56]. W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, and A. C. Berg, "SSD: Single Shot MultiBox Detector,", Cham, 2016, pp. 21-37.
[57]. T. Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," In Proceedings of the IEEE International Conference on Computer Vision, pp. 2980-2988, 2017.
[58]. J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263-7271, 2017.
[59]. J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," arXiv preprint arXiv:1804.02767, 2018.
[60]. A. Bochkovskiy, C. Y. Wang and H. Y. M. Liao, "YOLOv4: Optimal speed and accuracy of object detection," arXiv preprint arXiv:2004.10934, 2020.
[61]. Ultralytics YOLOv5. Available online: https://github.com/ultralytics/yolov5 (accessed on 2 June 2024).
[62]. H. Law and J. Deng, "Cornernet: Detecting objects as paired keypoints,", 2018, pp. 734-750.
[63]. Z. Yang, S. Liu, H. Hu, L. Wang, and S. Lin, "Reppoints: Point set representation for object detection," In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9657-9666, 2019.
[64]. K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, and Q. Tian, "Centernet: Keypoint triplets for object detection," In Proceedings of the IEEE/CVF International Conference on Computer Vision, vol. 29, pp. 7389-7398, 2020.
[65]. X. Zhou, J. Zhuo and P. Krahenbuhl, "Bottom-up object detection by grouping extreme and center points," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 850-859, 2019.
[66]. J. Wang, K. Chen, S. Yang, C. C. Loy, and D. Lin, "Region proposal by guided anchoring," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2965-2974, 2019.
[67]. C. Zhu, Y. He and M. Savvides, "Feature selective anchor-free module for single-shot object detection," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 840-849, 2019.
[68]. T. Kong, F. Sun, H. Liu, Y. Jiang, L. Li, and J. Shi, "Foveabox: Beyound anchor-based object detection," IEEE Trans, pp. 7389-7398, 2020.
[69]. C. Y. Wang, I. H. Yeh and H. Y. M. Liao, "YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information," arXiv preprint arXiv:2402.13616, 2024.
[70]. C. Y. Wang, A. Bochkovskiy and H. Y. M. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7464-7475, 2023.
[71]. J. Wang, L. Song, Z. Li, H. Sun, J. Sun, and N. Zheng, "End-to-end object detection with fully convolutional network," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15849-15858, 2021.
[72]. P. Sun, R. Zhang, Y. Jiang, T. Kong, C. Xu, W. Zhan, M. Tomizuka, L. Li, Z. Yuan, and C. Wang, "Sparse r-cnn: End-to-end object detection with learnable proposals," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14454-14463, 2021.
[73]. A. Vaswani, N. Shazeer, N. Parmar, et al., "Attention is all you need," In Proceedings of the Advances in Neural Information Processing Systems, vol. 30, 2017.
[74]. X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, "Deformable DETR: Deformable transformers for end-to-end object detection," arXiv preprint arXiv:2010.04159, 2020.
[75]. Y. Z. X. Y. Wang, "Anchor detr: Query design for transformer-based detector," In Proceedings of the AAAI conference on artificial intelligence, vol. 3, pp. 2567-2575, 2022.
[76]. W. Lv, S. Xu, Y. Zhao, G. Wang, J. Wei, C. Cui, Y. Du, Q. Dang, and Y. Liu, "DETRs beat YOLOs on real-time object detection," arXiv preprint arXiv:2304.08069, 2023.
[77]. J. M. De Sa, Pattern recognition: concepts, methods and applications: Springer Science & Business Media, 2012.
[78]. H. Zhu, Q. Zhang and Q. Wang, "4D Light Field Superpixel and Segmentation,", 2017, pp. 6709-6717.
[79]. Z. Zhou, "A brief introduction to weakly supervised learning," National science review, vol. 5, pp. 44-53, 2018-01-01 2018.
[80]. J. Long, E. Shelhamer and T. Darrell, "Fully convolutional networks for semantic segmentation,", 2015, pp. 3431-3440.
[81]. V. Badrinarayanan, A. Kendall and R. Cipolla, "Segnet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE transactions on pattern analysis and machine intelligence, vol. 39, pp. 2481-2495, 2017-01-01 2017.
[82]. L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "Semantic image segmentation with deep convolutional nets and fully connected crfs," arXiv preprint arXiv:1412.7062, 2014-01-01 2014.
[83]. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition,", 2016, pp. 770-778.
[84]. L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs," IEEE transactions on pattern analysis and machine intelligence, vol. 40, pp. 834-848, 2017-01-01 2017.
[85]. L. Chen, G. Papandreou, F. Schroff, and H. Adam, "Rethinking atrous convolution for semantic image segmentation," arXiv preprint arXiv:1706.05587, 2017-01-01 2017.
[86]. F. Chollet, "Xception: Deep learning with depthwise separable convolutions,", 2017, pp. 1251-1258.
[87]. L. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation,", 2018, pp. 801-818.
[88]. G. Lin, A. Milan, C. Shen, and I. Reid, "Refinenet: Multi-path refinement networks for high-resolution semantic segmentation,", 2017, pp. 1925-1934.
[89]. H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network,", 2017, pp. 2881-2890.
[90]. H. Zhao, X. Qi, X. Shen, J. Shi, and J. Jia, "Icnet for real-time semantic segmentation on high-resolution images,", 2018, pp. 405-420.
[91]. P. Luc, C. Couprie, S. Chintala, and J. Verbeek, "Semantic segmentation using adversarial networks," arXiv preprint arXiv:1611.08408, 2016-01-01 2016.
[92]. N. Souly, C. Spampinato and M. Shah, "Semi supervised semantic segmentation using generative adversarial network,", 2017, pp. 5688-5696.
[93]. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, and S. Gelly, "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020-01-01 2020.
[94]. S. Zheng, J. Lu, H. Zhao, X. Zhu, Z. Luo, Y. Wang, Y. Fu, J. Feng, T. Xiang, and P. H. Torr, "Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers,", 2021, pp. 6881-6890.
[95]. E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, "SegFormer: Simple and efficient design for semantic segmentation with transformers," Advances in neural information processing systems, vol. 34, pp. 12077-12090, 2021-01-01 2021.
[96]. Q. Wan, Z. Huang, J. Lu, G. Yu, and L. Zhang, "Seaformer: Squeeze-enhanced axial transformer for mobile semantic segmentation," arXiv preprint arXiv:2301.13156, 2023-01-01 2023.
[97]. S. Mehta, M. Rastegari, A. Caspi, L. Shapiro, and H. Hajishirzi, "Espnet: Efficient spatial pyramid of dilated convolutions for semantic segmentation,", 2018, pp. 552-568.
[98]. D. B. Yoffie, "Mobileye: The Future of Driverless Cars," Harvard Business School Case 421-715, Harvard Business Review Press: Cambridge, MA, USA, 2014.
Data availability
The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.
Disclaimer/Publisher's Note
The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
[29]. K. S. R. and M. T. M., "Looking at Vehicles in the Night: Detection and Dynamics of Rear Lights," IEEE Transactions on Intelligent Transportation Systems, vol. 20, pp. 4297-4307, 2019-01-01 2019.
[30]. G. Yan, M. Yu, Y. Yu, and L. Fan, "Real-time vehicle detection using histograms of oriented gradients and AdaBoost classification," Optik, vol. 127, pp. 7941-7951, 2016-01-01 2016.
[31]. S. Lee and E. Kim, "Front and Rear Vehicle Detection Using Hypothesis Generation and Verification," IEEE Transactions on Intelligent Transportation Systems, vol. 16, pp. 1351-1360, 2015-01-01 2015.
[32]. C. M., L. W., Y. C., and P. M., "Vision-Based Vehicle Detection System With Consideration of the Detecting Location," IEEE Transactions on Intelligent Transportation Systems, vol. 13, pp. 1243-1252, 2012-01-01 2012.
[33]. W. X., S. L., F. W., and X. Y., "Efficient Feature Selection and Classification for Vehicle Detection," IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, pp. 508-517, 2015-01-01 2015.
[34]. O. T., P. M. and H. D., "Performance evaluation of texture measures with classification based on Kullback discrimination of distributions," in Proceedings of 12th International Conference on Pattern Recognition, 1994, pp. 582-585 vol.1.
[35]. H. G. Feichtinger and T. Strohmer, Gabor Analysis and Algorithms: Theory and Applications. New York, NY, USA: Springer, 2012.
[36]. J. Smith and J. Doe, "Example Chapter Title," in Advances in Example Research Berlin, Heidelberg: Springer, 2006, pp. 123-131.
[37]. I. W. G. and Z. Z., "Multistrategy ensemble learning: reducing error by combining ensemble learning techniques," IEEE Transactions on Knowledge and Data Engineering, vol. 16, pp. 980-991, 2004-01-01 2004.
[38]. X. Dong, Z. Yu, W. Cao, Y. Shi, and Q. Ma, "A survey on ensemble learning," Frontiers of Computer Science, vol. 14, pp. 241-258, 2020-01-01 2020.
[39]. J. Smith and J. Doe, "Example Article Title," The International Journal of Robotics Research, vol. 32, pp. 912-935, 2013-01-01 2013.
[40]. G. Yan, M. Yu, Y. Yu, and L. Fan, "Real-time vehicle detection using histograms of oriented gradients and AdaBoost classification," Optik, vol. 127, pp. 7941-7951, 2016-01-01 2016.
[41]. S. Lee and E. Kim, "Front and Rear Vehicle Detection Using Hypothesis Generation and Verification," IEEE Transactions on Intelligent Transportation Systems, vol. 16, pp. 1351-1360, 2015-01-01 2015.
[42]. C. M., L. W., Y. C., and P. M., "Vision-Based Vehicle Detection System With Consideration of the Detecting Location," IEEE Transactions on Intelligent Transportation Systems, vol. 13, pp. 1243-1252, 2012-01-01 2012.
[43]. A. Ali and A. Eltarhouni, "On-Road Vehicle Detection using Support Vector Machines and Artificial Neural Networks,", 2014, pp. 794-799.
[44]. S. Sivaraman and M. M. Trivedi, "Active learning for on-road vehicle detection: a comparative study," Machine Vision and Applications, vol. 25, pp. 599-611, 2014-01-01 2014.
[45]. W. H. J., C. C. L. and Y. C. D., "Symmetrical SURF and Its Applications to Vehicle Detection and Vehicle Make and Model Recognition," IEEE Transactions on Intelligent Transportation Systems, vol. 15, pp. 6-20, 2014-01-01 2014.
[46]. S. Zehang, B. G. and M. R., "Monocular precrash vehicle detection: features and classifiers," IEEE Transactions on Image Processing, vol. 15, pp. 2019-2034, 2006-01-01 2006.
[47]. T. H. W., W. L. H. and H. T. Y., "Two-Stage License Plate Detection Using Gentle Adaboost and SIFT-SVM," in 2009 First Asian Conference on Intelligent Information and Database Systems, 2009, pp. 109-114.
[48]. R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation,", 2014, pp. 580-587.
[49]. D. Cityscapes, "Available online: https://www.cityscapes-dataset.com (accessed on 2 June 2024).,".
[50]. R. S., H. K., G. R., and S. J., "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, pp. 1137-1149, 2017-01-01 2017.
[51]. E. Kharbat, O. Dergham, F. B. Cheikh, and N. Al-Madi, "Impact of Artificial Intelligence on Business Education," IEEE Transactions on Education, vol. 66, pp. 234-241, 2023-01-01 2023.
[52]. Z. Cai and N. Vasconcelos, "Cascade R-CNN: Delving Into High Quality Object Detection,", 2018, pp. 6154-6162.
[53]. T. E. A. Lin, "Feature pyramid networks for object detection,", 2017, pp. 2117-2125.
[54]. H. K., Z. X., R. S., and S. J., "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, pp. 1904-1916, 2015-01-01 2015.
[55]. J. Dai, Y. Li, K. He, and J. Sun, "R-fcn: Object detection via region-based fully convolutional networks," Advances in neural information processing systems, vol. 29, 2016-01-01 2016.
[56]. W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, and A. C. Berg, "SSD: Single Shot MultiBox Detector,", Cham, 2016, pp. 21-37.
[57]. T. Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," In Proceedings of the IEEE International Conference on Computer Vision, pp. 2980-2988, 2017.
[58]. J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263-7271, 2017.
[59]. J. Redmon and A. Farhadi, "Yolov3: An incremental improvement," arXiv 2018, 1804.
[60]. A. Bochkovskiy, C. Y. Wang and H. Y. M. Liao, "Yolov4: Optimal speed and accuracy of object detection," arXiv 2020, 2004.
[61]. Y. Ultralytics, "Available online: https://github.com/ultralytics/yolov5 (accessed on 2 June 2024).,".
[62]. H. Law and J. Deng, "Cornernet: Detecting objects as paired keypoints,", 2018, pp. 734-750.
[63]. Z. Yang, S. Liu, H. Hu, L. Wang, and S. Lin, "Reppoints: Point set representation for object detection," In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9657-9666, 2019.
[64]. K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, and Q. Tian, "Centernet: Keypoint triplets for object detection," In Proceedings of the IEEE/CVF International Conference on Computer Vision, vol. 29, pp. 7389-7398, 2020.
[65]. X. Zhou, J. Zhuo and P. Krahenbuhl, "Bottom-up object detection by grouping extreme and center points," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 850-859, 2019.
[66]. J. Wang, K. Chen, S. Yang, C. C. Loy, and D. Lin, "Region proposal by guided anchoring," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2965-2974, 2019.
[67]. C. Zhu, Y. He and M. Savvides, "Feature selective anchor-free module for single-shot object detection," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 840-849, 2019.
[68]. T. Kong, F. Sun, H. Liu, Y. Jiang, L. Li, and J. Shi, "Foveabox: Beyound anchor-based object detection," IEEE Trans, pp. 7389-7398, 2020.
[69]. C. Y. Wang, I. H. Yeh and H. Y. M. Liao, "YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information," arXiv 2024, 2402.
[70]. C. Y. Wang, A. Bochkovskiy and H. Y. M. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object Recognizers," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7464-7475, 2023.
[71]. J. Wang, L. Song, Z. Li, H. Sun, J. Sun, and N. Zheng, "End-to-end object detection with fully convolutional network," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15849-15858, 2021.
[72]. P. Sun, R. Zhang, Y. Jiang, T. Kong, C. Xu, W. Zhan, M. Tomizuka, L. Li, Z. Yuan, and C. Wang, "Sparse r-cnn: End-to-end object detection with learnable proposals," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14454-14463, 2021.
[73]. A. N. S. N. Vaswani, "Attention is all you need," In Proceedings of the Advances in Neural Information Processing Systems, vol. 30, 2017.
[74]. X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, "Deformable detr: Deformable transformers for end-to-end object detection," arXiv 2020, 2010-04-15 2010.
[75]. Y. Z. X. Y. Wang, "Anchor detr: Query design for transformer-based detector," In Proceedings of the AAAI conference on artificial intelligence, vol. 3, pp. 2567-2575, 2022.
[76]. W. Lv, S. Xu, Y. Zhao, G. Wang, J. Wei, C. Cui, Y. Du, Q. Dang, and Y. Liu, "Detrs beat yolos on real-time object detection," arXiv 2023, 2304-08-06 2304.
[77]. J. M. De Sa, Pattern recognition: concepts, methods and applications: Springer Science & Business Media, 2012.
[78]. H. Zhu, Q. Zhang and Q. Wang, "4D Light Field Superpixel and Segmentation,", 2017, pp. 6709-6717.
[79]. Z. Zhou, "A brief introduction to weakly supervised learning," National science review, vol. 5, pp. 44-53, 2018-01-01 2018.
[80]. J. Long, E. Shelhamer and T. Darrell, "Fully convolutional networks for semantic segmentation,", 2015, pp. 3431-3440.
[81]. V. Badrinarayanan, A. Kendall and R. Cipolla, "Segnet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE transactions on pattern analysis and machine intelligence, vol. 39, pp. 2481-2495, 2017-01-01 2017.
[82]. L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "Semantic image segmentation with deep convolutional nets and fully connected crfs," arXiv preprint arXiv:1412.7062, 2014-01-01 2014.
[83]. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition,", 2016, pp. 770-778.
[84]. L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs," IEEE transactions on pattern analysis and machine intelligence, vol. 40, pp. 834-848, 2017-01-01 2017.
[85]. L. Chen, G. Papandreou, F. Schroff, and H. Adam, "Rethinking atrous convolution for semantic image segmentation," arXiv preprint arXiv:1706.05587, 2017-01-01 2017.
[86]. F. Chollet, "Xception: Deep learning with depthwise separable convolutions,", 2017, pp. 1251-1258.
[87]. L. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation,", 2018, pp. 801-818.
[88]. G. Lin, A. Milan, C. Shen, and I. Reid, "Refinenet: Multi-path refinement networks for high-resolution semantic segmentation,", 2017, pp. 1925-1934.
[89]. H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network,", 2017, pp. 2881-2890.
[90]. H. Zhao, X. Qi, X. Shen, J. Shi, and J. Jia, "Icnet for real-time semantic segmentation on high-resolution images,", 2018, pp. 405-420.
[91]. P. Luc, C. Couprie, S. Chintala, and J. Verbeek, "Semantic segmentation using adversarial networks," arXiv preprint arXiv:1611.08408, 2016-01-01 2016.
[92]. N. Souly, C. Spampinato and M. Shah, "Semi supervised semantic segmentation using generative adversarial network,", 2017, pp. 5688-5696.
[93]. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, and S. Gelly, "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020-01-01 2020.
[94]. S. Zheng, J. Lu, H. Zhao, X. Zhu, Z. Luo, Y. Wang, Y. Fu, J. Feng, T. Xiang, and P. H. Torr, "Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers,", 2021, pp. 6881-6890.
[95]. E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, "SegFormer: Simple and efficient design for semantic segmentation with transformers," Advances in neural information processing systems, vol. 34, pp. 12077-12090, 2021-01-01 2021.
[96]. Q. Wan, Z. Huang, J. Lu, G. Yu, and L. Zhang, "Seaformer: Squeeze-enhanced axial transformer for mobile semantic segmentation," arXiv preprint arXiv:2301.13156, 2023-01-01 2023.
[97]. S. Mehta, M. Rastegari, A. Caspi, L. Shapiro, and H. Hajishirzi, "Espnet: Efficient spatial pyramid of dilated convolutions for semantic segmentation,", 2018, pp. 552-568.
[98]. D. B. Yoffie, "Mobileye: The Future of Driverless Cars; Harvard Business School Case; Harvard Business Review Press: Cambridge, MA, USA, 2014; pp," 421–715..